Research Article | Volume: 9, Issue: 2, March, 2021

Plasticity of tandem repeats in expressed sequence tags of angiospermic and non-angiospermic species: Insight into cladistic, phenetic, and elementary explorations

Shamshad Ul Haq, Prerna Dhingra, Meenakshi Sharma, S. L. Kothari, Sumita Kachhwaha

Open Access   

Published:  Mar 10, 2021

DOI: 10.7324/JABB.2021.9204
Abstract

Angiospermic and non-angiospermic groups comprise plant species that differ, to greater or lesser degrees, in their morphological, physiological, biochemical, molecular, and developmental processes. Molecular-level analysis plays a crucial role in ascertaining heterogeneity within and across species. Tandem repetitive DNA elements are among the most important genomic elements and play a significant role in various genetic and genomic applications. We therefore analyzed the plasticity of tandem repetitive DNA elements, especially simple sequence repeats (SSRs), in the expressed sequence tags (ESTs) of 75 angiospermic and non-angiospermic species belonging to different evolutionary clades: algae, fungi, bryophytes, pteridophytes, gymnosperms, dicots, and monocots. Angiospermic and non-angiospermic species differed distinctly in the GC content, SSR incidence, and SSR motif distribution of their EST sequences. Notably, non-angiosperms had a higher GC content than angiosperms, whereas angiosperms showed more tandem repeats (EST-SSRs) than non-angiosperms. Among SSR types, mononucleotide SSRs were the most widespread, followed by trinucleotide SSRs, in both angiosperms and non-angiosperms. In general, the motifs A/T, AG/CT, AAG/CTT, and CCG/CGG were the most repeated, whereas hexa-, penta-, and tetranucleotide SSRs showed highly complex motif patterns. Thus, both shared patterns and diversification were observed within and across species as well as evolutionary clades. In conclusion, the differential patterns of DNA tandem repeats identified within ESTs can help reveal genetic polymorphism, diversification, conservation, and genome evolution within and across species.


Keywords: Angiospermic species; Non-angiospermic species; Expressed sequence tags; Tandem repeats; Simple sequence repeats; Evolutionary clades or phylogenetic clades.


Citation:

Ul Haq S, Dhingra P, Sharma M, Kothari SL, Kachhwaha S. Plasticity of tandem repeats in expressed sequence tags of angiospermic and nonangiospermic species: Insight into cladistic, phenetic and elementary explorations. J App Biol Biotech. 2021;9(2):36-59.
doi: dx.doi.org/10.7324/JABB.2021.9204

Copyright: Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike license.

1. INTRODUCTION

Angiospermic and non-angiospermic groups encompass an enormous diversity of plant species, which show both homogeneous and heterogeneous relationships at the morphological, physiological, biochemical, and molecular levels. Such relationships among species strengthen the adaptability, flexibility, and survival of species or populations under different ecological conditions and environmental fluctuations. Over the last few decades, genetic and cytogenetic studies have expanded rapidly, providing detailed information on genome organization, genetic diversity, and genome evolution through the analysis of nuclear DNA, organelle DNA, expressed sequence tags (ESTs), chromosomal aberrations, and other features. Studies based on repetitive DNA elements have become especially common, because these elements make up a major portion of the nuclear genome and also occur in the expressed regions of eukaryotic genomes (ESTs). Repetitive DNA elements occur as tandem repeats (microsatellites or simple sequence repeats, minisatellites, etc.) and as interspersed repeats (transposons, retrotransposons, etc.). Tandem repeat units can be repeated many times and may contribute to the structure and function of the genome. Several studies have shown that repetitive DNA elements are important determinants of genome size, genetic diversity, genome organization, conservation, and evolution within and across species and taxa [1-3].

Expressed sequence tags (ESTs) are particularly important genomic resources because they are derived from transcribed genes, and they can serve as a link between genomics and molecular ecology [4]. Over the last few decades, ESTs have been used extensively for gene discovery, gene annotation, genetic polymorphism studies, transcriptome profiling, and proteomic exploration [5,6]. ESTs are unedited, single-pass sequence reads of randomly selected clones from cDNA libraries, typically 200 to 800 nucleotides long. Because they are directly associated with gene function, they offer advantages over whole-genome sequencing; in addition, EST sequencing is rapid, inexpensive, easy to handle, and less time-consuming [7]. ESTs have been used successfully to identify miRNA precursors and targets [8-10], for transcriptome analysis using cDNA microarrays [11-13], and for gene discovery and gene expression analysis [8,14-17].

EST sequences are also an important source of tandem repetitive DNA elements, especially simple sequence repeats (SSRs), which serve as molecular markers for a variety of genetic and genomic applications. Microsatellites, or SSRs, are tandem repeats of short DNA motifs, generally 1 to 6 nucleotides long, that are distributed randomly and ubiquitously throughout the genomes of both prokaryotic and eukaryotic organisms [18-20]. They occur frequently in both coding and non-coding regions of the genome [21]. EST-SSRs have therefore been widely used in plant genetic studies of genetic diversity, ecology, evolution, phylogeny, taxonomy, and comparative genomics [22,23]. These applications are possible because microsatellites are multi-allelic, co-dominant, and highly reproducible [24]. SSR markers also help to characterize gene content, estimate genetic relatedness, and assess genetic drift, which are crucial for recognizing conservation units in populations [25]. In addition, publicly available EST libraries offer an alternative source of EST-SSRs, which have proved to be a powerful and promising tool for applications such as population genetics, biodiversity assessment, genetic drift studies, high-resolution genetic maps, gene mapping, QTL (quantitative trait locus) analysis, germplasm characterization, cultivar identification, paternity analysis, and marker-assisted breeding [8,26-32].

The present study describes the distribution dynamics of DNA tandem repeats in the ESTs of angiospermic and non-angiospermic plant species. A total of 75 species were selected from different phylogenetic lineages: algae, fungi, bryophytes, pteridophytes, gymnosperms, dicots, and monocots. The ESTs of these species were used to analyze SSR distribution within and across species, and the significance of EST-SSRs is discussed in terms of their origin, distribution, conservation, and evolution.


2. MATERIALS AND METHODS

2.1. Plant Materials

A total of 75 species belonging to seven evolutionary groups were used for the analysis of tandem repetitive DNA elements (EST-SSRs). Of these, 30 were non-angiosperms: 10 algae, 10 fungi, 3 bryophytes, 2 pteridophytes, and 5 gymnosperms. The remaining 45 were angiosperms: 34 dicots and 11 monocots, as shown in Table 1.

Table 1: Details of non-angiospermic species and angiospermic species used for tandem repeat analysis.

Non-angiospermic species

Algae (10): Chaetosphaeridium globosum, Chlamydomonas reinhardtii, Chlorella variabilis, Chlorokybus atmophyticus, Ectocarpus siliculosus, Klebsormidium flaccidum, Mesostigma viride, Nitella hyalina, Porphyra yezoensis, Volvox carteri

Fungi (10): Albugo candida, Aspergillus niger, Cercospora zeae-maydis, Fusarium graminearum, Mucor circinelloides, Neurospora crassa, Phytophthora infestans, Puccinia triticina, Saccharomyces cerevisiae, Ustilago maydis

Bryophytes (3): Marchantia polymorpha, Physcomitrella patens, Syntrichia ruralis

Pteridophytes (2): Adiantum capillus-veneris, Selaginella moellendorffii

Gymnosperms (5): Ginkgo biloba, Gnetum gnemon, Cycas rumphii, Pinus pinaster, Welwitschia mirabilis

Angiospermic species

Dicots (34): Catharanthus roseus, Ocimum basilicum, Capsicum annuum, Nicotiana tabacum, Solanum lycopersicum, Daucus carota, Panax ginseng, Artemisia annua, Helianthus annuus, Citrullus lanatus, Cucumis melo, Euphorbia esula, Hevea brasiliensis, Manihot esculenta, Ricinus communis, Arachis hypogaea, Cajanus cajan, Cicer arietinum, Glycine max, Lotus japonicus, Medicago truncatula, Trifolium pratense, Pisum sativum, Fragaria vesca, Malus domestica, Prunus persica, Vitis vinifera, Arabidopsis thaliana, Brassica napus, Raphanus sativus, Carica papaya, Gossypium hirsutum, Theobroma cacao, Liriodendron tulipifera

Monocots (11): Avena barbata, Avena sativa, Cenchrus ciliaris, Hordeum vulgare, Oryza sativa, Secale cereale, Sorghum bicolor, Sorghum propinquum, Triticum aestivum, Zea mays, Musa acuminata

2.2. Expressed Sequence Tags Sequences Retrieval

A total of 4,352,515 partial EST transcripts were retrieved from the National Center for Biotechnology Information (NCBI), a public database that provides easy access and a user-friendly platform for analysis. Batch files of EST sequences were downloaded in FASTA format for each selected species, and the number of sequences per species was limited to between 10,000 and 100,000, depending on the sequence information available at NCBI and on computational capacity.
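The retrieval itself was done through NCBI, but once a species' batch file has been downloaded, the per-species bounds above can be applied while the file is parsed. The Python sketch below is a minimal illustration under stated assumptions: one FASTA file per species, hypothetical function names (`read_fasta`, `load_species_batch`), and the 10,000-sequence lower bound treated as a flag rather than a hard filter, since the text does not say how species near that limit were handled.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Per-species bounds quoted in the text; treating the lower bound as a flag
# (rather than a hard filter) is an assumption of this sketch.
MIN_SEQS = 10_000
MAX_SEQS = 100_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # normalise soft-masked (lowercase) bases
    if header is not None:
        yield header, "".join(chunks)


def load_species_batch(lines: Iterable[str], max_seqs: int = MAX_SEQS,
                       min_seqs: int = MIN_SEQS) -> Tuple[List[Tuple[str, str]], bool]:
    """Keep at most max_seqs records and report whether at least min_seqs were available."""
    records = list(islice(read_fasta(lines), max_seqs))
    return records, len(records) >= min_seqs
```

In use, an open file handle for the species' FASTA batch can be passed directly, for example `load_species_batch(open(path))`, where `path` is a placeholder.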

2.3. EST Sequences Assembling and Computational Analysis

To minimize sequence redundancy, all retrieved EST sequences were assembled with the CAP3 program using default parameters. CAP3 can clip low-quality regions at the 5′ and 3′ ends of reads, and it uses base quality values when computing overlaps between reads, constructing multiple sequence alignments of reads, and generating consensus sequences [33]. Basic statistics for all assembled EST sequences were then computed with Perl scripts obtained from publicly available bioinformatics resources.
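Because the non-redundant set is the union of assembled contigs and unassembled singlets (in Section 3.1, 528,211 contigs plus 778,728 singlets give the 1,306,939 NR-ESTs), the post-assembly step can be sketched as below. This is an illustration, not the authors' pipeline: it assumes a `cap3` executable on the PATH and CAP3's usual output naming (`<input>.cap.contigs` and `<input>.cap.singlets`), and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Minimal FASTA reader yielding (header, sequence) pairs."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def run_cap3(fasta_path: str) -> Tuple[Path, Path]:
    """Run CAP3 with default parameters (assumes a `cap3` binary on PATH)."""
    subprocess.run(["cap3", fasta_path], check=True, stdout=subprocess.DEVNULL)
    # CAP3 names its result files after the input file.
    return Path(fasta_path + ".cap.contigs"), Path(fasta_path + ".cap.singlets")


def nr_ests(contig_lines: Iterable[str], singlet_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Non-redundant ESTs = every assembled contig plus every unassembled singlet."""
    records = [("contig|" + h, s) for h, s in read_fasta(contig_lines)]
    records += [("singlet|" + h, s) for h, s in read_fasta(singlet_lines)]
    return records
```

After `run_cap3("species.fasta")` (a placeholder file name), opening the two returned paths and passing them to `nr_ests` gives the combined NR-EST records; the prefixes simply keep track of which records were assembled.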

2.4. Simple Sequence Repeats (SSRs) or Microsatellites Screening

To study the distribution dynamics of SSRs, the assembled EST sequences of all 75 species were analyzed with the MIcroSAtellite identification tool (MISA) (http://pgrc.ipk-gatersleben.de/misa/). MISA is a Perl command-line tool for identifying and characterizing different types of SSRs. It produces output text files reporting the sequence name, number of SSRs, SSR type, SSR motif, SSR position, repeat length, and repeat number. Only mono- to hexanucleotide SSRs were considered, and the minimum numbers of repeat units for SSR detection were 10, 6, 5, 5, and 5 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats, respectively.
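MISA itself is a Perl script configured through its own settings file; the Python sketch below only imitates the core idea of perfect-repeat detection with a minimum repeat number per unit size, and it does not reproduce MISA's compound-SSR handling or output format. The threshold table uses 10, 6, 5, 5, 5, and 5 repeats for unit sizes 1 to 6. Because the text lists five values for six unit sizes, the hexanucleotide threshold here is an assumption (it matches the definition commonly cited as MISA's default). Function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Minimum repeat numbers per unit size (1 = mono ... 6 = hexa).
# The hexanucleotide value is an assumption, see the note above.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit: str) -> bool:
    """False for units that are themselves repeats, e.g. 'ATAT' = (AT)2."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, int, str, int]]:
    """Return perfect SSRs as (start, end, unit, repeats); 0-based, end-exclusive."""
    seq = seq.upper()
    candidates = []
    for k, n in min_repeats.items():
        # A unit of k bases followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # (AT)n must not be reported as (ATAT)n
                candidates.append((m.start(), m.end(), unit, (m.end() - m.start()) // k))
    # Keep non-overlapping hits, preferring longer tracts, then earlier starts.
    candidates.sort(key=lambda h: (h[0] - h[1], h[0]))
    chosen: List[Tuple[int, int, str, int]] = []
    for hit in candidates:
        if all(hit[1] <= c[0] or hit[0] >= c[1] for c in chosen):
            chosen.append(hit)
    return sorted(chosen)
```

For example, a sequence containing (A)12, (AG)6, and (AAG)5 separated by short spacers yields one hit per tract, with repeat numbers 12, 6, and 5.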


3. RESULTS AND DISCUSSION

3.1. EST Sequences Characterization

The comparative analysis of EST-SSRs was performed on 75 species belonging to diverse phylogenetic lineages: algae, fungi, bryophytes, pteridophytes, gymnosperms, dicots, and monocots. A total of 4,352,515 (4.35 million) EST transcripts were examined, and 1,306,939 non-redundant ESTs (NR-ESTs) were obtained after assembly [Figures 1 and 2]. These comprised 528,211 contigs, for which N25, N50, and N75 values were calculated; contig N50 ranged from 500 bp to 1,200 bp, with an average of 900 bp. Similarly, 778,728 singlets were obtained, with lengths ranging from 500 bp to 1,600 bp and an average of 800 bp. The overall average length of the NR-ESTs was 717.69 bp, ranging from 513.56 bp to 1,033.83 bp among species, which is comparable with previous studies in other plant species [34,35]. The numbers of reads in contigs and singlets varied among species. This variation may reflect the portions of transcripts that were sequenced, the limited sequence data available for some species, and the parameters used in the assembly pipeline. In terms of mean sequence length, non-angiosperms had longer sequences on average than angiosperms. Among phylogenetic clades, bryophytes and pteridophytes showed the highest average sequence length and gymnosperms the lowest [Figure 3]. Among species, the highest average sequence length was found in Albugo candida (1,033.83 bp), followed by Selaginella moellendorffii (991.30 bp) and Chlorokybus atmophyticus (953.14 bp), while the lowest average lengths were 513.56 bp and 524.43 bp in Lotus japonicus and Theobroma cacao, respectively [Additional file 1].
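The contig and singlet statistics above rely on mean lengths and Nx values. The sketch below shows how such summaries are conventionally computed; it is not the Perl script the authors used, and the function names are hypothetical. Nx is taken as the length of the sequence at which the cumulative length of sequences, sorted from longest to shortest, first reaches x% of the total assembled length.

```python
from typing import Dict, List, Union


def n_statistic(lengths: List[int], fraction: float) -> int:
    """Nx: length at which the cumulative sum of lengths, sorted from longest
    to shortest, first reaches `fraction` of the total length."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("empty length list")


def assembly_summary(lengths: List[int]) -> Dict[str, Union[int, float]]:
    """Count, mean length, and N25/N50/N75 for a set of contig or singlet lengths."""
    return {
        "n": len(lengths),
        "mean": sum(lengths) / len(lengths),
        "N25": n_statistic(lengths, 0.25),
        "N50": n_statistic(lengths, 0.50),
        "N75": n_statistic(lengths, 0.75),
    }
```

For the toy lengths 100, 200, 300, 400, and 1,000 bp, the mean is 400 bp and the single 1,000 bp sequence already covers half of the 2,000 bp total, so N50 is 1,000 bp.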

Figure 1: Comparative details of EST characterizations among 30 non-angiospermic species.




Figure 2: Comparative details of EST characterizations among 45 angiospermic species.




Figure 3: Average sequence length (nucleotides) distribution in non-redundant EST sequences among different evolutionary clades.




3.2. Distribution of GC-content in ESTs

The distribution of GC-content was compared among the NR-ESTs of the 75 species. On average, GC-content was 46.61%, ranging from 38.61% to 65.16%, which is consistent with earlier observations in various plant species [36,37]. GC-content was generally higher in non-angiosperms than in angiosperms. Among the non-angiosperm clades, GC-content was highest in algae, followed by fungi, bryophytes, pteridophytes, and gymnosperms [Figure 4]. Among non-angiospermic species, GC-content was notably high in the algae Chlorella variabilis and Klebsormidium flaccidum, the fungi Ustilago maydis and Cercospora zeae-maydis, the bryophyte Syntrichia ruralis, the pteridophyte Selaginella moellendorffii, and the gymnosperm Gnetum gnemon [Additional file 2]. Among angiosperms, monocots had a higher GC-content than dicots, in agreement with a previous study [38]. Within dicots, rosid species showed a slightly higher GC-content than asterid species, but no skew was observed within either the asterids or the rosids. Among individual angiosperms, GC-content was high in the dicots Brassica napus, Fragaria vesca, and Ocimum basilicum and in the monocots Zea mays and Sorghum propinquum [Additional file 3]. GC-content is an important parameter that reflects gene structure (intron size and number), thermostability, gene regulation, and evolution [39,40]. A higher GC-content is associated with high gene density and compactness [41,42] and with earlier replication timing [43]; it also influences recombination rates [44] and determines physical and physiological properties of DNA [45].
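The text does not state whether ambiguous bases (N) were excluded or whether species-level GC values were pooled over all bases or averaged per sequence. The sketch below assumes ambiguous bases are ignored and shows both conventions, which can give different numbers for the same data; the function names are hypothetical.

```python
from typing import List, Tuple


def _gc_and_acgt(seq: str) -> Tuple[int, int]:
    """Count G+C and unambiguous A/C/G/T bases in one sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return gc, gc + seq.count("A") + seq.count("T")


def gc_content(seq: str) -> float:
    """GC percentage of one sequence, ignoring ambiguous bases such as N."""
    gc, acgt = _gc_and_acgt(seq)
    return 100.0 * gc / acgt if acgt else 0.0


def pooled_gc(seqs: List[str]) -> float:
    """GC percentage over all bases of a species' NR-ESTs (length-weighted)."""
    totals = [_gc_and_acgt(s) for s in seqs]
    acgt = sum(a for _, a in totals)
    return 100.0 * sum(g for g, _ in totals) / acgt if acgt else 0.0


def mean_per_sequence_gc(seqs: List[str]) -> float:
    """Unweighted mean of per-sequence GC percentages."""
    return sum(gc_content(s) for s in seqs) / len(seqs)
```

For the two toy sequences "GG" and "AAAAAA", the pooled value is 25% while the mean of per-sequence values is 50%, because the short GC-rich sequence counts as much as the longer AT-rich one in the unweighted mean.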

Figure 4: Average GC-content distributions in non-redundant EST sequences among different evolutionary clades.




3.3. Frequency Distribution of SSRs in ESTs

The distribution of SSRs was examined in the ESTs of the selected species, considering mono- to hexanucleotide SSRs. A total of 678,260 SSRs were identified. Excluding mononucleotide SSRs, the average SSR frequency was 9.65%, ranging from 1% to 24.81%. This range is similar to those reported in previous studies of various plant species [27,46-50]. Differences in reported SSR frequencies can arise from the SSR mining tool, the search parameters, and the amount of sequence data analyzed. SSR incidence was higher in angiosperms (10.50%) than in non-angiosperms (8.42%) [Figure 5]. Among non-angiosperms, SSR incidence was highest in pteridophytes and algae and lowest in gymnosperms. Among angiosperms, monocots showed a higher SSR incidence than dicots. The higher SSR incidence in angiosperms may be explained by the highly dynamic nature, large size, and structure of angiosperm genomes [51], and the prevalence of polyploidy in higher plants may also alter SSR incidence. SSR incidence appeared to be inversely related to GC-content: angiosperms had a lower GC-content (44.21%) and a higher SSR occurrence, whereas non-angiosperms had a higher GC-content (50.22%) and a lower SSR occurrence. Divergence in SSR incidence, SSR length, motif structure, and GC-content is therefore an important factor in genome conservation and evolution [52].
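The text does not define how SSR frequency was computed. One common definition, assumed in the sketch below, is the percentage of sequences that contain at least one SSR; another, the number of SSRs per sequence, would give different values. The function name and the input layout of (sequence ID, unit size) pairs are hypothetical.

```python
from typing import Iterable, Tuple


def ssr_frequency(n_sequences: int, ssr_records: Iterable[Tuple[str, int]],
                  exclude_mono: bool = True) -> float:
    """Percentage of sequences carrying at least one SSR.

    ssr_records holds (sequence_id, unit_size) pairs, one per detected SSR,
    such as those parsed from an SSR-mining report. Mononucleotide SSRs are
    ignored when exclude_mono is True, matching the exclusion in the text.
    """
    carriers = {seq_id for seq_id, unit_size in ssr_records
                if not (exclude_mono and unit_size == 1)}
    return 100.0 * len(carriers) / n_sequences
```

A sequence with several SSRs is counted once under this definition, which is one reason frequencies from different studies are hard to compare directly.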

Figure 5: Comparative details of EST-SSRs frequency (%) among different phylogenetic clades.




Some species deviated markedly from the average. Among non-angiosperms, SSR frequency was very low in the alga Klebsormidium flaccidum (1.37%), the fungi Albugo candida (1.0%) and Phytophthora infestans (1.57%), and the gymnosperm Pinus pinaster (2.76%), and very high in Volvox carteri (20.77%), Chlorokybus atmophyticus (20.02%), and Chlorella variabilis (17.24%) [Figure 6]. Among angiosperms, SSR frequency was below average in Pisum sativum (3.48%), Cajanus cajan (3.55%), and Daucus carota (4.24%), whereas Oryza sativa, Trifolium pratense, Ricinus communis, Cucumis melo, and Raphanus sativus deviated markedly above average, with SSR frequencies of 24.81%, 20.20%, 19.19%, 17.90%, and 17.23%, respectively [Figure 7]. The high SSR frequencies observed here are in accordance with earlier comparative genomic analyses by various workers [46,48,49,53-56].

Figure 6: Percentage of SSR incidence within the species belonging to non-angiosperm.




Figure 7: Percent of SSR incidence within the species belonging to angiosperm.




3.4. Frequency Distribution of Different Types of SSRs in ESTs

The distribution of different types of SSRs was compared among the ESTs of species belonging to different clades. Overall, mononucleotide repeats accounted for 80.95% of all SSRs, and the remaining 19.05% were other types (di- to hexanucleotide SSRs). Mononucleotide SSRs were highly repetitive, with a uniform distribution and few fluctuations. Mononucleotide SSRs can help fill gaps in linkage maps, and their application has been successfully demonstrated in some populations [47]. Mononucleotide SSR incidence was higher in non-angiosperms (70.67%) than in angiosperms (67.67%). Among non-angiosperms, algae (53.91%), bryophytes (67.84%), and pteridophytes (69.48%) showed a lower mononucleotide SSR incidence, whereas gymnosperms (84.47%) and fungi (77.63%) showed a significantly higher incidence. Among angiosperms, mononucleotide SSR incidence was higher in dicots (70.96%) than in monocots (64.37%).

Excluding mononucleotide SSRs, trinucleotide SSRs were the most abundant (51.28%), followed by dinucleotide (39.32%), hexanucleotide (3.43%), tetranucleotide (3.01%), and pentanucleotide SSRs (2.96%) [Figure 8]. The predominance of trinucleotide SSRs agrees with previous genomic studies in various species [57-59], and the relatively high proportion of tri- and hexanucleotide repeats is also consistent with previous reports [20,60]. A high frequency of trinucleotide SSRs has also been reported in the coding and non-coding regions of viruses, organelles, plasmids, prokaryotes, fungi, protists, and humans [61,62]. Tri- and hexanucleotide SSRs have likewise been observed more frequently than other SSR types in genomic and EST sequences [63,64], a pattern commonly attributed to the fact that gaining or losing tri- or hexanucleotide repeat units does not shift the reading frame of coding sequences.
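Two denominators are used in this section: mononucleotide SSRs are reported as a share of all SSRs, whereas the di- to hexanucleotide shares are computed after excluding mononucleotide SSRs. A small sketch of both calculations follows; the function name and the input (a list of SSR unit sizes) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

TYPE_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def type_composition(unit_sizes: Iterable[int], exclude_mono: bool = False) -> Dict[str, float]:
    """Percentage of SSRs of each type, optionally leaving mononucleotide SSRs out."""
    counts = Counter(k for k in unit_sizes if not (exclude_mono and k == 1))
    total = sum(counts.values())
    return {TYPE_NAMES[k]: 100.0 * c / total for k, c in sorted(counts.items())}
```

With the toy unit sizes [1, 1, 1, 1, 1, 1, 2, 3, 3, 6], trinucleotide SSRs make up 20% of all SSRs but 50% once the six mononucleotide SSRs are removed, which is why the two sets of percentages in this section are not directly comparable.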

Figure 8: Comparative distribution of different types of SSRs, comprising di- to hexanucleotide repeats, among different phylogenetic clades.




A common pattern of SSR types was observed in both non-angiosperms and angiosperms, although the proportions of tetra-, penta-, and hexanucleotide SSRs deviated from the general trend in some evolutionary clades [Figure 8]. Several species deviated markedly from the average proportions: Adiantum capillus-veneris (82.00%), Daucus carota (66.44%), and Liriodendron tulipifera (65.74%) for dinucleotide SSRs; Chlorella variabilis (91.45%), Chlorokybus atmophyticus (83.49%), and Porphyra yezoensis (79.17%) for trinucleotide SSRs; Mesostigma viride for tetranucleotide (38.92%) and pentanucleotide (23.78%) SSRs; and Fusarium graminearum (10.90%) for hexanucleotide SSRs [Additional file 4]. Earlier studies have similarly reported an uneven distribution of SSR types among plant species [48,55,56,58,65-67].

Overall, mono-, tri-, and dinucleotide SSRs were more frequent than hexa-, tetra-, and pentanucleotide SSRs. The molecular mechanisms underlying the occurrence, distribution, and predominance of particular SSR types remain unclear, but they may reflect selection pressure acting on specific motifs during plant genome evolution. Replication slippage, which adds or removes one or more repeat units, is another important factor; nucleotide substitutions, duplication events, and unequal crossing over have also been shown to influence microsatellite variation [68-70].
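Replication slippage is often idealized as a stepwise mutation model, in which each mutation changes a microsatellite by exactly one repeat unit up or down. The toy simulation below follows that idealization to show how repeat number drifts over time; the slip rate, the symmetric one-unit step, and the floor of one unit are illustrative assumptions, not parameters estimated in this study, and the function name is hypothetical.

```python
import random
from typing import List, Optional


def simulate_slippage(start_repeats: int, generations: int, slip_rate: float,
                      seed: Optional[int] = None) -> List[int]:
    """Stepwise-mutation sketch of replication slippage.

    In each generation, with probability `slip_rate`, the repeat tract gains or
    loses one repeat unit with equal chance; it never drops below one unit.
    Returns the repeat count at the start and after every generation.
    """
    rng = random.Random(seed)
    counts = [start_repeats]
    for _ in range(generations):
        n = counts[-1]
        if rng.random() < slip_rate:
            n = max(1, n + rng.choice((-1, 1)))
        counts.append(n)
    return counts
```

Running it with a fixed seed, for example `simulate_slippage(6, 1000, 0.3, seed=1)`, gives a reproducible random walk of repeat numbers starting from a (XY)6-type tract.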

3.5. SSRs Motif Length and Categorization

The length of SSR repeat tracts (referred to here as SSR motif length) was examined for each SSR type in the ESTs of the selected species. Overall, the average SSR motif length was 21.12 bp, which differs slightly from earlier reports [67,71]. Hexanucleotide SSRs had the longest average length (26.60 bp), followed by tetranucleotide (22.30 bp), pentanucleotide (22.26 bp), dinucleotide (19.12 bp), mononucleotide (18.51 bp), and trinucleotide SSRs (17.94 bp). This trend was common to both non-angiosperms and angiosperms, with a few deviations among non-angiospermic clades. Expansion or contraction of repeat tracts within particular SSR types can influence biological complexity and may be linked to genetic evolution and its regulation, while such changes in protein-coding regions can lead to gain or loss of gene function [69,72-74]. SSR motif length followed a more uniform pattern in angiosperms than in non-angiosperms, which showed some divergence [Figure 9]. Some skew in motif length was also observed among evolutionary groups and species [Additional file 5].
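Assuming each perfect SSR is described by its repeat unit and repeat number (as in MISA-style output), per-type averages of tract length like those above can be computed as follows. The function name is hypothetical, and compound SSRs are not handled in this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_tract_length(ssrs: Iterable[Tuple[str, int]]) -> Dict[int, float]:
    """Average repeat-tract length (unit size x repeat number) per unit size.

    ssrs holds (unit, repeats) pairs such as ("AG", 7); keys of the result are
    unit sizes (1 = mono ... 6 = hexa).
    """
    totals: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # size -> [sum of lengths, count]
    for unit, repeats in ssrs:
        k = len(unit)
        totals[k][0] += k * repeats
        totals[k][1] += 1
    return {k: s / n for k, (s, n) in sorted(totals.items())}
```

For instance, (AG)6 and (AG)8 are 12 bp and 16 bp long, so their dinucleotide average is 14 bp.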

Figure 9: Comparative details of SSR motif length distributions among evolutionary clades.




On the basis of length, perfect microsatellites (SSRs) can be categorized into class I and class II. Excluding mononucleotide and compound SSRs, 26.43% of SSRs were class I (≥20 bp) perfect microsatellites and the remaining 73.56% were class II (12-19 bp), in agreement with an earlier report [75]. The predominance of class II over class I microsatellites is consistent with previous observations [76,77]. Microsatellites of ≥20 nucleotides or of 12 to 19 nucleotides have been reported to be highly mutable [74,78]. Class II microsatellites were more prevalent in angiosperms than in non-angiosperms. Class II SSRs were especially predominant in monocots, dicots, and fungi, whereas class I SSRs were relatively more common in pteridophytes, gymnosperms, and bryophytes [Figure 10]. Nevertheless, class II SSRs outnumbered class I SSRs in the selected species of every evolutionary clade [Additional files 6 and 7].
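With each perfect SSR summarized by its unit and repeat number, the classification above reduces to a length check. In this sketch (the function name is hypothetical), a tract of exactly 20 bp is assigned to class I, which is an assumption about the boundary; mononucleotide repeats are excluded as in the text, and compound SSRs are assumed to have been removed beforehand.

```python
from typing import Optional


def microsatellite_class(unit: str, repeats: int) -> Optional[str]:
    """Return "I" for perfect SSRs of >= 20 bp, "II" for 12-19 bp, else None.

    Mononucleotide repeats and tracts shorter than 12 bp are not classified.
    """
    if len(unit) == 1:
        return None
    length = len(unit) * repeats
    if length >= 20:
        return "I"
    if length >= 12:
        return "II"
    return None
```

Under these rules (AG)6 (12 bp) and (AAG)6 (18 bp) are class II, while (AG)10 (20 bp) is class I.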

Figure 10: Comparative distribution of Class I and Class II perfect microsatellites among distinct phylogenetic clades.




3.6. Annotation of Most Frequent SSRs Motifs

SSR motifs were highly diverse across mono- to hexanucleotide SSRs. Mononucleotide SSRs comprised two complementary motifs (A/T and G/C), dinucleotide SSRs four (AC/GT, AG/CT, AT/AT, and CG/CG), and trinucleotide SSRs ten (AAC/GTT, AAG/CTT, AAT/ATT, ACC/GGT, ACG/CGT, ACT/AGT, AGC/CTG, AGG/CCT, ATC/ATG, and CCG/CGG). From tetra- to hexanucleotide SSRs, the motif patterns became increasingly complex, which can be explained by the larger number of possible combinations and permutations of the four nucleotides within longer motifs. Among mononucleotide SSRs, the A/T motif predominated over G/C in almost all species. In general, the A/T motif was more frequent in non-angiosperms than in angiosperms. Among evolutionary clades, A/T incidence was highest in gymnosperms (94.12%), followed by dicots (88.82%), and relatively low in algae (68.79%) and monocots (70.17%), which correspondingly had a higher G/C incidence [Figure 11]. Because mononucleotide repeats are highly unstable, their presence and base composition (A/T or G/C) can strongly affect gene stability and function, and in coding regions they may cause frameshift mutations [79]. The distribution of mononucleotide motifs varied among species: the A/T motif was most frequent in Triticum aestivum (99.88%), followed by Saccharomyces cerevisiae (99.82%), Pisum sativum (99.76%), and Raphanus sativus (99.56%), whereas the G/C motif reached 48.51%, 44.43%, 43.52%, 37.18%, and 37.18% in Ectocarpus siliculosus, Volvox carteri, Porphyra yezoensis, Ustilago maydis, and Oryza sativa, respectively [Additional file 8].
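Motif classes such as AAG/CTT group together every motif that is a rotation or reverse complement of the same unit (AAG, AGA, GAA, CTT, TTC, and TCT). The sketch below illustrates that grouping convention (it is not MISA's implementation, and the function names are hypothetical) and counts the classes per unit size. The counts, 2, 4, 10, 33, 102, and 350 for unit sizes 1 to 6, show why tetra- to hexanucleotide motif patterns are so much more complex. The labels put the alphabetically smallest form first, so the class written G/C in the text appears here as C/G.

```python
from itertools import product
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq: str) -> List[str]:
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def is_primitive(unit: str) -> bool:
    """False for units that are themselves repeats, e.g. 'ATAT' = (AT)2."""
    return (unit + unit).find(unit, 1) == len(unit)


def motif_class(motif: str) -> str:
    """Label a motif by its class under rotation and reverse complement, e.g. 'CTT' -> 'AAG/CTT'."""
    motif = motif.upper()
    canonical = min(rotations(motif) + rotations(reverse_complement(motif)))
    partner = min(rotations(reverse_complement(canonical)))
    return f"{canonical}/{partner}"


def count_classes(k: int) -> int:
    """Number of distinct motif classes for unit size k (non-primitive units excluded)."""
    return len({motif_class("".join(p)) for p in product("ACGT", repeat=k)
                if is_primitive("".join(p))})
```

For example, `motif_class("GCC")` returns "CCG/CGG", the class reported above as characteristic of algae and monocots.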

Figure 11: Comparative analysis of SSR motif distributions among mono-, di-, and trinucleotide repeat motifs across phylogenetic clades. Motifs A/T, AG/CT, AC/GT, AAG/CTT, CCG/CGG, and AGC/CTG showed the most repetitions.




The frequency distribution of dinucleotide SSR motifs was also skewed among species. Overall, AG/CT was the most frequent motif (56.03%), followed by AC/GT (21.22%) and AT/AT (19.41%), while CG/CG was the least frequent (3.33%). This pattern was consistent across phylogenetic clades except in algae, where AC/GT exceeded AG/CT and CG/CG exceeded AT/AT, and in gymnosperms, where AT/AT exceeded AC/GT. The predominance of AG/CT, followed by either AC/GT or AT/AT, and the scarcity of CG/CG agree with earlier comparative genomic analyses [52,70]. The abundance of these homopurine-homopyrimidine stretches (AG/CT) may be explained by their common occurrence in transcribed regions and their roles in modifying DNA structure, regulating gene expression, and CpG methylation [69]. Clade-level frequencies diverged markedly from the overall averages: AC/GT (45.68%) and CG/CG (15.09%) in algae, AG/CT (65.15%) in bryophytes, AG/CT (63.17%) and AT/AT (5.83%) in pteridophytes, AT/AT (36.82%) in gymnosperms, and CG/CG (5.81%) and AG/CT (65.66%) in monocots and dicots, respectively [Figure 11]. Some species showed extreme deviations: AC/GT was frequent in Volvox carteri (83.62%) and Chlamydomonas reinhardtii (72.97%); AG/CT in Marchantia polymorpha (89.62%), Fragaria vesca (85.85%), and Malus domestica (82.28%); AT/AT in Saccharomyces cerevisiae (87.03%) and Albugo candida (72.06%); and CG/CG in Klebsormidium flaccidum (59.79%), Porphyra yezoensis (30.61%), and Mesostigma viride (21.43%) [Additional file 8].

Ten distinct trinucleotide motif classes were identified in the ESTs of the selected species. Overall, AAG/CTT was the most abundant, followed by AGC/CTG, CCG/CGG, AGG/CCT, ATC/ATG, ACC/GGT, AAC/GTT, AAT/ATT, ACG/CGT, and ACT/AGT. AAG/CTT was widespread in both non-angiosperms and angiosperms. Among non-angiosperm clades, AGC/CTG was consistently common in fungi, bryophytes, pteridophytes, and gymnosperms, at 23.91%, 38.51%, 39.69%, and 31.63%, respectively. AAC/GTT (14.09%) was also common in fungi, and AAG/CTT (22.04%) and CCG/CGG (12.39%) in gymnosperms. In addition, certain motifs were characteristic of particular clades: CCG/CGG in algae, AAC/GTT in fungi, AGG/CCT in bryophytes, ACC/GGT in pteridophytes, and ATC/ATG in gymnosperms [Figure 11]. The prevalence of these trinucleotide motifs agrees with earlier studies [20,46,49,66].

Among dicots, AAG/CTT, ATC/ATG, and ACC/GGT were the most frequent trinucleotide motifs, whereas ACG/CGT, ACT/AGT, and CCG/CGG were the least frequent. AAG/CTT was most dominant in Cucumis melo (61.71%) and Citrullus lanatus (40.85%), and it was also frequent in Carica papaya, Arabidopsis thaliana, Nicotiana tabacum, Euphorbia esula, and Arachis hypogaea, in accordance with various earlier studies [57,80-84]. Other motifs were enriched in particular species: ATC/ATG in Daucus carota, Artemisia annua, and Gossypium hirsutum; ACC/GGT in Trifolium pratense, Helianthus annuus, and Lotus japonicus; AAC/GTT in Pisum sativum, Artemisia annua, and Capsicum annuum; and AAT/ATT in Cajanus cajan, Cicer arietinum, and Hevea brasiliensis [Additional file 9]. All of these frequent trinucleotide motifs have been reported in various dicot species [47,85-89].

Among monocots, CCG/CGG was the most prevalent trinucleotide motif, followed consistently by AGG/CCT, AGC/CTG, ACG/CGT, and AAG/CTT [Figure 11]. CCG/CGG was widespread among species of the family Poaceae, being highly repeated in Cenchrus ciliaris, Oryza sativa, Zea mays, and Sorghum propinquum, whereas Musa acuminata (Musaceae) did not show this pattern. The predominance of CCG/CGG agrees with previous observations in various plant species [26,27,54,58,90]. In the present study, a high frequency of CCG/CGG was a distinctive feature of algae and monocots, and it may be related to their higher GC-content [18,48,91]. AGC/CTG and AGG/CCT were also evenly distributed in the grass family, except in Oryza sativa and Zea mays. Some motifs exceeded their average frequency in particular species: AAG/CTT in Musa acuminata and Avena sativa, and ACG/CGT in Sorghum bicolor and Secale cereale. AGC/CTG and AGG/CCT were frequent in Avena barbata, Avena sativa, Hordeum vulgare, and Triticum aestivum [Additional file 9]. These frequently repeated trinucleotide motifs resemble those reported earlier in other monocot species [27,46,48,49,53,54,91-93].

In the present study, trinucleotide motifs were distributed asymmetrically between monocots and dicots, and their frequencies were almost inversely related. For example, CCG/CGG was more dominant in monocots than in dicots, whereas AAG/CTT was more highly repeated in dicots than in monocots. The shared motif AGC/CTG, however, showed comparatively low frequency in both groups. Some of the identified motifs, namely CCT/AGG, CCG/GGC, GGA/TCC, and GAA/TTC, can form unusual DNA secondary structures, including hairpins, bipartite triplexes, and simple loops. These motifs may therefore also affect gene expression and its regulation. Moreover, trinucleotide repeats in coding regions encode single amino acid tracts within peptides or proteins, which might play important roles in various metabolic activities [48-50,94].

Trinucleotide SSR motifs can also influence the proteome because, when located in exons, they can generate amino acid stretches in proteins. Therefore, the amino acids encoded by the different trinucleotide SSR motifs were predicted by first-frame translation. Overall, serine (Ser), arginine (Arg), leucine (Leu), alanine (Ala), and proline (Pro) were the most abundant. In non-angiosperms, Ala was the most frequent, followed by Ser, Gln, and Leu, whereas Arg, Ser, Ala, and Leu were most widely distributed in angiosperms [Figure 12]. Among non-angiosperm clades, Ala was common in algae (19.08%) and pteridophytes (15.34%), Leu in fungi (11.73%), and Ser in both bryophytes (14.83%) and gymnosperms (13.95%). Among angiosperms, Ala and Arg were enriched in monocots, whereas Ser and Leu were common in dicots [Figure 13]. These findings accord with earlier genomic studies in different species [20,46,49,66]. Long amino acid stretches increase protein size, which can alter protein activity. Certain single amino acid repeats can regulate transcriptional activity and contribute to protein-protein interactions. Proteins carrying such repeats are involved in various molecular functions, including ubiquitin-related, structural, and receptor activities. Single amino acid stretches may also serve as spacer elements that help delimit protein domains [95].
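First-frame translation of a trinucleotide repeat reads the motif itself as the codon, so a perfect repeat such as (AGC)n yields a homopolymeric Ser stretch. The sketch below uses the standard genetic code and labels stop codons with the amber/ochre/opal abbreviations used in this section. The function names, and the choice to translate each motif exactly as written rather than its reverse complement, are assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Stop codons named as in the text: amber, ochre, opal.
_STOP_NAMES = {"TAG": "Am*", "TAA": "Oc*", "TGA": "Op*"}


def _build_codon_table() -> dict[str, str]:
    """Map each of the 64 codons to a three-letter residue or stop label."""
    table = {}
    for bases, aa in zip(product(_BASES, repeat=3), _AMINO):
        codon = "".join(bases)
        table[codon] = _STOP_NAMES[codon] if aa == "*" else _THREE_LETTER[aa]
    return table


CODON_TABLE = _build_codon_table()


def translate_frame1(seq: str) -> list[str]:
    """Translate in reading frame 1; ambiguous codons become 'Xaa' and a trailing partial codon is dropped."""
    seq = seq.upper().replace("U", "T")
    return [CODON_TABLE.get(seq[i:i + 3], "Xaa") for i in range(0, len(seq) - 2, 3)]


def residue_frequencies(motifs: list[str]) -> dict[str, float]:
    """Percentage of each predicted residue across first-frame translations of SSR motifs."""
    counts = Counter(res for motif in motifs for res in translate_frame1(motif))
    total = sum(counts.values())
    return {res: 100.0 * n / total for res, n in counts.items()} if total else {}
```

Because the first frame of a repeat depends on where the tract starts and which strand is read, a single motif class such as CCG/CGG can encode Pro, Arg, Ala, or Gly, so a first-frame prediction is one of several possible readings.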
At the species level, Ala was frequent in Chlorella variabilis, Ectocarpus siliculosus, Chlorokybus atmophyticus, Neurospora crassa, Marchantia polymorpha, and Selaginella moellendorffii; Ser was frequent in Gnetum gnemon and Arachis hypogaea; and Arg was common in Oryza sativa. Other amino acids occurred in moderate amounts, whereas methionine (Met), tryptophan (Trp), and tyrosine (Tyr) repeats were very rare. The stop codons amber (Am*, TAG), ochre (Oc*, TAA), and opal (Op*, TGA) were also detected; among them, Op* was the most frequent. Dicots, monocots, and algal species in particular showed a higher frequency of Op* than of Am* and Oc*, and Op* was notably frequent in Nitella hyalina (7.44%), Brassica napus (3.08%), and Raphanus sativus (2.83%) [Additional file 10].
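One simple factor that may contribute to the rarity of Met and Trp repeats (an interpretation offered here, not a result of the study) is codon degeneracy: under the standard genetic code, Met and Trp each have a single codon, whereas Ser, Leu, and Arg each have six. Tyr, also rare here, has two codons, like several more common residues, so degeneracy is at most a partial explanation. The sketch below counts codons per residue and labels each stop codon as in the text; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
_STOP_NAMES = {"TAG": "Am*", "TAA": "Oc*", "TGA": "Op*"}


def codons_per_residue() -> Counter:
    """Number of codons assigned to each residue (and each stop type) in the standard code."""
    counts = Counter()
    for bases, aa in zip(product(_BASES, repeat=3), _AMINO):
        codon = "".join(bases)
        counts[_STOP_NAMES[codon] if aa == "*" else _THREE_LETTER[aa]] += 1
    return counts


if __name__ == "__main__":
    counts = codons_per_residue()
    for residue in ("Ser", "Leu", "Arg", "Ala", "Pro", "Tyr", "Met", "Trp"):
        print(residue, counts[residue])
```

Run as a script, it prints 6 codons each for Ser, Leu, and Arg, 4 each for Ala and Pro, 2 for Tyr, and 1 each for Met and Trp.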

Figure 12: Relative distribution of amino acids in non-angiosperms and angiosperms. Alanine (Ala), arginine (Arg), leucine (Leu), serine (Ser), and proline (Pro) were the most widespread.




Figure 13: Comparative distribution of predicted amino acids encoded by trinucleotide repeat motifs among different evolutionary clades.




Because nucleotides can be combined and permuted in many more ways within longer motifs, tetra-, penta-, and hexanucleotide SSRs showed immense motif diversity, and no consistent relationship was found between motif frequency and motif type within or across species. The distribution of these motifs was therefore complex. For tetranucleotide SSRs, a few specific motifs were comparatively enriched within species: AATC/ATTG, ACAT/ATGT, and AATT/AATT were the most repeated in Nitella hyalina, Volvox carteri, and Mesostigma viride, respectively. AGGC/CCTG was highly repeated in Neurospora crassa and also in Marchantia polymorpha, and AGCG/CGCT in Selaginella moellendorffii. Among monocots, ATCC/ATGG was highly repeated in Oryza sativa, while among dicots, AAAT/ATTT was widespread in Artemisia annua and Prunus persica, and AAAG/CTTT was more frequent in Arachis hypogaea, Cucumis melo, Ricinus communis, and Theobroma cacao. The prevalence of these tetramer motifs is consistent with earlier observations in various species [47,50,52,58,65].
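The combinatorial growth behind this diversity can be made concrete by counting the distinct motif classes available at each unit length. The sketch assumes that motifs equivalent under cyclic rotation or reverse complementation form one class (the grouping behind labels such as AGGC/CCTG) and that units which are themselves repeats of a shorter unit, such as ATAT, are excluded; the function names are hypothetical.

```python
from itertools import product

# Maps each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(unit: str) -> bool:
    """True unless the unit is a repeat of a shorter unit (e.g. ATAT = (AT)2)."""
    n = len(unit)
    return all(unit != unit[:k] * (n // k) for k in range(1, n) if n % k == 0)


def canonical(unit: str) -> str:
    """Smallest string among all rotations of the unit and of its reverse complement."""
    rc = unit.translate(_COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (unit, rc) for i in range(len(s)))


def count_motif_classes(length: int) -> int:
    """Number of distinct SSR motif classes with the given unit length."""
    classes = set()
    for bases in product("ACGT", repeat=length):
        unit = "".join(bases)
        if is_primitive(unit):
            classes.add(canonical(unit))
    return len(classes)


if __name__ == "__main__":
    for length in range(1, 7):
        # unit length, raw number of strings, number of motif classes
        print(length, 4 ** length, count_motif_classes(length))
```

Under these assumptions the class counts for unit lengths 1 to 6 are 2, 4, 10, 33, 102, and 350, so roughly ten times as many hexanucleotide classes as tetranucleotide classes are possible.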

Pentanucleotide SSRs showed a similarly complex trend, although a few motifs were more common than others within species: AAATT/AATTT, AGCCT/AGGCT, AAAAT/ATTTT, and AGAGG/CCTCT were more frequent in non-angiosperm species, especially Mesostigma viride, Neurospora crassa, and Physcomitrella patens. Among monocots, AGAGG/CCTCT, AAGAG/CTCTT, and AGGGG/CCCCT were common in Oryza sativa; AGAGG/CCTCT and AGGGG/CCCCT in Hordeum vulgare; and AGCTC/AGCTG and AGAGG/CCTCT in Zea mays. Among dicots, AAAAG/CTTTT was common in Manihot esculenta, Theobroma cacao, Cucumis melo, and Arachis hypogaea, and AAAAT/ATTTT in Artemisia annua, Prunus persica, and Hevea brasiliensis, in agreement with previous studies in different plant species [65,82]. Notably, hexanucleotide SSRs were more abundant than tetra- and pentanucleotide SSRs, consistent with earlier analyses in various plant species [57,96]. Hexanucleotide motif patterns were strikingly diverse: a nearly limitless array of motif types was observed, each with few repetitions. A few hexanucleotide motifs nevertheless showed comparatively high repetition in particular species: ATCGCC/ATGGCG was common in Nitella hyalina and Selaginella moellendorffii, and ACAGAT/ATCTGT in Neurospora crassa. The motifs AGGCGG/CCGCCT, AGCCTG/AGGCTC, and AACCCT/AGGGTT, observed in Oryza sativa, Gossypium hirsutum, and Artemisia annua, respectively, agree with previous reports in different species [48,66,97,98].
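Because each label in these motif lists is meant to pair a motif with its complementary-strand counterpart, a small consistency check can catch transcription errors in long motif tables. The sketch assumes the same convention as the labels above, namely that the second motif is a cyclic rotation of the reverse complement of the first; the function names are hypothetical.

```python
# Maps each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rotations(motif: str) -> set[str]:
    """All cyclic rotations of a motif."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def is_complementary_pair(label: str) -> bool:
    """Check that an 'X/Y' label pairs a motif with a rotation of its reverse complement."""
    first, sep, second = label.strip().upper().partition("/")
    if not sep or not first or len(first) != len(second):
        return False
    reverse_complement = first.translate(_COMPLEMENT)[::-1]
    return second in rotations(reverse_complement)
```

For example, it accepts AGCTC/AGCTG and ATCGCC/ATGGCG from the lists above and rejects a mismatched label such as AGG/CTT.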


4. CONCLUSION

The present study explored the plasticity of tandem repeated DNA elements, especially SSRs, in expressed sequence tags (ESTs). Mono- to hexanucleotide SSRs were annotated in large-scale EST collections of 75 species belonging to diverse evolutionary clades such as algae, fungi, bryophytes, pteridophytes, gymnosperms, dicots, and monocots. Approximately 4.35 million EST sequences were examined, revealing considerable diversity in SSR distribution among the selected species. Mononucleotide SSRs were the most abundant in the ESTs, followed by tri-, di-, hexa-, tetra-, and pentanucleotide SSRs. SSR frequencies and motif distributions varied widely within and across angiosperm and non-angiosperm species. Mono- to trinucleotide SSR motifs were prominently distributed in the ESTs and fell into clearly defined categories. By contrast, hexa- and pentanucleotide SSRs showed more complex motif distributions than tetranucleotide SSRs, whose motifs were slightly less diverse. Together, these distinctive attributes enhance our understanding of SSR variation, distribution, expansion, and divergence within and across angiosperm and non-angiosperm species and evolutionary clades.
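The annotation step summarized above, scanning ESTs for perfect mono- to hexanucleotide repeats, can be sketched with a simple left-to-right scanner. The minimum repeat numbers below are hypothetical placeholders of the kind often used with SSR-mining tools such as MISA; the study's actual search criteria are given in its Methods and may differ. Compound and imperfect repeats are ignored, and all names are illustrative.

```python
from collections import Counter

# Hypothetical minimum number of tandem copies per unit length (1 = mono ... 6 = hexa).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
SSR_TYPES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def _is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(unit)
    return all(unit != unit[:k] * (n // k) for k in range(1, n) if n % k == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect SSRs, scanning left to right.

    At each position the shortest qualifying unit wins; the scan then jumps
    past the repeat, so reported SSRs never overlap.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, 7):
            unit = seq[i:i + k]
            # Skip truncated units, units with ambiguous bases, and non-primitive units.
            if len(unit) < k or set(unit) - set("ACGT") or not _is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= MIN_REPEATS[k]:
                found = (i, unit, copies)
                break
        if found:
            hits.append(found)
            i += found[2] * len(found[1])
        else:
            i += 1
    return hits


def ssr_type_counts(ests: list[str]) -> Counter:
    """Count SSRs by type (mono ... hexa) across a collection of ESTs."""
    return Counter(SSR_TYPES[len(unit)] for est in ests for _, unit, _ in find_ssrs(est))
```

Summing `ssr_type_counts` over each species' ESTs would give a mono-to-hexanucleotide ranking of the kind reported above, although absolute numbers depend strongly on the chosen thresholds.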


5. CONFLICTS OF INTEREST

The authors declare that there are no conflicts of interest regarding the publication of this paper.


6. ACKNOWLEDGMENT

The authors are thankful to the Council of Scientific and Industrial Research (CSIR) for the fellowship (CSIR-RA) and grateful to the DBT-Bioinformatics Infrastructure Facility, UGC-UPE, and the Department of Botany, University of Rajasthan, for providing the necessary facilities.


7. AUTHOR CONTRIBUTIONS

All authors made substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data; took part in drafting the article or revising it critically for important intellectual content; agreed to submit to the current journal; gave final approval of the version to be published; and agree to be accountable for all aspects of the work. All the authors are eligible to be authors as per the International Committee of Medical Journal Editors (ICMJE) requirements/guidelines.


8. ETHICAL APPROVALS

This study does not involve experiments on animals or human subjects.


9. PUBLISHER’S NOTE

This journal remains neutral with regard to jurisdictional claims in published institutional affiliation.

REFERENCES

1. Kubis S, Schmidt T, Heslop-Harrison JS. Repetitive DNA elements as a major component of plant genomes. Ann Bot 1998;82:45-55.

2. Shapiro JA, Von Sternberg R. Why repetitive DNA is essential to genome function. Biol Rev 2005;80:227-50.

3. Biscotti MA, Olmo E, Heslop-Harrison JP. Repetitive DNA in Eukaryotic Genomes. Berlin, Germany: Springer; 2015.

4. Bouck A, Vision T. The molecular ecologist's guide to expressed sequence tags. Mol Ecol 2007;16:907-24.

5. Edwards NJ. Novel peptide identification from tandem mass spectra using ESTs and sequence database compression. Mol Syst Biol 2007;3:102.

6. Parkinson J, Blaxter M. Expressed sequence tags: An overview. In: Expressed Sequence Tags (ESTs). Berlin, Germany: Springer; 2009. p. 1-12.

7. Nagaraj SH, Gasser RB, Ranganathan S. A hitchhiker's guide to expressed sequence tag (EST) analysis. Brief Bioinform 2007;8:6-21.

8. Ewing RM, Kahla AB, Poirot O, Lopez F, Audic S, Claverie JM. Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res 1999;9:950-9.

9. Prabu G, Mandal A. Computational identification of miRNAs and their target genes from expressed sequence tags of tea (Camellia sinensis). Genom Proteom Bioinform 2010;8:113-21.

10. Zhang Y, Zhu X, Chen X, Song C, Zou Z, Wang Y, et al. Identification and characterization of cold-responsive microRNAs in tea plant (Camellia sinensis) and their targets using high-throughput sequencing and degradome analysis. BMC Plant Biol 2014;14:271.

11. Alba R, Payton P, Fei Z, McQuinn R, Debbie P, Martin GB, et al. Transcriptome and selected metabolite analyses reveal multiple points of ethylene control during tomato fruit development. The Plant Cell 2005;17:2954-65.

12. Cui G, Huang L, Tang X, Zhao J. Candidate genes involved in tanshinone biosynthesis in hairy roots of Salvia miltiorrhiza revealed by cDNA microarray. Mol Biol Rep 2011;38:2471-8.

13. Zhou GF, Liu YZ, Sheng O, Wei QJ, Yang CQ, Peng SA. Transcription profiles of boron-deficiency-responsive genes in citrus rootstock root by suppression subtractive hybridization and cDNA microarray. Front Plant Sci 2015;5:795.

14. Baxevanis AD, Ouellette BF. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Vol. 43. United States: John Wiley & Sons; 2004.

15. Hampton M, Xu WW, Kram BW, Chambers EM, Ehrnriter JS, Gralewski JH, et al. Identification of differential gene expression in Brassica rapa nectaries through expressed sequence tag analysis. PLoS One 2010;5:e8782.

16. Sui S, Luo J, Ma J, Zhu Q, Lei X, Li M. Generation and analysis of expressed sequence tags from Chimonanthus praecox (Wintersweet) flowers for discovering stress-responsive and floral development-related genes. Comp Funct Genom 2012;2012:134596.

17. Sasaki K, Mitsuda N, Nashima K, Kishimoto K, Katayose Y, Kanamori H, et al. Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology. BMC Genom 2017;18:683.

18. Morgante M, Olivieri A. PCR-amplified microsatellites as markers in plant genetics. The Plant J 1993;3:175-82.

19. Jurka J, Pethiyagoda C. Simple repetitive DNA sequences from primates: Compilation and analysis. J Mol Evol 1995;40:120-6.

20. Tóth G, Gáspári Z, Jurka J. Microsatellites in different eukaryotic genomes: Survey and analysis. Genome Res 2000;10:967-81.

21. Ellegren H. Microsatellites: Simple sequences with complex evolution. Nat Rev Genet 2004;5:435.

22. Agarwal M, Shrivastava N, Padh H. Advances in molecular marker techniques and their applications in plant sciences. Plant Cell Rep 2008;27:617-31.

23. Masouleh AK, Waters DL, Reinke RF, Henry RJ. A high-throughput assay for rapid and simultaneous analysis of perfect markers for important quality and agronomic traits in rice using multiplexed MALDI-TOF mass spectrometry. Plant Biotechnol J 2009;7:355-63.

24. Oliveira EJ, Pádua JG, Zucchi MI, Vencovsky R, Vieira ML. Origin, evolution and genome distribution of microsatellites. Genet Mol Biol 2006;29:294-307.

25. Heywood VH, Iriondo JM. Plant conservation: Old problems, new perspectives. Biol Conserv 2003;113:321-35.

26. Cordeiro GM, Casu R, McIntyre CL, Manners JM, Henry RJ. Microsatellite markers from sugarcane (Saccharum spp.) ESTs cross transferable to erianthus and sorghum. Plant Sci 2001;160:1115-23.

27. Kantety RV, La Rota M, Matthews DE, Sorrells ME. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Biol 2002;48:501-10.

28. Eujayl I, Sledge M, Wang L, May G, Chekhovskiy K, Zwonitzer J, et al. Medicago truncatula EST-SSRs reveal cross-species genetic markers for Medicago spp. Theor Appl Genet 2004;108:414-22.

29. Varshney RK, Chabane K, Hendre PS, Aggarwal RK, Graner A. Comparative assessment of EST-SSR, EST-SNP and AFLP markers for evaluation of genetic diversity and conservation of genetic resources using wild, cultivated and elite barleys. Plant Sci 2007;173:638-49.

30. Simko I. Development of EST-SSR markers for the study of population structure in lettuce (Lactuca sativa L.). J Hered 2009;100:256-62.

31. Fu N, Wang PY, Liu XD, Shen HL. Use of EST-SSR markers for evaluating genetic diversity and fingerprinting Celery (Apium graveolens L.) cultivars. Molecules 2014;19:1939-55.

32. Ukoskit K, Posudsavang G, Pongsiripat N, Chatwachirawong P, Klomsa-Ard P, Poomipant P, et al. Detection and validation of EST-SSR markers associated with sugar-related traits in sugarcane using linkage and association mapping. Genomics 2018;111:1-9.

33. Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res 1999;9:868-77.

34. Liu M, Shi J, Lu C. Identification of stress-responsive genes in Ammopiptanthus mongolicus using ESTs generated from cold-and drought-stressed seedlings. BMC Plant Biol 2013;13:88.

35. Silva CC, Mantello CC, Campos T, Souza LM, Gonçalves PS, Souza AP. Leaf-, panel-and latex-expressed sequenced tags from the rubber tree (Hevea brasiliensis) under cold-stressed and suboptimal growing conditions: The development of gene-targeted functional markers for stress response. Mol Breed 2014;34:1035-53.

36. Ronning CM, Stegalkina SS, Ascenzi RA, Bougri O, Hart AL, Utterbach TR, et al. Comparative analyses of potato expressed sequence tag libraries. Plant Physiol 2003;131:419-29.

37. Garg R, Patel RK, Tyagi AK, Jain M. De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res 2011;18:53-63.

38. Šmarda P, Bureš P, Horová L. The Evolution of Base Composition in Monocots. Brno: Muni Press; 2010.

39. Vinogradov AE. DNA helix: The importance of being GC-rich. Nucleic Acids Res 2003;31:1838-44.

40. Li XQ, Du D. Variation, evolution, and correlation analysis of C+G content and genome or chromosome size in different kingdoms and phyla. PLoS One 2014;9:e88339.

41. Mouchiroud D, D'Onofrio G, Aïssani B, Macaya G, Gautier C, Bernardi G. The distribution of genes in the human genome. Gene 1991;100:181-7.

42. Amit M, Donyo M, Hollander D, Goren A, Kim E, Gelfman S, et al. Differential GC content between exons and introns establishes distinct strategies of splice-site recognition. Cell Rep 2012;1:543-56.

43. Costantini M, Bernardi G. Replication timing, chromosomal bands, and isochores. Proc Natl Acad Sci 2008;105:3433-7.

44. Duret L, Arndt PF. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet 2008;4:e1000071.

45. Šmarda P, Bureš P. The variation of base composition in plant genomes. In: Plant Genome Diversity. Vol. 1. Berlin, Germany: Springer; 2012. p. 209-35.

46. Varshney RK, Thiel T, Stein N, Langridge P, Graner A. In silico analysis on frequency and distribution of microsatellites in ESTs of some cereal species. Cell Mol Biol Lett 2002;7:537-46.

47. Kumpatla SP, Mukhopadhyay S. Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome 2005;48:985-98.

48. Victoria FC, da Maia LC, de Oliveira AC. In silico comparative analysis of SSR markers in plants. BMC Plant Biol 2011;11:15.

49. Haq SU, Kumar P, Singh R, Verma KS, Bhatt R, Sharma M, et al. Assessment of functional EST-SSR markers (Sugarcane) in cross-species transferability, genetic diversity among poaceae plants, and bulk segregation analysis. Genet Res Int 2016;2016:16.

50. Singh RB, Singh B, Singh RK. Development of potential dbEST-derived microsatellite markers for genetic evaluation of sugarcane and related cereal grasses. Ind Crops Prod 2019;128:38-47.

51. Kejnovsky E, Leitch IJ, Leitch AR. Contrasting evolutionary dynamics between angiosperm and mammalian genomes. Trends Ecol Evol 2009;24:572-82.

52. Sonah H, Deshmukh RK, Sharma A, Singh VP, Gupta DK, Gacche RN, et al. Genome-wide distribution and organization of microsatellites in plants: An insight into marker development in Brachypodium. PLoS One 2011;6:e21298.

53. Cardle L, Ramsay L, Milbourne D, Macaulay M, Marshall D, Waugh R. Computational and experimental characterization of physically clustered simple sequence repeats in plants. Genetics 2000;156:847-54.

54. Yu JK, La Rota M, Kantety R, Sorrells M. EST derived SSR markers for comparative mapping in wheat and rice. Mol Genet Genom 2004;271:742-51.

55. Cai K, Zhu L, Zhang K, Li L, Zhao Z, Zeng W, et al. Development and characterization of EST-SSR markers from RNA-Seq data in Phyllostachys violascens. Front Plant Sci 2019;10:50.

56. Sharma H, Kumar P, Singh A, Aggarwal K, Roy J, Sharma V, et al. Development of polymorphic EST-SSR markers and their applicability in genetic diversity evaluation in Rhododendron arboreum. Mol Biol Rep 2020;47:2447-57.

57. Lawson MJ, Zhang L. Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biol 2006;7:R14.

58. Shi J, Huang S, Fu D, Yu J, Wang X, Hua W, et al. Evolutionary dynamics of microsatellite distribution in plants: Insight from the comparison of sequenced brassica, Arabidopsis and other angiosperm species. PLoS One 2013;8:e59988.

59. Haq S, Jain R, Sharma M, Kachhwaha S, Kothari S. Identification and characterization of microsatellites in expressed sequence tags and their cross transferability in different plants. Int J Genom 2014;2014:863948.

60. Metzgar D, Bytof J, Wills C. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res 2000;10:72-80.

61. Field D, Wills C. Long, polymorphic microsatellites in simple organisms. Proc R Soc Lond B 1996;263:209-15.

62. Wren JD, Forgacs E, Fondon JW 3rd, Pertsemlidis A, Cheng SY, Gallardo T, et al. Repeat polymorphisms within gene regions: Phenotypic and evolutionary implications. Am J Hum Genet 2000;67:345-56.

63. Morgante M, Hanafey M, Powell W. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 2002;30:194-200.

64. Liu G, Xie Y, Zhang D, Chen H. Analysis of SSR loci and development of SSR primers in Eucalyptus. J Forestry Res 2018;29:273-82.

65. Stackelberg M, Rensing SA, Reski R. Identification of genic moss SSR markers and a comparative analysis of twenty-four algal and plant gene indices reveal species-specific rather than group-specific characteristics of microsatellites. BMC Plant Biol 2006;6:9.

66. Maia LC, Souza VQ, Kopp MM, Carvalho FI, Oliveira AC. Tandem repeat distribution of gene transcripts in three plant families. Genet Mol Biol 2009;32:822-33.

67. Ranade SS, Lin YC, Zuccolo A, Van de Peer Y, García-Gil MR. Comparative in silico analysis of EST-SSRs in angiosperm and gymnosperm tree genera. BMC Plant Biol 2014;14:220.

68. Schlötterer C, Tautz D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res 1992;20:211-5.

69. Li YC, Korol AB, Fahima T, Nevo E. Microsatellites within genes: Structure, function, and evolution. Mol Biol Evol 2004;21:991-1007.

70. Hosseinzadeh-Colagar A, Haghighatnia MJ, Amiri Z, Mohadjerani M, Tafrihi M. Microsatellite (SSR) amplification by PCR usually led to polymorphic bands: Evidence which shows replication slippage occurs in extend or nascent DNA strands. Mol Biol Res Commun 2016;5:167.

71. Tang S, Okashah RA, Cordonnier-Pratt MM, Pratt LH, Johnson VE, Taylor CA, et al. EST and EST-SSR marker resources for Iris. BMC Plant Biol 2009;9:72.

72. Kashi Y, King DG. Simple sequence repeats as advantageous mutators in evolution. Trends Genet 2006;22:253-9.

73. Sathishkumar R, Lakshmi P, Annamalai A, Arunachalam V. Mining of simple sequence repeats in the genome of gentianaceae. Pharmacogn Res 2011;3:19.

74. Vieira ML, Santini L, Diniz AL, Munhoz CF. Microsatellite markers: What they mean and why they are so useful. Genet Mol Biol 2016;39:312-28.

75. Jia H, Yang H, Sun P, Li J, Zhang J, Guo Y, et al. De novo transcriptome assembly, development of EST-SSR markers and population genetic analyses for the desert biomass willow, Salix psammophila. Sci Rep 2016;6:39591.

76. Mun JH, Kim DJ, Choi HK, Gish J, Debellé F, Mudge J, et al. Distribution of microsatellites in the genome of Medicago truncatula: A resource of genetic markers that integrate genetic and physical maps. Genetics 2006;172:2541-55.

77. Pandey G, Misra G, Kumari K, Gupta S, Parida SK, Chattopadhyay D, et al. Genome-wide development and use of microsatellite markers for large-scale genotyping applications in foxtail millet (Setaria italica (L.)). DNA Res 2013;20:197-207.

78. Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S. Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): Frequency, length variation, transposon associations, and genetic marker potential. Genome Res 2001;11:1441-52.

79. Gu T, Tan S, Gou X, Araki H, Tian D. Avoidance of long mononucleotide repeats in codon pair usage. Genetics 2010;186:1077-84.

80. Kong Q, Xiang C, Yu Z, Zhang C, Liu F, Peng C, et al. Mining and charactering microsatellites in Cucumis melo expressed sequence tags from sequence database. Mol Ecol Notes 2007;7:281-3.

81. Verma M, Arya L. Development of EST-SSRs in watermelon (Citrullus lanatus var. lanatus) and their transferability to Cucumis spp. J Horticult Sci Biotechnol 2008;83:732.

82. Liang X, Chen X, Hong Y, Liu H, Zhou G, Li S, et al. Utility of EST-derived SSR in cultivated peanut (Arachis hypogaea L.) and Arachis wild species. BMC Plant Biol 2009;9:35.

83. Qiu L, Yang C, Tian B, Yang JB, Liu A. Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.). BMC Plant Biol 2010;10:278.

84. Tong Z, Yang Z, Chen X, Jiao F, Li X, Wu X, et al. Large-scale development of microsatellite markers in Nicotiana tabacum and construction of a genetic map of flue-cured tobacco. Plant Breeding 2012;131:674-80.

85. Tuskan G, DiFazio S, Teichmann T. Poplar genomics is getting popular: The impact of the poplar genome project on tree research. Plant Biol 2004;6:2-4.

86. Nagy I, Stágel A, Sasvári Z, Röder M, Ganal M. Development, characterization, and transferability to other Solanaceae of microsatellite markers in pepper (Capsicum annuum L.). Genome 2007;50:668-88.

87. Schwarzacher T, Zhang Y, Lin Z, Xia Q, Zhang M, Zhang X. Characteristics and analysis of simple sequence repeats in the cotton genome based on a linkage map constructed from a BC1 population between Gossypium hirsutum and G. barbadense. Genome 2008;51:534-46.

88. Cavagnaro PF, Chung SM, Manin S, Yildiz M, Ali A, Alessandro MS, et al. Microsatellite isolation and marker development in carrot-genomic distribution, linkage mapping, genetic diversity analysis and marker transferability across Apiaceae. BMC Genomics 2011;12:386.

89. Yang T, Jiang J, Burlyaeva M, Hu J, Coyne CJ, Kumar S, et al. Large-scale microsatellite development in grasspea (Lathyrus sativus L.), an orphan legume of the arid areas. BMC Plant Biol 2014;14:65.

90. Thiel T, Michalek W, Varshney R, Graner A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet 2003;106:411-22.

91. La Rota M, Kantety RV, Yu JK, Sorrells ME. Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics 2005;6:23.

92. Gupta P, Rustgi S, Sharma S, Singh R, Kumar N, Balyan H. Transferable EST-SSR markers for the study of polymorphism and genetic diversity in bread wheat. Mol Genet Genomics 2003;270:315-23.

93. Ebrahimi A, Mathur S, Lawson SS, LaBonte NR, Lorch A, Coggeshall MV, et al. Microsatellite borders and micro-sequence conservation in Juglans. Sci Rep 2019;9:1-10.

94. Li YC, Korol AB, Fahima T, Beiles A, Nevo E. Microsatellites: Genomic distribution, putative functions and mutational mechanisms: A review. Mol Ecol 2002;11:2453-65.

95. Kumar AS, Sowpati DT, Mishra RK. Single amino acid repeats in the proteome world: Structural, functional, and evolutionary insights. PLoS One 2016;11:e0166854.

96. Jiang D, Zhong GY, Qi-Bing H. Analysis of microsatellites in citrus unigenes. Acta Genet Sin 2006;33:345-53.

97. Wang Y, Chen M, Wang H, Wang JF, Bao D. Microsatellites in the genome of the edible mushroom, Volvariella volvacea. BioMed Res Int 2014;2014:281912.

98. Fu L, Ding Z, Kumpeangkeaw A, Tan D, Han B, Sun X, et al. De novo assembly, transcriptome characterization, and simple sequence repeat marker development in duckweed Lemna gibba. Physiol Mol Biol Plants 2020;26:133-42. https://doi.org/10.1007/s12298-019-00726-9

Reference

1. Kubis S, Schmidt T, Heslop-Harrison JS. Repetitive DNA elements as a major component of plant genomes. Ann Bot 1998;82:45-55. https://doi.org/10.1006/anbo.1998.0779

2. Shapiro JA, Von Sternberg R. Why repetitive DNA is essential to genome function. Biol Rev 2005;80:227-50. https://doi.org/10.1017/S1464793104006657

3. Biscotti MA, Olmo E, Heslop-Harrison JP. Repetitive DNA in Eukaryotic Genomes. Berlin, Germany: Springer; 2015. https://doi.org/10.1007/s10577-015-9499-z

4. Bouck A, Vision T. The molecular ecologist's guide to expressed sequence tags. Mol Ecol 2007;16:907-24. https://doi.org/10.1111/j.1365-294X.2006.03195.x

5. Edwards NJ. Novel peptide identification from tandem mass spectra using ESTs and sequence database compression. Mol Syst Biol 2007;3:102. https://doi.org/10.1038/msb4100142

6. Parkinson J, Blaxter M. Expressed sequence tags: An overview. In: Expressed Sequence Tags (ESTs). Berlin, Germany: Springer; 2009. p. 1-12. https://doi.org/10.1007/978-1-60327-136-3_1

7. Nagaraj SH, Gasser RB, Ranganathan S. A hitchhiker's guide to expressed sequence tag (EST) analysis. Brief Bioinform 2007;8:6-21. https://doi.org/10.1093/bib/bbl015

8. Ewing RM, Kahla AB, Poirot O, Lopez F, Audic S, Claverie JM. Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res 1999;9:950-9. https://doi.org/10.1101/gr.9.10.950

9. Prabu G, Mandal A. Computational identification of miRNAs and their target genes from expressed sequence tags of tea (Camellia sinensis). Genom Proteom Bioinform 2010;8:113-21. https://doi.org/10.1016/S1672-0229(10)60012-5

10. Zhang Y, Zhu X, Chen X, Song C, Zou Z, Wang Y, et al. Identification and characterization of cold-responsive microRNAs in tea plant (Camellia sinensis) and their targets using high-throughput sequencing and degradome analysis. BMC Plant Biol 2014;14:271. https://doi.org/10.1186/s12870-014-0271-x

11. Alba R, Payton P, Fei Z, McQuinn R, Debbie P, Martin GB, et al. Transcriptome and selected metabolite analyses reveal multiple points of ethylene control during tomato fruit development. The Plant Cell 2005;17:2954-65. https://doi.org/10.1105/tpc.105.036053

12. Cui G, Huang L, Tang X, Zhao J. Candidate genes involved in tanshinone biosynthesis in hairy roots of Salvia miltiorrhiza revealed by cDNA microarray. Mol Biol Rep 2011;38:2471-8. https://doi.org/10.1007/s11033-010-0383-9

13. Zhou GF, Liu YZ, Sheng O, Wei QJ, Yang CQ, Peng SA. Transcription profiles of boron-deficiency-responsive genes in citrus rootstock root by suppression subtractive hybridization and cDNA microarray. Front Plant Sci 2015;5:795. https://doi.org/10.3389/fpls.2014.00795

14. Baxevanis AD, Ouellette BF. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Vol. 43. United States: John Wiley & Sons; 2004.

15. Hampton M, Xu WW, Kram BW, Chambers EM, Ehrnriter JS, Gralewski JH, et al. Identification of differential gene expression in Brassica rapa nectaries through expressed sequence tag analysis. PLoS One 2010;5:e8782. https://doi.org/10.1371/journal.pone.0008782

16. Sui S, Luo J, Ma J, Zhu Q, Lei X, Li M. Generation and analysis of expressed sequence tags from Chimonanthus praecox (Wintersweet) flowers for discovering stress-responsive and floral developmentrelated genes. Comp Funct Genom 2012;2012:134596. https://doi.org/10.1155/2012/134596

17. Sasaki K, Mitsuda N, Nashima K, Kishimoto K, Katayose Y, Kanamori H, et al. Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology. BMC Genomics 2017;18:683. https://doi.org/10.1186/s12864-017-4061-3

18. Morgante M, Olivieri A. PCR-amplified microsatellites as markers in plant genetics. Plant J 1993;3:175-82. https://doi.org/10.1046/j.1365-313X.1993.t01-9-00999.x

19. Jurka J, Pethiyagoda C. Simple repetitive DNA sequences from primates: Compilation and analysis. J Mol Evol 1995;40:120-6.

20. Tóth G, Gáspári Z, Jurka J. Microsatellites in different eukaryotic genomes: Survey and analysis. Genome Res 2000;10:967-81. https://doi.org/10.1101/gr.10.7.967

21. Ellegren H. Microsatellites: Simple sequences with complex evolution. Nat Rev Genet 2004;5:435. https://doi.org/10.1038/nrg1348

22. Agarwal M, Shrivastava N, Padh H. Advances in molecular marker techniques and their applications in plant sciences. Plant Cell Rep 2008;27:617-31. https://doi.org/10.1007/s00299-008-0507-z

23. Masouleh AK, Waters DL, Reinke RF, Henry RJ. A high-throughput assay for rapid and simultaneous analysis of perfect markers for important quality and agronomic traits in rice using multiplexed MALDI-TOF mass spectrometry. Plant Biotechnol J 2009;7:355-63. https://doi.org/10.1111/j.1467-7652.2009.00411.x

24. Oliveira EJ, Pádua JG, Zucchi MI, Vencovsky R, Vieira ML. Origin, evolution and genome distribution of microsatellites. Genet Mol Biol 2006;29:294-307. https://doi.org/10.1590/S1415-47572006000200018

25. Heywood VH, Iriondo JM. Plant conservation: Old problems, new perspectives. Biol Conserv 2003;113:321-35. https://doi.org/10.1016/S0006-3207(03)00121-6

26. Cordeiro GM, Casu R, McIntyre CL, Manners JM, Henry RJ. Microsatellite markers from sugarcane (Saccharum spp.) ESTs cross transferable to erianthus and sorghum. Plant Sci 2001;160:1115-23. https://doi.org/10.1016/S0168-9452(01)00365-X

27. Kantety RV, La Rota M, Matthews DE, Sorrells ME. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Biol 2002;48:501-10. https://doi.org/10.1023/A:1014875206165

28. Eujayl I, Sledge M, Wang L, May G, Chekhovskiy K, Zwonitzer J, et al. Medicago truncatula EST-SSRs reveal cross-species genetic markers for Medicago spp. Theor Appl Genet 2004;108:414-22. https://doi.org/10.1007/s00122-003-1450-6

29. Varshney RK, Chabane K, Hendre PS, Aggarwal RK, Graner A. Comparative assessment of EST-SSR, EST-SNP and AFLP markers for evaluation of genetic diversity and conservation of genetic resources using wild, cultivated and elite barleys. Plant Sci 2007;173:638-49. https://doi.org/10.1016/j.plantsci.2007.08.010

30. Simko I. Development of EST-SSR markers for the study of population structure in lettuce (Lactuca sativa L.). J Hered 2009;100:256-62. https://doi.org/10.1093/jhered/esn072

31. Fu N, Wang PY, Liu XD, Shen HL. Use of EST-SSR markers for evaluating genetic diversity and fingerprinting Celery (Apium graveolens L.) cultivars. Molecules 2014;19:1939-55. https://doi.org/10.3390/molecules19021939

32. Ukoskit K, Posudsavang G, Pongsiripat N, Chatwachirawong P, Klomsa-Ard P, Poomipant P, et al. Detection and validation of EST-SSR markers associated with sugar-related traits in sugarcane using linkage and association mapping. Genomics 2018;111:1-9. https://doi.org/10.1016/j.ygeno.2018.03.019

33. Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res 1999;9:868-77. https://doi.org/10.1101/gr.9.9.868

34. Liu M, Shi J, Lu C. Identification of stress-responsive genes in Ammopiptanthus mongolicus using ESTs generated from cold-and drought-stressed seedlings. BMC Plant Biol 2013;13:88. https://doi.org/10.1186/1471-2229-13-88

35. Silva CC, Mantello CC, Campos T, Souza LM, Gonçalves PS, Souza AP. Leaf-, panel-and latex-expressed sequenced tags from the rubber tree (Hevea brasiliensis) under cold-stressed and suboptimal growing conditions: The development of gene-targeted functional markers for stress response. Mol Breed 2014;34:1035-53. https://doi.org/10.1007/s11032-014-0095-2

36. Ronning CM, Stegalkina SS, Ascenzi RA, Bougri O, Hart AL, Utterbach TR, et al. Comparative analyses of potato expressed sequence tag libraries. Plant Physiol 2003;131:419-29. https://doi.org/10.1104/pp.013581

37. Garg R, Patel RK, Tyagi AK, Jain M. De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res 2011;18:53-63. https://doi.org/10.1093/dnares/dsq028

38. Šmarda P, Bureš P, Horová L. The Evolution of Base Composition in Monocots. Brno: Muni Press; 2010.

39. Vinogradov AE. DNA helix: The importance of being GC-rich. Nucleic Acids Res 2003;31:1838-44. https://doi.org/10.1093/nar/gkg296

40. Li XQ, Du D. Variation, evolution, and correlation analysis of C+G content and genome or chromosome size in different kingdoms and phyla. PLoS One 2014;9:e88339. https://doi.org/10.1371/journal.pone.0088339

41. Mouchiroud D, D'Onofrio G, Aïssani B, Macaya G, Gautier C, Bernardi G. The distribution of genes in the human genome. Gene 1991;100:181-7. https://doi.org/10.1016/0378-1119(91)90364-H

42. Amit M, Donyo M, Hollander D, Goren A, Kim E, Gelfman S, et al. Differential GC content between exons and introns establishes distinct strategies of splice-site recognition. Cell Rep 2012;1:543-56. https://doi.org/10.1016/j.celrep.2012.03.013

43. Costantini M, Bernardi G. Replication timing, chromosomal bands, and isochores. Proc Natl Acad Sci 2008;105:3433-7.

44. Duret L, Arndt PF. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet 2008;4:e1000071. https://doi.org/10.1371/journal.pgen.1000071

45. Šmarda P, Bureš P. The variation of base composition in plant genomes. In: Plant Genome Diversity. Vol. 1. Berlin, Germany: Springer; 2012. p. 209-35. https://doi.org/10.1007/978-3-7091-1130-7_14

46. Varshney RK, Thiel T, Stein N, Langridge P, Graner A. In silico analysis on frequency and distribution of microsatellites in ESTs of some cereal species. Cell Mol Biol Lett 2002;7:537-46.

47. Kumpatla SP, Mukhopadhyay S. Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome 2005;48:985-98. https://doi.org/10.1139/g05-060

48. Victoria FC, da Maia LC, de Oliveira AC. In silico comparative analysis of SSR markers in plants. BMC Plant Biol 2011;11:15. https://doi.org/10.1186/1471-2229-11-15

49. Haq SU, Kumar P, Singh R, Verma KS, Bhatt R, Sharma M, et al. Assessment of functional EST-SSR markers (Sugarcane) in cross-species transferability, genetic diversity among poaceae plants, and bulk segregation analysis. Genet Res Int 2016;2016:16. https://doi.org/10.1155/2016/7052323

50. Singh RB, Singh B, Singh RK. Development of potential dbEST-derived microsatellite markers for genetic evaluation of sugarcane and related cereal grasses. Ind Crops Prod 2019;128:38-47. https://doi.org/10.1016/j.indcrop.2018.10.071

51. Kejnovsky E, Leitch IJ, Leitch AR. Contrasting evolutionary dynamics between angiosperm and mammalian genomes. Trends Ecol Evol 2009;24:572-82. https://doi.org/10.1016/j.tree.2009.04.010

52. Sonah H, Deshmukh RK, Sharma A, Singh VP, Gupta DK, Gacche RN, et al. Genome-wide distribution and organization of microsatellites in plants: An insight into marker development in Brachypodium. PLoS One 2011;6:e21298. https://doi.org/10.1371/journal.pone.0021298

53. Cardle L, Ramsay L, Milbourne D, Macaulay M, Marshall D, Waugh R. Computational and experimental characterization of physically clustered simple sequence repeats in plants. Genetics 2000;156:847-54.

54. Yu JK, La Rota M, Kantety R, Sorrells M. EST derived SSR markers for comparative mapping in wheat and rice. Mol Genet Genomics 2004;271:742-51. https://doi.org/10.1007/s00438-004-1027-3

55. Cai K, Zhu L, Zhang K, Li L, Zhao Z, Zeng W, et al. Development and characterization of EST-SSR markers from RNA-Seq data in Phyllostachys violascens. Front Plant Sci 2019;10:50. https://doi.org/10.3389/fpls.2019.00050

56. Sharma H, Kumar P, Singh A, Aggarwal K, Roy J, Sharma V, et al. Development of polymorphic EST-SSR markers and their applicability in genetic diversity evaluation in Rhododendron arboreum. Mol Biol Rep 2020;47:2447-57. https://doi.org/10.1007/s11033-020-05300-1

57. Lawson MJ, Zhang L. Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biol 2006;7:R14. https://doi.org/10.1186/gb-2006-7-2-r14

58. Shi J, Huang S, Fu D, Yu J, Wang X, Hua W, et al. Evolutionary dynamics of microsatellite distribution in plants: Insight from the comparison of sequenced Brassica, Arabidopsis and other angiosperm species. PLoS One 2013;8:e59988. https://doi.org/10.1371/journal.pone.0059988

59. Haq S, Jain R, Sharma M, Kachhwaha S, Kothari S. Identification and characterization of microsatellites in expressed sequence tags and their cross transferability in different plants. Int J Genom 2014;2014:863948. https://doi.org/10.1155/2014/863948

60. Metzgar D, Bytof J, Wills C. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res 2000;10:72-80.

61. Field D, Wills C. Long, polymorphic microsatellites in simple organisms. Proc R Soc Lond B 1996;263:209-15. https://doi.org/10.1098/rspb.1996.0033

62. Wren JD, Forgacs E, Fondon JW 3rd, Pertsemlidis A, Cheng SY, Gallardo T, et al. Repeat polymorphisms within gene regions: Phenotypic and evolutionary implications. Am J Hum Genet 2000;67:345-56. https://doi.org/10.1086/303013

63. Morgante M, Hanafey M, Powell W. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 2002;30:194-200. https://doi.org/10.1038/ng822

64. Liu G, Xie Y, Zhang D, Chen H. Analysis of SSR loci and development of SSR primers in Eucalyptus. J Forestry Res 2018;29:273-82. https://doi.org/10.1007/s11676-017-0434-3

65. Stackelberg M, Rensing SA, Reski R. Identification of genic moss SSR markers and a comparative analysis of twenty-four algal and plant gene indices reveal species-specific rather than group-specific characteristics of microsatellites. BMC Plant Biol 2006;6:9. https://doi.org/10.1186/1471-2229-6-9

66. Maia LC, Souza VQ, Kopp MM, Carvalho FI, Oliveira AC. Tandem repeat distribution of gene transcripts in three plant families. Genet Mol Biol 2009;32:822-33. https://doi.org/10.1590/S1415-47572009005000091

67. Ranade SS, Lin YC, Zuccolo A, Van de Peer Y, García-Gil MR. Comparative in silico analysis of EST-SSRs in angiosperm and gymnosperm tree genera. BMC Plant Biol 2014;14:220. https://doi.org/10.1186/s12870-014-0220-8

68. Schlötterer C, Tautz D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res 1992;20:211-5. https://doi.org/10.1093/nar/20.2.211

69. Li YC, Korol AB, Fahima T, Nevo E. Microsatellites within genes: Structure, function, and evolution. Mol Biol Evol 2004;21:991-1007. https://doi.org/10.1093/molbev/msh073

70. Hosseinzadeh-Colagar A, Haghighatnia MJ, Amiri Z, Mohadjerani M, Tafrihi M. Microsatellite (SSR) amplification by PCR usually led to polymorphic bands: Evidence which shows replication slippage occurs in extend or nascent DNA strands. Mol Biol Res Commun 2016;5:167.

71. Tang S, Okashah RA, Cordonnier-Pratt MM, Pratt LH, Johnson VE, Taylor CA, et al. EST and EST-SSR marker resources for Iris. BMC Plant Biol 2009;9:72. https://doi.org/10.1186/1471-2229-9-72

72. Kashi Y, King DG. Simple sequence repeats as advantageous mutators in evolution. Trends Genet 2006;22:253-9. https://doi.org/10.1016/j.tig.2006.03.005

73. Sathishkumar R, Lakshmi P, Annamalai A, Arunachalam V. Mining of simple sequence repeats in the genome of Gentianaceae. Pharmacogn Res 2011;3:19. https://doi.org/10.4103/0974-8490.79111

74. Vieira ML, Santini L, Diniz AL, Munhoz CF. Microsatellite markers: What they mean and why they are so useful. Genet Mol Biol 2016;39:312-28. https://doi.org/10.1590/1678-4685-GMB-2016-0027

75. Jia H, Yang H, Sun P, Li J, Zhang J, Guo Y, et al. De novo transcriptome assembly, development of EST-SSR markers and population genetic analyses for the desert biomass willow, Salix psammophila. Sci Rep 2016;6:39591. https://doi.org/10.1038/srep39591

76. Mun JH, Kim DJ, Choi HK, Gish J, Debellé F, Mudge J, et al. Distribution of microsatellites in the genome of Medicago truncatula: A resource of genetic markers that integrate genetic and physical maps. Genetics 2006;172:2541-55. https://doi.org/10.1534/genetics.105.054791

77. Pandey G, Misra G, Kumari K, Gupta S, Parida SK, Chattopadhyay D, et al. Genome-wide development and use of microsatellite markers for large-scale genotyping applications in foxtail millet (Setaria italica (L.)). DNA Res 2013;20:197-207. https://doi.org/10.1093/dnares/dst002

78. Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S. Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): Frequency, length variation, transposon associations, and genetic marker potential. Genome Res 2001;11:1441-52. https://doi.org/10.1101/gr.184001

79. Gu T, Tan S, Gou X, Araki H, Tian D. Avoidance of long mononucleotide repeats in codon pair usage. Genetics 2010;186:1077-84. https://doi.org/10.1534/genetics.110.121137

80. Kong Q, Xiang C, Yu Z, Zhang C, Liu F, Peng C, et al. Mining and charactering microsatellites in Cucumis melo expressed sequence tags from sequence database. Mol Ecol Notes 2007;7:281-3. https://doi.org/10.1111/j.1471-8286.2006.01580.x

81. Verpopular: The impact of the poplar genome project on tree research. Plant Biol 2004;6:2-4. https://doi.org/10.1055/s-2003-44715

86. Nagy I, Stágel A, Sasvári Z, Röder M, Ganal M. Development, characterization, and transferability to other Solanaceae of microsatellite markers in pepper (Capsicum annuum L.). Genome 2007;50:668-88. https://doi.org/10.1139/G07-047

87. Schwarzacher T, Zhang Y, Lin Z, Xia Q, Zhang M, Zhang X. Characteristics and analysis of simple sequence repeats in the cotton genome based on a linkage map constructed from a BC1 population between Gossypium hirsutum and G. barbadense. Genome 2008;51:534-46. https://doi.org/10.1139/G08-033

88. Cavagnaro PF, Chung SM, Manin S, Yildiz M, Ali A, Alessandro MS, et al. Microsatellite isolation and marker development in carrot: genomic distribution, linkage mapping, genetic diversity analysis and marker transferability across Apiaceae. BMC Genomics 2011;12:386. https://doi.org/10.1186/1471-2164-12-386

89. Yang T, Jiang J, Burlyaeva M, Hu J, Coyne CJ, Kumar S, et al. Large-scale microsatellite development in grasspea (Lathyrus sativus L.), an orphan legume of the arid areas. BMC Plant Biol 2014;14:65. https://doi.org/10.1186/1471-2229-14-65

90. Thiel T, Michalek W, Varshney R, Graner A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet 2003;106:411-22. https://doi.org/10.1007/s00122-002-1031-0

91. La Rota M, Kantety RV, Yu JK, Sorrells ME. Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics 2005;6:23. https://doi.org/10.1186/1471-2164-6-23

92. Gupta P, Rustgi S, Sharma S, Singh R, Kumar N, Balyan H. Transferable EST-SSR markers for the study of polymorphism and genetic diversity in bread wheat. Mol Genet Genomics 2003;270:315-23. https://doi.org/10.1007/s00438-003-0921-4

93. Ebrahimi A, Mathur S, Lawson SS, LaBonte NR, Lorch A, Coggeshall MV, et al. Microsatellite borders and micro-sequence conservation in Juglans. Sci Rep 2019;9:1-10. https://doi.org/10.1038/s41598-019-39793-z

94. Li YC, Korol AB, Fahima T, Beiles A, Nevo E. Microsatellites: Genomic distribution, putative functions and mutational mechanisms: A review. Mol Ecol 2002;11:2453-65. https://doi.org/10.1046/j.1365-294X.2002.01643.x

95. Kumar AS, Sowpati DT, Mishra RK. Single amino acid repeats in the proteome world: Structural, functional, and evolutionary insights. PLoS One 2016;11:e0166854. https://doi.org/10.1371/journal.pone.0166854

96. Jiang D, Zhong GY, Qi-Bing H. Analysis of microsatellites in citrus unigenes. Acta Genet Sin 2006;33:345-53. https://doi.org/10.1016/S0379-4172(06)60060-7

97. Wang Y, Chen M, Wang H, Wang JF, Bao D. Microsatellites in the genome of the edible mushroom, Volvariella volvacea. BioMed Res Int 2014;2014:281912. https://doi.org/10.1155/2014/281912

98. Fu L, Ding Z, Kumpeangkeaw A, Tan D, Han B, Sun X, et al. De novo assembly, transcriptome characterization, and simple sequence repeat marker development in duckweed Lemna gibba. Physiol Mol Biol Plants 2020;26:133-42. https://doi.org/10.1007/s12298-019-00726-9
