© 1999 American Society of Plant Physiologists
Abstract
The replicative retrotransposon life cycle offers the potential for explosive increases in copy number and consequent inflation of genome size. The BARE-1 retrotransposon family of barley is conserved, dispersed, and transcriptionally active. To assess the role of BARE-1 in genome evolution, we determined the copy number of its integrase, its reverse transcriptase, and its long terminal repeat (LTR) domains throughout the genus Hordeum. On average, BARE-1 contributes 13.7 × 10³ full-length copies, amounting to 2.9% of the genome. The number increases with genome size. Two LTRs are associated with each internal domain in intact retrotransposons, but surprisingly, BARE-1 LTRs were considerably more prevalent than would be expected from the numbers of intact elements. The excess in LTRs increases as both genome size and BARE-1 genomic fraction decrease. Intrachromosomal homologous recombination between LTRs could explain the excess, removing BARE-1 elements and leaving behind solo LTRs, thereby reducing the complement of functional retrotransposons in the genome and providing at least a partial “return ticket from genomic obesity.”
INTRODUCTION
Retrotransposons are genomic elements exhibiting a structure and life cycle very similar to that of the retroviruses (Adams et al., 1987; Doolittle et al., 1989; Grandbastien, 1992). Both of the two main classes of retrotransposons containing long terminal repeats (LTRs), the copia-like and the romani- or gypsy-like elements, appear to be ubiquitous in the vascular plants (Flavell et al., 1992; Voytas et al., 1992; Suoniemi et al., 1998b). Unlike DNA transposons, retrotransposons replicate by transcription of genomic copies followed by reverse transcription and ultimate integration of the cDNA copy back into the genome (Boeke and Chapman, 1991). The replication strategy of retrotransposons offers the potential for explosive increases in copy number, were each new copy to generate many transcripts that would be integrated in turn as cDNA. The process, if not directly lethal due to insertional mutagenesis, would lead to a concomitant genome size increase as the number of retrotransposons increased. In plants, such increases would be heritable if they occurred in cells giving rise to pollen and ovules.
Over a broad range of organisms, retrotransposon copy number appears to be correlated with genome size. The small genome of the yeast Saccharomyces cerevisiae, consisting of 13 × 10⁶ bp (http://www.mips.biochem.mpg.de/proj/yeast/tables/inventy.html), contains 51 full-length retrotransposons (Kim et al., 1998). The genome size of plants may be considered as the DNA content (1C) of the unreplicated haploid (1n) set of chromosomes (1x). Thus, a somatic nucleus of a diploid is 2n with 2x chromosomes containing 2C of DNA; for a hexaploid, the somatic nucleus also contains 2C of DNA but is 2n and 6x. Arabidopsis has the smallest plant genome known, consisting of 10⁸ bp (Goodman et al., 1995). Approximately 30 retrotransposon families have been identified in Arabidopsis to date, each generally with one to three members (D. Voytas, personal communication). The fairly compact genome of rice comprises 4.3 × 10⁸ bp (Kurata et al., 1997) and contains some 10³ retrotransposons, estimated with a highly conserved probe (Hirochika et al., 1992). The large (1C of 3.2 × 10¹⁰ bp) genome of onion (genome sizes, unless otherwise cited, are from http://www.rbgkew.org.uk/cval/database1.html) contains 1 to 2 × 10⁵ copia-like retroelements (Pearce et al., 1996b), whereas within the very large genomes of the genus Lilium (1C of ∼1.4 × 10¹¹), the gypsy-like del family of elements alone numbers 10⁶ in some species.
Within the cereals of the tribe Triticeae, rye (1C of 3.8 × 10¹⁰ bp) has ∼10⁵ copia-like elements comprising 3.5% of the genome, with similarly high numbers in wheat, oats, and barley (Pearce et al., 1997). In cultivated barley (1C of 4.8 × 10⁹), BARE-1 (for barley retroelement 1) forms a major and active family of copia-like retroelements, dispersed on all chromosomes (Suoniemi et al., 1996a). We have demonstrated that BARE-1 is transcribed in somatic tissues (Suoniemi et al., 1996b) from one of two promoters within well-conserved LTRs (Suoniemi et al., 1997) and furthermore contains functionally conserved reverse transcriptase (RT) priming sites (Suoniemi et al., 1997) and coding domains for GAG (putative capsid protein), aspartic proteinase, integrase (IN; Suoniemi et al., 1998a), and RT. Very few other plant retrotransposons have been demonstrated to be transcriptionally active (Hirochika, 1993; Lucas et al., 1995; Royo et al., 1996; Vernhettes et al., 1997; Takeda et al., 1998).
The genus Hordeum is widely distributed in both hemispheres, containing some 33 species and 46 taxa (von Bothmer et al., 1995) divided into meiotic-pairing groups designated the H, I, X, and Y genomes (Jacobsen and von Bothmer, 1992). The 1C genome sizes within the genus have been found to vary considerably, from 2.7 to 4.5 × 10⁹ bp for the diploid species and up to 8.9 × 10⁹ bp for the tetraploids (Kankaanpää et al., 1996). In view of the prevalence, conservation, and activity of BARE-1 in barley and given the size of the genus, examination of BARE-1 copy number in Hordeum spp may shed light on the role of the retrotransposon in genome evolution.
RESULTS
Strategy for Determining Copy Number
Earlier examinations of retrotransposon copy number (Joseph et al., 1990; Vershinin et al., 1990) made clear that apparent copy number depends on hybridization stringency, as expected if the genome contains elements differing in their divergence from the cloned type copy. All probes used here were subcloned from the type element BARE-1a (GenBank accession number Z17327) isolated from barley cultivar Bomi, schematically represented in Figure 1A. Hence, the genetic distance from barley might generate artifactually low copy number estimates in other Hordeum spp investigated.
To address this issue, we determined copy number with several distinct BARE-1a probes. One probe comprised part of the LTR upstream of the transcriptional start (Figure 1C). This segment includes transcriptional regulatory regions (Suoniemi et al., 1996b) and is well conserved (>90% identity) within barley and between barley and wheat (Schmidt and Graner, 1994; Suoniemi et al., 1997). The other two probes, for rt and in (Figure 1C), corresponded to regions encoding enzymes critical to the retrotransposon life cycle and would be expected to be well conserved. All pairwise comparisons between 10 aligned BARE-1 rt domains (Gribbon et al., 1999; B. Gribbon, J. Tanskanen, A.H. Schulman, and A. Flavell, unpublished data) display an 86.4 ± 0.7% (SE) similarity. The Wis-2 retrotransposon family of wheat displays 2.4 to 12.7 × 10⁻⁸ substitutions per nucleotide per year (Matsuoka and Tsunewaki, 1996), and cotton rt displays a rate of 0.16 to 1.5 × 10⁻⁸. We have estimated a comparable rate of 1.42 × 10⁻⁸ between the in domains of H. spontaneum and Wis-2-1a (Suoniemi et al., 1998a).
Data were collected from slot blot hybridizations. The washing conditions (final wash of 20 min in 0.2 × SSC at 65°C, where 1 × SSC is 0.15 M NaCl and 0.015 M sodium citrate) are 10°C below the calculated ∼75°C melting temperature of the probes (Meinkoth and Wahl, 1984), permitting a mismatch of ∼5 to 15% (Hyman et al., 1973). Less stringent washes (2 × SSC at 65°C) 25°C below the melting temperature did not increase the hybridization signal with the in probe in tests for barley, H. murinum, H. euclaston, and H. pusillum, and so the more stringent conditions were used for copy number determinations.
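The relationship between wash conditions and melting temperature described above can be checked with a rough calculation. The sketch below assumes the widely quoted empirical formula for duplex melting temperature in filter hybridization without formamide (81.5 + 16.6 log10[Na+] + 0.41 × %GC − 500/length). The GC content (45%) and probe length (500 bp) are hypothetical values chosen only for illustration; they are not properties of the probes used here.

```python
import math

# Na+ molarity of 1 x SSC: 0.15 M NaCl plus 0.015 M trisodium citrate (3 Na+ each).
NA_PER_1X_SSC = 0.15 + 3 * 0.015

def melting_temp_c(ssc_fold, gc_percent, length_bp):
    """Approximate duplex Tm (deg C) from the empirical formula, no formamide."""
    na_molar = ssc_fold * NA_PER_1X_SSC
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 500.0 / length_bp

# Hypothetical probe: 45% GC, 500 bp (assumed values, not taken from the paper).
wash_c = 65.0
tm_stringent = melting_temp_c(0.2, 45.0, 500)
tm_relaxed = melting_temp_c(2.0, 45.0, 500)
print(f"0.2 x SSC: Tm ~{tm_stringent:.1f} C, wash is {tm_stringent - wash_c:.1f} C below Tm")
print(f"2 x SSC:   Tm ~{tm_relaxed:.1f} C, wash is {tm_relaxed - wash_c:.1f} C below Tm")
```

Under these assumptions the stringent wash falls roughly 10°C below the melting temperature, in line with the figure given in the text; the exact values depend on the real probe composition.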
For interspecies comparisons, variations in the amounts of DNA on the filters bound and accessible to the probes were controlled for by hybridization of labeled barley total DNA. The control stringency was 18°C below the 83°C melting temperature calculated for barley DNA. Taking the synonymous substitution rate as 2 to 7 × 10⁻⁹ within the genus Hordeum and other grasses (Zhou et al., 1995; Gaut et al., 1996), the wheat genus Triticum as the nearest outgroup for Hordeum, and a divergence time of 9 × 10⁶ years between the two (Ogihara et al., 1991; Ikeda et al., 1992) yields an estimated maximum of 6.3% substitutions per position between species in these genera. By comparison, the internal transcribed spacer of barley rDNA has diverged 10.5% from that of H-genome species H. californicum (Hsiao et al., 1995). Hence, the washing conditions should not have distorted the results for Hordeum spp genetically distant from barley. A set of controls was made to correct for differences in loading and hybridization accessibility of the blotted samples. In these controls, λ phage DNA was added to the extracted genomic DNAs at a copy number similar to that of the BARE-1 elements before individual sample preparation. These experiments yielded results similar to those obtained by using total barley DNA.
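The 6.3% upper bound quoted above follows from multiplying the substitution rate by the divergence time. The short sketch below reproduces that arithmetic using only the values given in the text; the function name is illustrative, and the calculation follows the text in applying the rate once over the divergence time and ignoring multiple substitutions at the same site.

```python
# Upper bound on substitutions per site: (rate per site per year) x (years).
# Rates and divergence time are the values quoted in the text.
rate_low, rate_high = 2e-9, 7e-9      # synonymous substitution rate range
divergence_years = 9e6                # Hordeum-Triticum divergence time

def expected_divergence(rate, years):
    """Fraction of positions substituted, ignoring multiple hits at one site."""
    return rate * years

low = expected_divergence(rate_low, divergence_years)
high = expected_divergence(rate_high, divergence_years)
print(f"{low:.1%} to {high:.1%} substitutions per position")
```

The upper end of this range is well within the ∼5 to 15% mismatch that the washing conditions are stated to tolerate.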
BARE-1 Elements Are Conserved in Length in Hordeum
Calculation of the fraction of the genome occupied by a retrotransposon family requires knowledge of copy number, genome size, and retrotransposon size. BARE-1a is 12.09 kb but contains a 3.14-kb insert in the 3′ LTR. Thus, without the insert, the BARE-1a element is 8932 bp in length, and earlier studies have indicated that BARE-1 elements in barley are of this length (Suoniemi et al., 1996a), compared with 5334 bp for the Tnt-1 element of tobacco (GenBank accession X13777). To examine the size of the BARE-1 family in the genus Hordeum, we used primers to amplify the elements’ LTRs and internal domains (Figure 1B) from genomic DNA. Figure 2A shows that throughout the genus, the BARE-1 internal domain is conserved in length, the polymerase chain reactions (PCRs) yielding fragments of an estimated 5890 bp compared with the expected 5750 bp. The amplified LTR bands (Figure 2B) estimated at 1928 bp also match the predicted size of the BARE-1a LTRs (1809 bp), although one reaction (sample 26, H. depressum) also produced slightly shorter products. The conservation is significant in view of the unusually long untranslated leader (1.7 kb) and LTRs in BARE-1a.
Figure 1. Organization of BARE-1 and the Probes and Products Analyzed.
(A) Canonical BARE-1 element. The LTRs are shown as hatched boxes and the segment of untranslated leader (UTL) outside the LTR as a gray bar. The protein-coding domains (arrows) include the putative capsid protein (GAG), aspartic proteinase (AP), integrase (IN), and reverse transcriptase–RNase H (RT-RH).
(B) The positions of the PCR products used in sequence and length divergence analyses. The inward-facing arrowheads diagrammatically represent the primers and are not drawn to scale.
(C) Probes used in filter and in situ hybridizations. The probes are positioned under the restriction sites shown on the map in (A); these sites were used to subclone the corresponding segments. The gag probe is indicated as in (B) because it was a PCR product.
(D) Restriction digests used in analysis of bacterial artificial chromosome (BAC) clones. The fragments predicted from BARE-1a are shown as shaded boxes; arrows indicate extension of a fragment into the flanking genomic region. Dotted lines indicate hybridization coverage of the probes in (C).
The in and rt Regions as Estimators of BARE-1 Copy Number
Estimates of BARE-1 copy number per haploid genome equivalent obtained by using the LTR, in, and rt probes are presented in Table 1. The number of in copies per genome ranged from 4.3 to 19 × 10³, and rt copies ranged from 4.5 to 30 × 10³. Overall, the numbers of in and rt copies were very tightly correlated, the Pearson product moment correlation analysis yielding a coefficient (rP) of 0.885 with a probability of error (P) of 2 × 10⁻¹⁰. This would be expected because in and rt are adjacent components of the retrotransposons’ genomes. Regarding the broad groups of species analyzed, the diploid I genome accessions, H. spontaneum and barley, showed 12 to 24 × 10³ rt copies and 13 to 19 × 10³ in copies per genome equivalent. By contrast, the basic genome (1x; 0.5n; 0.5C) of the autotetraploid H. bulbosum (II genome) contained only 3.1 × 10³ (rt) to 5 × 10³ (in) copies of BARE-1. The H. murinum (YY genome) tetraploid accessions contained 12 to 18 × 10³ BARE-1 copies, or 6 to 9 × 10³ for the basic genome. The sole X genome representative, H. marinum subsp gussoneanum, had 8 to 10 × 10³ copies of BARE-1. The greatest range of in and rt copy numbers was found in the American H genome accessions. In this set, 4.5 to 21 × 10³ rt and 4.3 to 17 × 10³ in copies per genome were detected. The two HH tetraploids, H. depressum and H. jubatum, contained 14 to 15 × 10³ rt and 7 to 9 × 10³ in copies in their basic genomes. For the genus as a whole, the rt probe detected 1.5 ± 0.1 × 10⁴ copies and the in probe detected 1.3 ± 0.09 × 10⁴ copies per genome.
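The conversion between copies per haploid genome and copies per basic genome used above for the tetraploids can be written out explicitly. This is a minimal sketch: the function name is illustrative, and it assumes only that a tetraploid carries two basic (1x) genomes per haploid (1C) set, as the text states.

```python
def per_basic_genome(copies_per_haploid_genome, ploidy):
    """Convert copies per haploid genome (1C) to copies per basic genome (1x).

    A diploid (2x) has one basic genome per haploid set; a tetraploid (4x) has two.
    """
    basic_genomes_per_haploid_set = ploidy // 2
    return copies_per_haploid_genome / basic_genomes_per_haploid_set

# H. murinum tetraploids: 12 to 18 x 10^3 copies per haploid genome (from the text).
murinum_basic = [per_basic_genome(c, ploidy=4) for c in (12e3, 18e3)]
print(murinum_basic)
```

For the H. murinum range this gives 6 to 9 × 10³ copies per basic genome, matching the figures quoted in the text; for a diploid the value is unchanged.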
Figure 2. BARE-1 Is Conserved in Length throughout the Genus Hordeum.
(A) PCR amplification of BARE-1 internal domains.
(B) PCR amplification of LTRs.
Samples are numbered as in Table 1.
We tested the reliability of the in and rt probes as estimators of BARE-1 copy number by correlation analysis against genome size and genome type. The I genome species are most closely related to barley, and the species with Y, H, and X genomes are thought to follow in that order at increasing distance (Svitashev et al., 1994; Marillia and Scoles, 1996). A rapidly evolving BARE-1 region might be expected to show an apparent copy number that decreases with increasing genetic distance from barley and therefore might be a bad estimator of true copy number. Applicability of this criterion would be limited if not only BARE-1 copy number but also genome size were correlated with genetic distance from barley. Among the diploids, genome size was indeed negatively correlated with distance from barley (rP = -0.593, P = 0.003), and the I and H genome accessions as a whole differed significantly in their genome sizes (Student’s t test, P = 0.009). However, the correlation between genome type and in copy number was on the same order as for genome size (rP = -0.551, P = 0.002) and not significant for rt. Furthermore, several H genome accessions showed as many copies of in and rt as seen in barley. We interpret these data to show that the in and rt probes cannot be said to give artifactually low copy numbers for genomes of Hordeum spp that are distant from barley.
LTRs Are in Excess in Genomes of the Genus Hordeum
Each retrotransposon contains two LTRs, yielding an expected copy number ratio of 2:1 with respect to other regions such as in or rt. However, the measured number of LTRs (Table 1) was considerably higher than that of any other probe, ranging over 8.7 to 40.7 × 10⁴ (C)⁻¹. This indicates that genomes of Hordeum spp respectively contain 5.1 to 50 (average 14.3 ± 1.8) and 6.8 to 42 (average 16.5 ± 1.8) times more LTR than rt or in copies. These data are consistent with earlier observations. For barley cultivar Bomi, 9 ± 0.6 × 10⁴ copies is comparable to 5.9 × 10⁴ previously estimated for the same cultivar (assuming a C of 4.53 pg) with a higher stringency wash (0.1 × SSC, 65°C) (Manninen and Schulman, 1993) and with the 5.6 to 12.6 × 10⁴ copies found, depending on the wash stringency, for a different cultivar (Vershinin et al., 1990).
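The size of the excess can be made concrete by subtracting the two LTRs expected per intact element from the measured LTR count. The sketch below does this for barley cultivar Bomi. The LTR count is the value given above; the in copy number is not quoted directly in the text, so it is back-calculated here from the LTR/in ratio of 6.8 reported for this cultivar elsewhere in the paper and should be read as approximate. The function name is illustrative.

```python
def ltr_excess(ltr_copies, internal_copies):
    """Return (LTR:internal ratio, LTRs not accounted for by intact elements)."""
    ratio = ltr_copies / internal_copies
    unaccounted = ltr_copies - 2 * internal_copies   # intact elements carry 2 LTRs
    return ratio, unaccounted

# Barley cultivar Bomi: ~9 x 10^4 LTRs (from the text). The in copy number is
# derived from the reported LTR/in ratio of 6.8, so it is an approximation.
bomi_ltr = 9e4
bomi_in = bomi_ltr / 6.8
ratio, unaccounted = ltr_excess(bomi_ltr, bomi_in)
print(f"LTR/in = {ratio:.1f}, LTRs beyond the 2:1 expectation ~ {unaccounted:.2e}")
```

Under these inputs, somewhat more than 6 × 10⁴ LTRs remain after allowing two per intact element.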
Aside from the support of these earlier results, additional confirmation of the LTR excess was sought by several means. A set of dot blot hybridizations, in which the loading was controlled by using λ phage DNA, was performed, with nine individuals each from six stands of H. spontaneum growing in a single canyon at most 300 m from each other (Lower Nahal Oren, Mt. Carmel, Israel). Summing over the entire data set, an average of 15.2 ± 0.4 × 10³ (SE) in copies were detected per genome equivalent, in the middle of the range seen in the broader slot blot data set. For the LTRs, the copy number was 81 ± 2 × 10³ (range of 37 to 118 × 10³), and the LTR/in ratio was 3.8 to 7.6, in the range seen for several other barleys and H. spontaneum in the larger data set (Table 1). The LTR copy number in general shows much greater variation in H. spontaneum than do in or rt numbers, which may have relevance for the mechanisms affecting their relative abundance (see Discussion).
Sequence Divergence Does Not Account for Apparent LTR Abundance
A trivial explanation for the differences in LTR abundance seen among Hordeum spp would be that LTRs are differentially more conserved than in or rt in species with high apparent LTR excess. To examine this, we first amplified a set of LTR segments by PCR under low primer annealing stringency from five species with varying degrees of LTR excess: barley cultivar Bomi (LTR/in = 6.8), H. pusillum (19.9), H. euclaston (22.4), H. roshevitzii (30.0), and H. marinum subsp gussoneanum (41.9). From each species, genomic DNA and the pooled reaction product (a single band) from the LTR amplification were each digested with DpnII, blotted, and hybridized with the LTR probe used against the slot blots discussed above (data not shown). The similarity in digestion pattern and response between the genomic DNA and PCR products, digested with seven tetranucleotide-recognizing restriction enzymes, demonstrated that the PCR amplifications were representative of the LTRs present in the genome as a whole. The PCR products were cloned, and sequences of 19 clones from the five species were determined (GenBank accession numbers Y18767 to Y18785), aligned, and compared.
Table 1. BARE-1 Copy Number and Genome Share in Hordeum spp
The average sequence identity among and between clones from these five species for the LTR region and an in region (GenBank accession numbers Z80000 to Z80079) previously used in an extensive molecular evolution study (Suoniemi et al., 1998a) is shown in Table 2. The overall level of identity among LTR and in sequences is not significantly different (t test), whereas the specific level of identity for the LTR and in for any two-group comparison is highly correlated (Pearson; P = 0.01). For both in and LTR sequences, barley sequences are ∼10% more similar to each other than to those from other Hordeum spp. There was no systematic difference between the LTRs and in for any of the nonbarley species investigated. The data thus indicate that sequence divergence and its effect on hybridization cannot generate the high apparent LTR excesses. The intensity of the PCR amplification of BARE-1 LTRs (Figure 2B) mirrors the relative excess of LTRs seen in the slot blot hybridizations, with strong amplification from H. roshevitzii (sample 24) and H. marinum subsp gussoneanum (sample 28) and weak amplification from H. muticum (sample 21) and H. patagonicum subsp santacrucense (sample 23).
Table 2. Average Sequence Identity between Regions of BARE-1 LTRs and in Domains
Barley Bacterial Artificial Chromosome Clones Contain Solo LTRs
To more directly address the association of BARE-1 LTRs and internal domains, we analyzed a set of clones from a barley genomic library prepared in a bacterial artificial chromosome (BAC) vector and containing large inserts (kindly screened and given by A. Druka and A. Kleinhofs, Washington State University, Pullman). The library was constructed from barley cultivar Morex. Twenty clones were analyzed by diagnostic digestion with eight restriction enzymes or combinations thereof (Figure 1D) followed by DNA gel blotting and then hybridization with LTR, gag, and in probes (Figure 1C). The enzymes were chosen to generate from BARE-1 either internal fragments, from one or both LTRs or from the coding domain, or fragments extending into the flanking DNA. This allowed both confirmation of the presence and an estimate of the numbers of LTRs and coding domains. Based on addition of band lengths from the digests, an estimated total of 1.35 Mb of genomic DNA was thereby assayed, as tabulated by clone in Table 3.
One part of these analyses, a BamHI digest probed for the LTR and in, is shown in Figure 3. In this digest (Figure 3A), a single site in the rt region is predicted for BARE-1a, generating two fragments extending into the flanking DNA. Each fragment contains an LTR but only one in domain (Figure 1D). A cryptic site for BamHI was revealed by the generation of internal in fragments in four of the clones (Figure 3C). Inspection of the BARE-1a sequence indicates a site at nucleotide 764 differing from the BamHI recognition motif by 1 bp and an additional nine sites between nucleotides 750 and 850 differing by 2 bp. All of the clones except one contained at least one LTR (Figure 1B). The digests and hybridizations are summarized in Table 3; 10 clones had solo LTRs with no internal domains, six had an excess of LTRs over internal domains, three had fewer than 2:1 LTRs per internal domain (as might occur in a BARE-1 cut short by the edge of the insert), and one clone had the 2:1 ratio. An overall ratio of 4.5 LTR/in and 4.5 LTR/gag was found by pooling the data for the 20 clones. In these analyses, nested insertions of LTRs within LTRs would not have been detected, leading to a possible underestimation of the excess of LTRs.
In Situ Hybridizations Reflect BARE-1 Copy Number Variations
Hybridizations with in and rt probes were made to metaphase chromosomes of barley, H. euclaston, and H. pusillum to investigate the distribution and relative size of the BARE-1 family in these species. Although it is difficult to make in situ hybridization conditions identical among chromosome mounts so that results can be quantitatively compared, the images in Figure 4 are representative of the hundred or so cells examined for each hybridization and of the two to five independent hybridizations made for each species and probe combination. The chromosomes of barley (Figures 4A to 4C) appear larger than those of either H. euclaston (Figures 4D to 4F) or H. pusillum (Figures 4G to 4I), reflecting the >1.5-fold greater C value of barley. Both rt (Figure 4B) and in (Figure 4C) hybridizations against barley, compared with the 4′,6-diamidino-2-phenylindole (DAPI)–stained controls (Figures 4A, 4D, and 4G), show the uniform and dispersed hybridization pattern observed earlier for in (Suoniemi et al., 1996a), whereby the hybridization sites are observed along the whole chromosomes except at the centromeres, telomeres, and nucleolar organizers. The rt and in signals, however, are less uniformly distributed in barley than in the other species (cf. Figures 4B and 4E; Figures 4C and 4F and 4I). This may reflect the higher copy number in barley, the fact that distal insertion sites are apparently preferred, and the tendency of additional retrotransposon copies to insert into existing ones (SanMiguel et al., 1996; Noma et al., 1997; Suoniemi et al., 1997). For both probes, the signal intensity of the in situ hybridizations mirrored the copy number estimates from the slot blot hybridizations, being considerably stronger in barley. As with the slot blots, this is not likely to be due to divergence of target sequences outside of barley because the stringency of hybridization and washing was ∼77%. Furthermore, the in signal (Figure 4F versus 4I) is somewhat stronger, and the rt signal (Figure 4E versus 4H) is considerably stronger, in H. euclaston than in H. pusillum, in parallel with the copy number estimates for these species.
Table 3. Copy Number of BARE-1 Components in Genomic BAC Clones of H. vulgare cv Morex
Figure 3. Barley Genomic BAC Clones Contain Solo LTRs.
(A) BamHI restriction digests of 20 BAC clones.
(B) BamHI digest hybridized with an LTR probe.
(C) BamHI digest hybridized with an in probe.
Length markers indicated at left in (A) to (C) are in kilobases.
The Contribution of BARE-1 to Genome Size in the Genus Hordeum
Plots of BARE-1 copy number against basic genome size among the Hordeum spp accessions, shown in Figure 5, have a positive and significant correlation for both the rt (P = 0.01; Figure 5A) and in (P = 0.03; Figure 5B) probes, linking BARE-1 copy number to genome size. The barley accessions cluster at the high end of both the genome size and copy number distributions. Of the outliers below the distribution, three (samples 16, 17, and 18) are the diploids with the smallest genomes, respectively, H. euclaston, H. pusillum, and H. brachyanterum, and the other (sample 26) is the tetraploid H. depressum. These diploids showed fewer complete BARE-1 elements for their genome size, whereas the tetraploid had a relatively larger basic genome for the BARE-1 copy number. When the tetraploids are left out of the analysis, the regressions become both more predictive and more significant.
Figure 4. Strength of in Situ Hybridization of BARE-1a Probes to Hordeum spp Chromosomes Is Correlated with BARE-1 Copy Number.
(A) to (C) show barley (H. vulgare) chromosomes. (D) to (F) show H. euclaston chromosomes. (G) to (I) show H. pusillum chromosomes.
(A), (D), and (G) Blue, DAPI fluorescence from total DNA.
(B), (E), and (H) Hybridization to an rt probe.
(C), (F), and (I) Hybridization to an in probe.
Probe hybridization sites were detected by rhodamine (red) and fluorescein (green) fluorescence. Bar in (I) = 10 μm for (A) to (I).
Based on these copy number estimates, the genome sizes previously determined for the same Hordeum spp accessions (Kankaanpää et al., 1996), and the observation that the BARE-1 elements in each species have a length similar to BARE-1a (8932 bp), the part of the genome comprising BARE-1 was calculated (Table 1). The complement of the genome made up of full-length BARE-1 was quite variable across the genus, constituting from 0.8 to 5.7% of the genome in the species examined by using the rt probe and 1.1 to 4.8% in the species examined by using the in probe. The BARE-1 family contributed significantly less (P = 0.002) to the genomes of the H genome diploids than to barley, as determined by using the in probe, respectively comprising on average 2.1 and 3.5%. The same distinction was seen with the rt probe (2.9 vs. 3.7%), although it was not statistically significant. In the tetraploid H. bulbosum and H. murinum subspp, BARE-1 formed significantly less of the genome (0.8 to 1.9%) than in the diploids as a whole, although this was not the case for the H genome tetraploids H. depressum and H. jubatum.
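The genome share calculation described above is copy number times element length divided by genome size. The sketch below applies it with round figures quoted elsewhere in the text for cultivated barley (about 16.6 × 10³ copies and a 1C of 4.8 × 10⁹ bp). The function name is illustrative, and because Table 1 uses accession-specific copy numbers and genome sizes, the result here is only indicative and will not match individual table entries.

```python
BARE1_LENGTH_BP = 8932  # BARE-1a without the insert in the 3' LTR (from the text)

def genome_fraction(copy_number, genome_size_bp, element_bp=BARE1_LENGTH_BP):
    """Fraction of the genome occupied by full-length elements."""
    return copy_number * element_bp / genome_size_bp

# Round figures from the text for cultivated barley (illustrative inputs).
frac = genome_fraction(16.6e3, 4.8e9)
print(f"BARE-1 share of the genome: {frac:.1%}")
```

With these inputs the share is about 3%, within the 0.8 to 5.7% range reported for the genus.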
Figure 5. Genome Size Is Correlated with BARE-1 Copy Number.
(A) Copy number determined with rt probe versus genome size.
(B) Copy number determined with in probe versus genome size. The graphed numbers correspond to the sample numbers given in Table 1.
To assess the contribution of BARE-1 to genome size growth in the genus Hordeum it would be necessary to make comparisons to the primitive state of the genome of the last common ancestor of the species in the genus. This ancestor is unknown, and the relative rates and effects of genome growth and shrinkage cannot be estimated a priori. Nevertheless, the H. pusillum genome, whether its features are in fact primitive or derived, is useful for comparison with the other species because it has the smallest genome analyzed, the fewest BARE-1 copies, and the smallest percentage of the genome occupied by BARE-1 (1.6%) in the genus (Table 1). On the basis of H. pusillum, a marginal %C of the genome has been calculated from the rt data for each accession (Table 1). This marginal share represents the fraction of the difference in total genome size that can be accounted for by the difference in the number of BARE-1 copies. By using this method, a remarkable 59% of the difference in genome size between H. pusillum and H. euclaston, both similar H genome species of the section of the genus called Anisolepis (von Bothmer et al., 1995), can be accounted for by DNA of BARE-1. In another H genome species, H. brachyanterum, BARE-1 contributed 21% of the calculated marginal difference in C value, but in H. muticum this was only 1.6%. The smallest differences calculated were for two tetraploid subspecies of H. murinum and tetraploid H. bulbosum.
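The marginal share defined above can be sketched as the extra BARE-1 DNA relative to the reference species divided by the extra genome size. The function and parameter names below are illustrative, and the input values are HYPOTHETICAL placeholders, not entries from Table 1; only the element length of 8932 bp is taken from the text.

```python
def marginal_share(copies, genome_bp, ref_copies, ref_genome_bp, element_bp=8932):
    """Fraction of the genome size difference explained by extra BARE-1 copies."""
    extra_bare1_bp = (copies - ref_copies) * element_bp
    extra_genome_bp = genome_bp - ref_genome_bp
    return extra_bare1_bp / extra_genome_bp

# HYPOTHETICAL inputs for illustration only (not values from Table 1):
# a species with 12,000 copies and a 3.5 x 10^9 bp genome, compared with a
# reference species with 5,000 copies and a 3.0 x 10^9 bp genome.
share = marginal_share(copies=12e3, genome_bp=3.5e9,
                       ref_copies=5e3, ref_genome_bp=3.0e9)
print(f"marginal share: {share:.1%}")
```

The measure is undefined when the two genome sizes are equal and becomes unstable when they are very close, which is worth keeping in mind when comparing species of similar size.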
An Excess of LTRs Is Negatively Correlated with BARE-1 Abundance
Unlike in or rt, the absolute LTR number shows no significant relationship to the basic genome size, as seen in Figure 6A. Nevertheless, for the diploids, the number of LTRs relative to rt or in (Table 1) is inversely correlated (Spearman rank order tests; rt, rS = -0.556, P = 0.006; in, rS = -0.571, P = 0.005) with genome size (Figure 6B). Furthermore, for the diploids, the proportion of the genome occupied by BARE-1 (based on rt) is inversely related to the excess of LTRs over in copies (Figure 6C, rS = -0.528, P = 0.01). The outliers (samples 16, 17, and 18) in the correlation with genome size (Figure 6B) are the three diploid species with the smallest genomes. For a given proportional excess of LTRs, their genomes are otherwise considerably smaller than the general trend would predict. Taking the data together, the more LTRs relative to BARE-1 elements, the smaller the contribution of BARE-1 to the genome.
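For readers unfamiliar with the rank order statistic used above, the following is a minimal pure-Python sketch of a Spearman coefficient, computed as the Pearson correlation of the ranks with tied values given their average rank. The two data lists are HYPOTHETICAL and serve only to show the calculation; they are not values from Table 1, and the sketch does not compute a P value.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank order coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# HYPOTHETICAL diploid data (not from Table 1): genome size in Gbp and LTR/in ratio.
genome_size_gbp = [3.0, 3.4, 3.9, 4.2, 4.6, 4.9]
ltr_to_in_ratio = [30.0, 22.0, 25.0, 12.0, 9.0, 7.0]
r_s = spearman(genome_size_gbp, ltr_to_in_ratio)
print(f"r_S = {r_s:.3f}")
```

A negative coefficient, as in this made-up example, corresponds to the pattern reported in the text: the larger the genome, the smaller the LTR excess.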
Figure 6. Genome Size as a Function of LTR Copy Number.
(A) LTR copy number versus genome size.
(B) Ratio of LTR number to in number versus genome size.
(C) Ratio of LTR number to in number versus fraction of the genome occupied by BARE-1 (by rt number).
The graphed numbers correspond to the sample numbers given in Table 1.
BARE-1 Copy Number and Genome Size in H. spontaneum Appear Correlated with Habitat
Analyses were made of the relationship of BARE-1 copy number and genome size to environmental variables in the H. spontaneum populations for which detailed climatic and soil data were available. Genome size showed a high correlation (|rS| > 0.6) with evaporation and alluvium soil type. The LTR number was strongly correlated (|rS| > 0.7) with altitude, annual temperature, January and August temperature, the number of hot and dry days, and alluvium soil type. However, most correlations were not significant, and the proportion of significant correlations may not be above that expected by chance. A trend is nevertheless clear, with effects being associated with variables of temperature, water availability, and soil type. In multiple regression, the coefficient of determination of in number was high (R² = 0.874, P = 0.073) and is explained by variables linked to hot, dry desert conditions. Furthermore, genome size appears higher in the desert, that is, it seems to increase with aridity. We interpret the lack of statistical significance at the 0.05 level as an effect of the small sample size.
DISCUSSION
The C value paradox, that variation in genome size (10⁸ to 10¹¹ bp in plants) is not correlated with the complexity of the organism, has long been apparent (Thomas, 1971). Comparisons of fine-structure maps within the family Poaceae (Ahn et al., 1993; Chen et al., 1997) sharpened the paradox by showing, across a >10-fold variation in genome size, that homologous chromosomes of grasses are syntenic, the order of mapped genes being generally preserved against great variation in genome size. Genes in the grasses appear to occur in islands separated by repetitive DNA (Barakat et al., 1997; Panstruga et al., 1998), repetitive DNA comprising >70% of the barley and maize genomes (Flavell et al., 1977; Barakat et al., 1997) and transposons comprising >50% of the maize genome (SanMiguel et al., 1996).
Together, these observations suggest that retrotransposons account for much of the size variation between genomes in the family Poaceae. Retrotransposons are “selfish” (Doolittle and Sapienza 1980; Orgel and Crick, 1980), possessing a self-contained system for replicatively increasing their copy number, which, in the absence of detrimental effects to the host, should lead to greater numbers of increasingly active copies. This and the lack of evidence for an equally efficient mechanism for removing repetitive DNA from the genome have sparked a discussion on the role of retrotransposons in unidirectionally ratcheting genome size upwards (Wessler et al., 1995; Bennetzen and Kellogg, 1997a, 1997b; Petrov, 1997; Voytas and Naylor, 1998).
To examine the prevalence of retrotransposon BARE-1 in the genomes of Hordeum spp, we determined the copy number for its in and rt internal domains and the terminal LTRs throughout the genus by slot and dot blot hybridizations. The data here show that BARE-1 copy number and genome size are positively correlated in the genus Hordeum, in contrast to what has been reported for the genus Vicia from more limited comparisons (Pearce et al., 1996a). Across the genus as a whole, BARE-1 is present on average in (14 ± 1) × 10³ copies and in barley at (16.6 ± 0.6) × 10³ copies, conserved in length. The in situ hybridization data are consistent with the relative copy numbers calculated from the blotting data, the dispersed BARE-1 elements in the three Hordeum spp examined displaying a relative intensity parallel with the detected copy number.
The data further show that Hordeum spp genomes contain a large excess (7- to 42-fold, mean 16- ± 2-fold) of LTRs relative to the internal regions of BARE-1, departing from the expected 2:1 ratio of intact retrotransposons. The genome of barley cultivar Bomi, with one of the smallest excesses of LTRs found, contains >6 × 10⁴ more LTRs than can be accounted for by the number of BARE-1 elements. The BAC analyses examined 0.03% of the genome of one barley cultivar in detail; the dot and slot blot hybridizations surveyed the entire genomes of the accessions. The data taken together confirm the existence of solo LTRs in barley specifically and strongly suggest that excess LTRs detected in the other Hordeum spp are solo LTRs as well. Examples of solo LTRs have been found in mammals (Banville et al., 1992; Blusch et al., 1997), Drosophila (Geyer et al., 1988), yeast (Parket et al., 1995; Kim et al., 1998), and plants (Sentry and Smyth, 1989; SanMiguel et al., 1996; Noma et al., 1997; Bevan et al., 1998). However, an abundance such as that seen in the genus Hordeum has not been reported previously.
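The comparison against the 2:1 expectation can be sketched as a small calculation. This is only an illustration: the function name and the input counts are hypothetical, and the surplus is taken here simply as the LTR count minus two LTRs per internal domain, which may not be exactly how the fold excess reported above was derived.

```python
def ltr_summary(ltr_copies, internal_copies):
    """Compare an observed LTR count with the two LTRs expected per intact element.

    Both arguments are copy numbers per genome. The names and the definition
    of "surplus" are assumptions made for this sketch.
    """
    expected_ltrs = 2 * internal_copies  # intact elements carry two LTRs each
    return {
        "ltr_to_internal_ratio": ltr_copies / internal_copies,
        "surplus_ltrs": ltr_copies - expected_ltrs,  # candidates for solo LTRs
    }


# Hypothetical counts, chosen only to show the arithmetic.
example = ltr_summary(ltr_copies=1.0e5, internal_copies=1.5e4)
print(example)
```

With these made-up inputs, any LTRs beyond the 3.0 × 10⁴ expected from intact elements are counted as surplus.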
Homologous recombination between the LTRs of a single BARE-1 element would remove the internal domain and leave behind a single recombinant LTR flanked by the direct repeats remaining from the original insertion site. Hence, the strongest support for the existence of solo LTRs and their derivation from formerly intact BARE-1 elements would come from sequences of large contiguous genomic regions. Indeed, analysis of a fully sequenced contiguous stretch of 66 kb from chromosome 2H of cultivar Ingrid (K. Shirasu, A. Schulman, T. Lahaye, and P. Schulze-Lefert, manuscript in preparation) shows that this region contains four BARE-1 units, derived from five retrotransposons. Of these units, two are solo LTRs that have resulted from recombination in a single element, and one consists of a pair of solo LTRs resulting from recombination within one BARE-1 nested inside another. The DNA gel blots as well as the PCRs show that these solo LTRs are present as such in the genome and are not a result of bacterial recombination in the BAC.
Intrachromosomal homologous recombination has been demonstrated and analyzed in dicotyledonous plants with integrated model substrates (reviewed in Puchta and Hohn, 1996). The recombination is thought to proceed by either double-strand break repair or single-strand annealing at rates of 10⁻⁵ to 10⁻⁶ homologous recombination events per cell division (Puchta and Hohn, 1996). The rate is additive for transgenes in allelic positions, and it is dependent on the plant species, configuration, and genomic position of the recombination substrate, plant organ, and particular progeny (Puchta and Hohn, 1996), although perhaps not on the degree of methylation at CG and CXG nucleotide motifs (Puchta et al., 1992). Intrachromosomal recombination between repetitive sequences is thought to be infrequent, limiting the loss of intervening genes (Hu et al., 1998). However, double-stranded breaks can efficiently induce intrachromosomal recombination between flanking homologous sequences in maize (Athma and Peterson, 1991) and yeast (Parket et al., 1995). Although no direct data are available, the degree of polymorphism seen with an anchored PCR method (Waugh et al., 1997) indicates that the BARE-1 insertion frequency is in the range seen for intrachromosomal recombination, <4 × 10⁻⁵ events per element per generation.
The ratio between the number of LTRs and full-length elements may reflect the balance of the relative rates of BARE-1 propagation and inter-LTR recombination. These rates need not be either constant or consonant; the maize genome has apparently experienced an explosive increase in retrotransposon numbers in at least part of the genome over the last 3 million years (SanMiguel et al., 1998), although only two solo LTRs were found in a region of abundant, nested retrotransposons (SanMiguel et al., 1996). Recombination between LTRs would be expected to reduce the complement of functional retrotransposons in the genome, limiting but not eliminating the contribution of the BARE-1 family to a genome size increase. Consistent with this idea, the excess of LTRs observed relative to the BARE-1 numbers increases as both the genome size and the portion of the genome occupied by BARE-1 decrease. Selective pressure for a small genome in the presence of countervailing activity by BARE-1 would lead to accumulation of an excess of LTRs through recombination. The yeast genome, which is thought to be under pressure for compactness, represents an example of the accumulation of solo LTRs, containing 51 full-length Ty retrotransposons but 280 solo LTRs or LTR fragments (Kim et al., 1998), with high variability in the number per chromosome.
The opposite selective pressures may be found under hot, dry desert conditions, which multivariate analysis indicates are associated with increasing genome size and BARE-1 copy number, although not significantly at the 0.05 level in the small sample examined. The data suggest that increases in both genome size and genetic polymorphism (Nevo et al., 1979; Nevo and Beiles, 1988) in dry environments might be adaptive in the genus Hordeum, as they are in the pines (Wakamiya et al., 1996), and concomitantly associated with either propagation of BARE-1 or inheritance of new copies. In this regard, transcription or transposition of various retrotransposons has been shown to be linked to biotic and abiotic stresses (Wessler, 1996; Grandbastien, 1998; Takeda et al., 1998).
In summary, recombination may provide a partial “return ticket from genomic obesity” (Bennetzen and Kellogg, 1997a), at least from the genome bloat due to retrotransposons, provided that its frequency relative to retrotransposition is sufficiently high. Although accumulation of the 1.8-kb solo BARE-1 LTRs that result from recombination would still contribute to genome size increase, additional recombination between nearby LTRs could limit it. The tendency of retrotransposons to insert into other retrotransposons, at least in grass genomes (SanMiguel et al., 1996; Noma et al., 1997; Suoniemi et al., 1997), and into regions of microsatellites (Kalendar et al., 1999; Ramsay et al., 1999) would simultaneously provide an additional means of inactivating retrotransposons and reduce the risk of gene deletion through the recombinational loss of genomic DNA intervening between solo LTRs.
The overall importance to genome evolution in the genus Hordeum of BARE-1 propagation and loss depends on the number and dynamics of the other families of retrotransposons and repetitive sequences in the genome, the latter known to undergo rapid change in several grass genomes (Saghai Maroof et al., 1990; Ceccarelli et al., 1992; Hueros et al., 1993; SanMiguel et al., 1998). The three accessions with the smallest genomes all have smaller genomes than is typical for their BARE-1 copy number and relative LTR excess. This suggests that these genomes have been under strong downward selection for size and that other forms of repetitive DNA contributing to the total C value have been depleted more successfully than has BARE-1. Conversely, compared with the smallest (not necessarily representing the ancestral) genome measured in the genus, BARE-1 contributes an average of 8 ± 2% and a range of 0.3 to 59% to the difference between the size of this genome and others in the genus. The magnitude of this range and the variation in LTR abundance suggest that the amount of activity and impact of BARE-1 on the genome has been highly lineage-specific. The selective pressures and mechanisms modulating this “genome war” between the cell and the retrotransposon remain to be clarified.
METHODS
Plant Materials
Sources and provenances of the Hordeum spp accessions are detailed elsewhere (Kankaanpää et al., 1996). They comprised 27 accessions and 17 species, including nine accessions of H. spontaneum (wild barley), an Indian and an African landrace of cultivated barley (H. vulgare), and one commercially bred barley. Seeds were germinated and then grown in a controlled environment chamber as described previously (Suoniemi et al., 1996b). For in situ hybridizations, seeds derived from a cross between barley cultivars Alexis and Regata were used.
DNA Preparation
DNA was isolated from leaves, essentially as detailed before (Manninen and Schulman, 1993), and then treated with RNase ONE ribonuclease (Promega, Madison, WI) followed by phenol and chloroform extractions and ethanol precipitation. Total genomic and plasmid DNA was quantified both spectrophotometrically and after agarose gel electrophoresis as ethidium bromide fluorescence under UV light. Polaroid type 665 negatives were scanned, and densitometry was performed on the scanned image using Tina 2.09 software (Isotopenmessgeräte, Straubenhardt, Germany) against a known amount of a PstI fragment from λ phage DNA. All plasmids were cloned in Escherichia coli JM109.
Polymerase Chain Reactions and Sequencing
The primers used for the amplification by polymerase chain reaction (PCR) of the BARE-1 internal domain were previously described (Suoniemi et al., 1997) and corresponded to bases 1685 to 1704 in the left long terminal repeat (LTR) and 7214 to 7235 in the right LTR. The PCRs were performed using the Expand Long Template PCR System (Roche Molecular Biochemicals, Mannheim, Germany) with buffer 1, as described by the supplier, using 10 ng genomic DNA, 0.2 mM each dNTP, and 1 pmol μL⁻¹ each primer in a final volume of 50 μL. The mix was overlaid with paraffin oil. The reaction mixtures were heated to 95°C for 5 min and then subjected to seven cycles of 94°C for 30 sec, 43°C for 2 min, a ramp of +1°C (2 sec)⁻¹ to 72°C, and 72°C for 4 min. This was followed by 41 cycles of 94°C for 30 sec, 60°C for 2 min, a ramp of +1°C (2 sec)⁻¹ to 72°C, and 72°C for 4 min. The reaction was completed by a 10-min incubation at 72°C. Reactions were performed in a Minicycler (MJ Research, Waltham, MA) thermal cycler.
For amplification of genomic LTRs, primers at the ends of the LTRs were used (N referring to equal amounts of A, T, G, and C in the primer preparation at that position): forward, 5′-NNTGTTGGAATTATGCCCTAGAGGCAA-3′ (GenBank accession number Z17327, bases 309 to 333); reverse, 5′-NNTGTGGGGAACGTCGCATGGGAAAC-3′ (GenBank accession number Z17327, bases 2113 to 2137). The reaction conditions were as for the internal domain (above) except that the extension time was 2 min at 72°C. For cloning and sequencing LTRs of various species for sequence comparisons, we used the same set of primers as previously. The PCR mix was the same as used previously, but the reaction cycles were different. The reaction mixtures were first heated to 95°C for 5 min, followed by 21 cycles of 94°C for 30 sec, 40°C for 2 min, and 72°C for 2.5 min. Reactions were completed with one incubation at 72°C for 10 min. The PCR products were purified from agarose gels (QIAEX II; Qiagen, Hilden, Germany) and cloned (pGEM-T vector system; Promega). Sequencing reactions on plasmid minipreps were performed with Sequenase v2.0 (Amersham Pharmacia Biotech, Uppsala, Sweden) and analyzed under standard conditions with an automated system (ALF; Amersham Pharmacia Biotech). Sequences were aligned and compared as previously described (Suoniemi et al., 1998a).
Slot and Dot Blots
Slot blots were prepared by filtration through a vacuum manifold (Hoefer PR600; Amersham Pharmacia Biotech), applying 10 to 100 ng of sample genomic DNA with herring sperm DNA used to maintain a constant DNA load. The DNA was cross-linked to filters under UV light. Herring sperm DNA at 100 ng per well also served as a negative control. Isolated plasmids (0.1 to 10 ng per well) containing the fragment used for hybridization probes served as positive controls on each filter. The LTR probe (NheI-BstEII, 743 bp) was from the untranscribed region. The other probes were for in (HpaI-BsmI, 589 bp) and rt (SalI-StyI, 703 bp). All the probes were random-primed (Rediprime or Megaprime; Amersham Pharmacia Biotech) and ³²P-labeled. Filters were hybridized in 50% formamide, 1.25 × SSPE (1 × SSPE is 58.8 mM Na2HPO4, 61.2 mM NaH2PO4, 0.6 M NaCl, 60 mM sodium citrate, pH 6.8), 5 × Denhardt’s solution (1 × Denhardt’s solution is 0.02% [w/v] Ficoll 400, 0.02% [w/v] polyvinylpyrrolidone averaging 360 kD, 0.02% [w/v] BSA), 0.5% SDS, and 20 μg mL⁻¹ herring sperm DNA overnight at 42°C. Hybridized filters were washed successively with 2 × SSC (1 × SSC is 0.15 M NaCl and 0.015 M sodium citrate), 0.1% SDS (10 min, 25°C), twice in 2 × SSC, 0.1% SDS (10 min, 65°C), and once in 0.2 × SSC (20 min, 65°C). Bound radioactivity was quantified by exposure of an imaging plate for 2 to 16 hr followed by scanning on either a PhosphorImager (model SI; Molecular Dynamics, Sunnyvale, CA) or a BAS PhosphoImager (model 1500; Fuji Photo Film, Tokyo, Japan).
Isolation and Characterization of Bacterial Artificial Chromosome Clones
A barley genomic bacterial artificial chromosome (BAC) library constructed from barley cultivar Morex (Y. Yu, J. Tompkins, D. Frisch, R. Waugh, R. Brueggeman, D. Kudrna, A. Kleinhofs, and R. Wing, personal communication) was screened using a full-length LTR probe. Twenty positive clones were selected, and BAC DNA was prepared by standard methods (Sambrook et al., 1989). The BAC DNA fragments were separated on agarose gels, denatured, and transferred to a Hybond N+ (Amersham Pharmacia Biotech) membrane using standard methods (Sambrook et al., 1989). Hybridization conditions were as for the slot blot hybridizations. The hybridized filters were washed twice in 2 × SSC, 0.1% SDS for 10 min at 25°C and twice in 2 × SSC, 0.1% SDS for 10 min at 65°C. The sizes of the BAC clone inserts were estimated on the basis of the restriction fragments obtained in restriction digests.
Copy Number Calculations
Genomic copy number was calculated from the hybridization response of the genomic DNA compared with the control plasmids on the blots as follows: copies ng⁻¹ = genomic PSL ng⁻¹ × plasmid copies × plasmid PSL⁻¹, where PSL stands for photo-stimulated luminescence units, the output unit for exposure of the PhosphorImager screens. The copies ng⁻¹ were converted to copies genome⁻¹ using the genome size data available for the same accessions of Hordeum spp (Kankaanpää et al., 1996). For each sample, the mean response of three to six hybridizations on at least three separate filters was used to determine copy number.
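The copy number formula can be illustrated with a short calculation. The sketch below is not the software used in this work; the function names and all input values are hypothetical, and it assumes that genome size is supplied as a mass in picograms per genome.

```python
def copies_per_ng(genomic_psl_per_ng, plasmid_copies, plasmid_psl):
    """copies per ng = genomic PSL per ng x plasmid copies / plasmid PSL."""
    return genomic_psl_per_ng * plasmid_copies / plasmid_psl


def copies_per_genome(copies_ng, genome_mass_pg):
    """Scale copies per ng to copies per genome.

    genome_mass_pg is the genome mass in picograms (an assumed unit);
    1 pg equals 1e-3 ng.
    """
    return copies_ng * genome_mass_pg * 1e-3


# Hypothetical readings: 30 PSL per ng of genomic DNA, and a control
# plasmid spot holding 1e8 copies that gives 1e3 PSL.
per_ng = copies_per_ng(genomic_psl_per_ng=30.0, plasmid_copies=1e8, plasmid_psl=1e3)
per_genome = copies_per_genome(per_ng, genome_mass_pg=5.0)
print(per_ng, per_genome)
```

In practice the PSL inputs would be the means over the replicate hybridizations described above rather than single readings.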
For the slot blots, corrections were made for variations in sample binding and accessibility using the hybridization response of each filter and sample to a labeled total DNA probe. For this purpose, the raw copy number was normalized relative to barley based on the samples’ hybridization response to random-primed labeled total cultivar Bomi DNA. Sample 12 (Table 1) was present on every filter. All the blots were hybridized together so that the probe concentration was identical for all filters. The control hybridizations were washed under conditions 18°C below the melting temperature of 83°C, calculated for barley DNA on the basis of 0.043 M Na⁺ in the washing solution and an average 59.1% G + C nucleotide content in barley coding DNA (Nakamura et al., 1997).
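The melting temperature quoted above can be approximated with a widely used empirical relation for long DNA duplexes, Tm = 81.5 + 16.6 log10[Na⁺] + 0.41 (% G + C). The text does not say which formula was applied, so its use here is an assumption, and the formamide and duplex-length terms are omitted because the final wash contains no formamide and the hybrids are long.

```python
import math


def melting_temperature(na_molar, gc_percent):
    """Approximate Tm (degrees C) of a long DNA duplex.

    Assumed empirical relation: 81.5 + 16.6*log10([Na+]) + 0.41*(%GC).
    """
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent


# Values stated in the text: 0.043 M Na+ and 59.1% G + C.
tm = melting_temperature(0.043, 59.1)
wash_temperature = round(tm) - 18  # washing at 18 degrees C below Tm
print(round(tm, 1), wash_temperature)
```

Under this assumption the calculated value rounds to the stated 83°C, and 18°C below it corresponds to the 65°C used for the stringent washes.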
A set of dot blot hybridizations with sample 12 was used to determine absolute BARE-1 copy number in this accession and thereby convert the relative copy numbers in the other accessions to absolute numbers. The DNA of cultivar Bomi was also present as a control on these dot blot filters. Copy number could not be determined by direct comparison of the hybridization response of the control plasmids and genomic DNA against the labeled total DNA probe because the plasmid control spots saturated the available BARE-1 fragments in the genomic probe, giving gross underestimations of the amount of bound control plasmid DNA.
In addition, this method was assessed with a set of control dot blot hybridizations in which the accession DNA was spiked with λ phage DNA before individual samples were pipetted. For each microgram of genomic plant DNA, 139.4 ng λ phage DNA was added, giving 1.2 × 10⁴ copies of λ phage per barley genome equivalent. Dot blots were prepared with multiple replicates and both 1 and 10 ng of genomic DNA per sample. The same filter was probed in series with integrase (in), LTR, and λ phage probes as above for the slot blots. The hybridization response to the in and LTR probes was corrected to the average value for the λ phage DNA hybridization response, and copy number was calculated as above.
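The spike-in arithmetic and the correction step can be sketched as follows. The λ genome length of 48,502 bp is the published sequence length; the plant genome size, the function names, and the form of the correction (scaling each sample by the mean λ response divided by its own λ response) are assumptions made for illustration only.

```python
LAMBDA_GENOME_BP = 48502  # length of the lambda phage genome sequence


def spike_copies_per_genome(spike_ng_per_ug_plant, plant_genome_bp):
    """Lambda copies per plant genome equivalent for a given mass ratio.

    Equal masses of DNA hold numbers of molecules inversely proportional
    to molecule length, so copies = mass ratio x (plant bp / lambda bp).
    """
    mass_ratio = spike_ng_per_ug_plant / 1000.0  # ng lambda per ng plant DNA
    return mass_ratio * plant_genome_bp / LAMBDA_GENOME_BP


def correct_to_spike(probe_response, lambda_response, mean_lambda_response):
    """Rescale a probe response by the sample's lambda spike response (assumed form)."""
    return probe_response * mean_lambda_response / lambda_response


# 139.4 ng lambda per microgram of plant DNA is from the text;
# the genome size of 4.2e9 bp is an illustrative assumption.
copies = spike_copies_per_genome(139.4, plant_genome_bp=4.2e9)
print(round(copies))
```

A sample whose λ signal is below the filter average is scaled up by the correction, and one above the average is scaled down.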
In Situ Hybridization
The clones used for probes were as described above for slot blots. Probes were labeled by nick translation with either rhodamine-4-dUTP (Amersham Pharmacia Biotech) or biotin-16-dUTP (Roche Molecular Biochemicals), following a protocol described previously (Anamthawat-Jónsson et al., 1996). Root tips were collected from germinating Hordeum spp seedlings when the roots were 1 to 2 cm long and then treated in ice water for 25 hr before fixation in 3:1 absolute ethanol:glacial acetic acid. The fixed root tips were digested with cellulase and pectinase, after which chromosome preparations were made either by the squash method (Schwarzacher and Leitch, 1994) or by protoplast dropping (Busch et al., 1996).
Chromosomes were treated and in situ hybridization was performed according to a standard procedure (Anamthawat-Jónsson et al., 1996). Hybridizations were performed at a stringency of 77% in 50% formamide, 2 × SSC at 37°C overnight and then washed at a stringency of 76% in 40% formamide, 2 × SSC at 42°C for 10 min. Each experiment (DNA probe and species) was repeated two to five times with independently prepared probes and chromosome preparations. Rhodamine-labeled probes were detected directly, whereas biotin-labeled probes were detected with ExtrAvidin-FITC (Sigma). The hybridization signal was visualized and photographed with epifluorescence microscopy, with >100 cells per preparation examined to locate representative chromosome sets.
Statistical Analyses
Copy Number
The DNA size estimations were made with TableCurve V. 2.02 (SPSS Science, Chicago, IL). Linear regression analyses for copy number data were made with SigmaPlot version 4.0 (SPSS Science). Parametric correlation analyses were made with the Pearson product moment test (yielding rP and P) and nonparametric tests with the Spearman rank order correlation tests (rS and P) as implemented in SigmaStat version 2.0 (SPSS Science). P values in the text represent the likelihood of falsely rejecting the null hypothesis that the variables are not correlated. The variables rP and rS are the correlation coefficients for the Pearson and Spearman tests, respectively. Values in the text are expressed as means and standard errors (se).
Correlations among Environmental Factors, Genome Size, and BARE-1 Copy Number
Spearman rank correlations (SAS Institute, 1996) were made between geographic, climatic, and edaphic factors and the genome size, rt, in, and LTR copy number and LTR:in ratio for seven Israeli populations of H. spontaneum. Only significant correlations were corrected by the Bonferroni test for the number of correlations that were attempted. Sample 3, accession number SCI 77-1 from the Upper Galilee, was not included in this test due to missing ecological data. Ecogeographical data and sites are detailed elsewhere (Table 1 and Figure 1 in Nevo et al., 1979). We considered environmental variables in the following categories: geographical, including longitude, latitude, and altitude; climatic means, including annual, January, and August temperature, seasonal temperature difference, daily temperature difference, Sharav or the number of hot and dry days, the number of tropical days, and evaporation; and moisture conditions, including annual rainfall, the number of rainy days, the number of dewy nights in summer, annual humidity, humidity at 14:00 hours, and the Thornthwaite moisture index. We also considered four edaphic dummy variables, one for each soil type: alluvium, loess, sandy loam, and terra rossa.
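For readers who wish to reproduce this kind of analysis without SAS, the rank correlation and the Bonferroni adjustment can be sketched in plain Python. This is an illustration rather than the procedure actually run: the data for the seven populations are invented, the number of tests is hypothetical, and the P value is taken as given rather than computed.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rS, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Invented values for seven populations (not data from this study).
altitude_m = [50, 300, 450, 700, -200, 150, 900]
ltr_copies = [9.0e4, 7.5e4, 7.0e4, 6.0e4, 1.1e5, 8.0e4, 6.5e4]
print(spearman_rs(altitude_m, ltr_copies))
```

With only seven populations, even a large |rS| can fail to remain significant after the adjustment, which is consistent with the caution expressed about the small sample size.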
Multiple Regression Analysis
A test of the best predictors of C, rt, in, LTR, and LTR/in of seven Israeli populations of H. spontaneum was conducted by stepwise multiple regression analysis (SAS Institute, 1996) using these characteristics as dependent variables and geographic, climatic, and edaphic factors as independent variables. The environmental variables we used included those that were used in the Spearman rank correlation, excluding three variables: Sharav, number of dewy nights in summer, and terra rossa soil type.
Acknowledgments
We thank Anne-Mari Narvanto for her continuing excellent technical assistance. We are grateful to Arnis Druka and Andris Kleinhofs (Department of Crop and Soil Sciences, Washington State University, Pullman, WA) for screening and donating BAC clones for further analysis. The research reported here was supported by grants from the Academy of Finland Genome Research Program and the European Union Directorate for Biotechnology research program on Molecular Tools for Biodiversity. E.N. thanks the Israel Discount Bank Chair of Evolutionary Biology and the Ancell-Teicher Research Foundation for Genetics and Molecular Evolution for financial support.
Footnotes
- Received April 16, 1999.
- Accepted June 2, 1999.
- Published September 1, 1999.