First published online October 19, 2004; 10.1105/tpc.104.025502

The Plant Cell 16:2870-2894 (2004)
© 2004 American Society of Plant Biologists

Multiple Genetic Processes Result in Heterogeneous Rates of Evolution within the Major Cluster Disease Resistance Genes in Lettuce

Hanhui Kuang(a), Sung-Sick Woo(a,1), Blake C. Meyers(a,2), Eviatar Nevo(b), and Richard W. Michelmore(a,3)

a Department of Vegetable Crops, University of California, Davis, California 95616
b Institute of Evolution, University of Haifa, Mount Carmel, 31905 Israel

3 To whom correspondence should be addressed. E-mail rwmichelmore@ucdavis.edu; fax 530-752-9659.


    ABSTRACT
Resistance Gene Candidate2 (RGC2) genes belong to a large, highly duplicated family of nucleotide binding site–leucine rich repeat (NBS-LRR) encoding disease resistance genes located at a single locus in lettuce (Lactuca sativa). To investigate the genetic events occurring during the evolution of this locus, ~1.5- to 2-kb 3' fragments of 126 RGC2 genes from seven genotypes were sequenced from three species of Lactuca, and 107 additional RGC2 sequences were obtained from 40 wild accessions of Lactuca spp. The copy number of RGC2 genes varied from 12 to 32 per genome in the seven genotypes studied extensively. LRR number varied from 40 to 47; most of this variation had resulted from 13 events duplicating two to five LRRs because of unequal crossing-over within or between RGC2 genes at one of two recombination hot spots. Two types of RGC2 genes (Type I and Type II) were initially distinguished based on the pattern of sequence identities between their 3' regions. The existence of two types of RGC2 genes was further supported by intron similarities, the frequency of sequence exchange, and their prevalence in natural populations. Type I genes are extensive chimeras caused by frequent sequence exchanges. Frequent sequence exchanges between Type I genes homogenized intron sequences, but not coding sequences, and obscured allelic/orthologous relationships. Sequencing of Type I genes from additional wild accessions confirmed the high frequency of sequence exchange and the presence of numerous chimeric RGC2 genes in nature. Unlike Type I genes, Type II genes exhibited infrequent sequence exchange between paralogous sequences. Type II genes from different genotype/species within the genus Lactuca showed obvious allelic/orthologous relationships. Trans-specific polymorphism was observed for different groups of orthologs, suggesting balancing selection. Unequal crossover, insertion/deletion, and point mutation events were distributed unequally through the gene. 
Different evolutionary forces have impacted different parts of the LRR.


    INTRODUCTION
The majority of resistance genes (R-genes) in plants cloned so far encode nucleotide binding site–leucine rich repeat (NBS-LRR) proteins. The LRR region is hypothesized to form a series of β-sheets with solvent-exposed residues available for interaction with a variety of ligands (Jones and Jones, 1997). This is supported by significantly elevated nonsynonymous:synonymous nucleotide substitution ratios that are indicative of diversifying selection acting on the putative solvent-exposed residues for many of the resistance genes cloned (e.g., Parniske et al., 1997; McDowell et al., 1998; Meyers et al., 1998b; reviewed in Michelmore and Meyers, 1998). However, direct physical interaction between a NBS-LRR resistance protein and an avirulence gene product has only rarely been demonstrated (Jia et al., 2000). This in part led to the guard hypothesis in which NBS-LRR proteins function as members of multiprotein complexes and detect the binding of the avirulence protein to one or more members of the complex (Dangl and Jones, 2001).

Many resistance genes are clustered in plant genomes (reviewed in Michelmore and Meyers, 1998; Hulbert et al., 2001; Richly et al., 2002; Zhu et al., 2002; Meyers et al., 2003). These clusters often comprise tandem arrays of genes that determine resistance to multiple pathogens as well as to multiple variants of a single pathogen. The genes may be organized as tight clusters with little intervening sequence, for example, the RPP5 cluster in Arabidopsis thaliana spans 91 kb (Noël et al., 1999), or be spread over several megabases as is the Resistance Gene Candidate2 (RGC2) locus in lettuce (Lactuca sativa) (Meyers et al., 1998a). The functional and evolutionary significance of this clustered arrangement is unclear.

A variety of genetic events have been documented at clusters of resistance genes, but before our current study their relative importance to the evolution of new resistance specificities was poorly understood. Extensive sequence exchanges between paralogs within clusters of R-genes have been detected in tomato (Lycopersicon esculentum), lettuce, Arabidopsis, flax (Linum usitatissimum), rice (Oryza sativa), and maize (Zea mays) (Parniske et al., 1997; Song et al., 1997; McDowell et al., 1998; Meyers et al., 1998a; Caicedo et al., 1999; Ellis et al., 1999; Noël et al., 1999; Cooley et al., 2000; Dodds et al., 2001a; Van der Hoorn et al., 2001). Genetic analyses showed that recombination played a central role in the creation of genetic diversity at the Rp1 rust resistance complex of maize and could create genes with novel specificities (Hulbert, 1997). Spontaneous Rp1 mutants had deletions at this locus caused by unequal crossing-over between paralogs (Collins et al., 1999; Sun et al., 2001). Extensive work with the L, M, N, and P loci in flax demonstrated the role of recombination in the evolution of new specificities (Ellis et al., 1999; Luck et al., 2000; Dodds et al., 2001a, 2001b). Unequal crossing-over and gene conversion were the causes of spontaneous Dm3 mutants in lettuce (Chin et al., 2001). Recombination has also been important in the evolution of Cf-4/9 homologs in tomato (Parniske et al., 1997; Parniske and Jones, 1999). Because of the instability of some resistance loci, particularly the Rp1 genes, resistance genes were initially assumed to evolve rapidly in response to changes in the pathogen population.

However, molecular analysis of orthologs and paralogs of Pto in tomato and RGC2 paralogs in lettuce demonstrated that at least some resistance genes were evolving slowly (Michelmore, 1999). This led to the development of a birth-and-death model for resistance gene evolution, similar to that proposed for multigene families of the vertebrate immune system, in which gene duplication and unequal crossing-over followed by diversifying selection results in varying numbers of semi-independently evolving lineages of resistance genes (Nei et al., 1997; Michelmore and Meyers, 1998; Piontkivska and Nei, 2003). The recognition of the avirulence gene avrRpm1 predated the divergence of A. thaliana from A. lyrata (Stahl et al., 1999). Analysis of nucleotide polymorphisms flanking the RPM1 resistance gene in Arabidopsis spp subsequently indicated the long-term coexistence of resistance and susceptibility alleles consistent with the slow evolution of this single gene resistance locus (Stahl et al., 1999). No homoplasies were observed among RPS2 genes from 17 accessions of A. thaliana (Caicedo et al., 1999). Further support for the slow evolution of resistance genes comes from the ancient origin of pathogen recognition specificity; recognition of avrPto predated the divergence of Lycopersicon pimpinellifolium from L. hirsutum var glabratum (Riely and Martin, 2001).

Lettuce is an inbreeding cultivated species (2x = 2n = 18) in the Compositae family that is fully fertile with its likely progenitor L. serriola, a common weedy colonizer of disturbed habitats (Kesseli et al., 1991). L. serriola, L. saligna, and L. virosa exhibit increasing sexual incompatibility with L. sativa and represent the major wild species in the gene pool for cultivated lettuce. The center of diversity for these species is the Eastern Mediterranean (H. Kuang, E. Nevo, and R. Michelmore, unpublished data). L. serriola and to a lesser extent L. saligna have been important sources for the introgression of disease resistance, particularly to downy mildew (Bremia lactucae), into L. sativa (Crute, 1992).

The RGC2 locus in lettuce is one of the largest clusters of resistance gene candidates so far characterized in plants. At least eight Dm genes for resistance to downy mildew as well as a gene for resistance to root aphid have been mapped to this region (Kesseli et al., 1994). One RGC2 family member, Dm3 (RGC2B), is necessary and sufficient to confer resistance to isolates of B. lactucae that express the Avr3 avirulence gene (Meyers et al., 1998a; Shen et al., 2002). Dm3 is 13 kb long and encodes a large coiled-coil type NBS-LRR protein with ~42 LRRs. More than 24 RGC2 genes (Dm3 paralogs) had been detected at this locus that covers at least 3 Mb in the cultivar Diana (Meyers et al., 1998a) but spanned <0.1 centimorgan in the crosses studied (Chin et al., 2001). The majority of these RGC2 genes are transcribed (Shen et al., 2002). Sequence exchange was detected between RGC2 paralogs and there has been significant diversifying selection on the putative solvent-exposed residues within the LRR region (Meyers et al., 1998b). Spontaneous losses of Dm3 specificity were shown to be attributable to deletions involving multiple RGC2 paralogs resulting from unequal crossing-over and, in one case, as the result of a gene conversion event in the 3' end of Dm3 (Chin et al., 2001). Analysis of RGC2 diversity in wild and cultivated germplasm as well as within and between natural populations of L. serriola from Israel and California identified large numbers of haplotypes, indicating the presence of numerous resistance genes in L. serriola, L. saligna, and L. virosa (Sicard et al., 1999). However, there was little data on sequence diversity or the mechanisms generating this diversity.

To gain a detailed understanding of the genetic events occurring at the RGC2 locus in lettuce and their significance to the evolution of resistance gene candidates, we compared a large number of RGC2 sequences encoding fragments of the LRR region from seven genotypes of three Lactuca spp. This demonstrated that there are two distinct types of RGC2 genes (Type I and Type II) that differed in their patterns of sequence divergence. Frequent gene conversion events in the 3' half of Type I genes have generated a large variety of RGC2 genes, homogenized intron sequences, and confounded allelic/orthologous relationships between RGC2 genes in different genotypes and species. By contrast, Type II genes evolved slowly and maintained obvious allelic/orthologous relationships in different genotypes and species. Different genetic events have occurred with different frequencies in Type I and Type II genes.


    RESULTS
Characterization of RGC2 Genes from Genomic Libraries of cv Diana
RGC2 genes are members of a large cluster of paralogs that had been partially characterized previously; nine RGC2 genes had been completely sequenced and another 13 genes partially sequenced (Meyers et al., 1998a, 1998b). In this study, we sequenced additional RGC2 genes resulting in 29 to 31 completely or partially sequenced RGC2 genes from cv Diana (Figure 1). This included two new genes in cv Diana identified by sequencing of PCR amplification products (TDAE and TDAG; see Methods). The precise number is unclear because TDT or TDV could be from the same gene as TDAC, TDAE, or TDAG.



Figure 1. Sequenced RGC2 Genes from Diana and Positions of Oligonucleotide Primers Used for Amplification Using PCR.

(A) Structure of RGC2 genes. The asterisk indicates the position of microsatellite MSATE6. The black bar below the gene indicates the region studied extensively in this article (Figure 1C).

(B) Regions of RGC2 paralogs sequenced from cv Diana. Pseudogenes are marked with an "x" in the third to last column. The last column shows the physical position of each RGC2 gene. TDL and TDF may be located at either end of the cluster. The position of the last six genes has not been determined. Small gaps in the sequences, mainly in introns, are indicated by "Δ."

(C) Positions of primers used to amplify 3' RGC2 fragments.

 
Three fragments in the MSATE6 profile of cv Diana (see below; Figure 2) were not present in the sequenced RGC2 genes. Therefore, there are at least 32 RGC2 genes in the major cluster of resistance genes in cv Diana. The RGC2 genes are some of the largest genes known in plants, up to ~15 kb with eight exons and seven introns and ~5.8 kb of coding region. For all ~30 sequenced RGC2 genes in Diana, the size of exons was highly conserved except for a few genes with large deletions and insertions (see below). The position of the introns was completely conserved; however, intron size varied greatly, particularly intron 3, which ranged from 363 bp to >6 kb.



Figure 2. Microsatellite MSATE6 Profiles of Lettuce Genomic DNA.

(A) MSATE6 profile of the seven genotypes sampled intensively in this study to estimate the minimum RGC2 copy number.

(B) MSATE6 profile of 10 individuals of L. serriola accession W66336A demonstrating the segregation of two RGC2 haplotypes. The last lane is a pool of ~20 individuals of W66336A.

 
RGC2 genes encode NBS-LRR proteins of the non-TIR class of resistance genes. The N terminus has ~170 amino acids of unknown function, followed by an ~300–amino acid putative NBS, followed by an ~1300–amino acid LRR region (Meyers et al., 1998b). The number of LRRs with a consensus LxxLxxLxLxxCxx motif (where L refers to Leu or other aliphatic amino acid and x refers to any amino acid) varied from 40 to 47. Of the 29 to 31 RGC2 genes in cv Diana, at least 12 were apparently pseudogenes because they contained frameshift or nonsense mutations (Figure 1B). The overall nucleotide identities of RGC2 paralogs in Diana varied from 74 to 99% (mean = 80.4%). The most divergent genes included TDW and TDAH, which had frameshift mutations and therefore are probably pseudogenes. Sequence identity in the NBS region varied from 69.8 to 99.5% (mean = 77.7%) and 28 out of 392 pairwise nucleotide identities exceeded 95%. The mean nucleotide identity in the LRR region (82.4%) was slightly higher than that in the NBS and varied over a similar range (70.0 to 99.1%); however, only 1 out of 406 pairwise nucleotide identities exceeded 95%.
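As an illustration of how the LxxLxxLxLxxCxx consensus could be searched for in a protein sequence, the following is a minimal Python sketch. It is not the procedure used in this study. The set of residues treated as aliphatic (L, I, V, M, A), the function name, and the toy sequence are all assumptions made for the example, and real LRRs often deviate from the strict consensus, so a strict match of this kind would be expected to undercount.

```python
import re

# Assumption: "aliphatic" is taken here as L, I, V, M, or A. The article only
# says "Leu or other aliphatic amino acid", so this set is illustrative.
ALIPHATIC = "LIVMA"

# LxxLxxLxLxxCxx: L = aliphatic residue, x = any residue, C = Cys (14 residues).
LRR_MOTIF = re.compile(
    rf"[{ALIPHATIC}]..[{ALIPHATIC}]..[{ALIPHATIC}].[{ALIPHATIC}]..C.."
)


def count_lrr_motifs(protein: str) -> int:
    """Count non-overlapping strict matches of the consensus in a protein string."""
    return len(LRR_MOTIF.findall(protein.upper()))


# Hypothetical toy sequence containing two motif instances; it is not taken
# from any RGC2 gene.
toy = "MK" + "LSELKTLDISHCSN" + "GGG" + "IKNLEELYVRDCPK" + "QQ"
print(count_lrr_motifs(toy))
```

A strict regular expression is the simplest choice; a position-specific scoring approach would tolerate the substitutions that degenerate LRRs carry.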

Isolation of RGC2 Fragments from Six Additional Genotypes of Lactuca spp
To amplify RGC2 fragments from other genotypes, oligonucleotide primers (Ex4b and Ex5r1) were designed based on conserved sites within the 3' LRR region of the RGC2 genes in cv Diana (Figure 1, Table 1). This primer combination amplified RGC2 fragments from cv Diana with minimal PCR-induced errors (see Methods). Fragments of 1.5 to 2.4 kb from the 3' region of RGC2 genes were amplified from an additional six genotypes of L. sativa (cultivars Mariska and Calmar), L. serriola (W66336A), and L. saligna (US93UC-10, CGN9311, and PI491204). We initially sequenced ~40 clones from each of three independent PCR amplifications from each genotype. More than 90% of these clones contained sequences similar to RGC2 genes. Sequencing of the ~120 clones/genotype represented 12, 10, 15, 34, 12, and 9 different RGC2 genes in Mariska, Calmar, US93UC-10, W66336A, CGN9311, and PI491204, respectively (Table 2). The number of PCR fragments amplified was close to saturated for all genotypes except W66336A because no or very few singletons were detected (Table 2). A total of 197 RGC2-containing clones were then sequenced for W66336A; this identified 39 distinct RGC2 sequences and decreased the number of singletons from 11 to 2. In all, 126 distinct RGC2 sequences were identified from the six genotypes plus Diana.
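The saturation argument above rests on counting distinct sequences and singletons (sequences recovered in only one clone). A minimal Python sketch of that bookkeeping follows; the function name and the clone labels are hypothetical and do not correspond to data from Table 2.

```python
from collections import Counter


def summarize_clones(clone_sequences):
    """Return (number of distinct sequences, number of singletons).

    A singleton is a sequence seen in exactly one clone. Few or no
    singletons suggests that sampling is close to saturation.
    """
    counts = Counter(clone_sequences)
    singletons = sum(1 for n in counts.values() if n == 1)
    return len(counts), singletons


# Hypothetical clone identities for one genotype (labels are made up).
clones = ["g1", "g1", "g2", "g3", "g3", "g3", "g4"]
distinct, singletons = summarize_clones(clones)
print(distinct, singletons)
```

In practice the clones would first have to be collapsed into distinct sequences after allowing for PCR-induced errors, which this sketch does not attempt.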


Table 1. Sequences of Oligonucleotide Primers Used in PCR Amplification

 

Table 2. The Number of RGC2 Genes in the Seven Genotypes

 
Variation in the Number of RGC2 Genes in Different Genotypes
The number of genes amplified by PCR with primers Ex4b and Ex5r1 from the seven genotypes varied from 9 to 39 (Table 2). To determine whether this was because of differential amplification with primers designed from RGC2 sequences in cv Diana or reflected real variation in the number of RGC2 genes in the genome, copy number was assessed using several additional approaches. DNA gel blot hybridizations to genomic DNA were made using probes of pooled fragments that had been amplified by primers Ex4b and Ex5r1 from 20 diverse RGC2 genes in Diana. The number of bands varied from 11 in PI491204 and CGN9311 to 25 in Diana. There was a strong correlation between the number of bands in DNA gel blots and the estimated minimum number of RGC2 genes (see below) (r = 0.85, P < 0.01; Table 2).

Amplification of the microsatellite marker MSATE6 resulted in 20 distinct fragments from at least 29 of the 32 RGC2 genes in Diana (Figure 2A). The MSATE6 primers would be predicted to amplify from ~90% of the sequences amplified using primers Ex4b and Ex5r1. The actual numbers of MSATE6 fragments amplified from genomic DNA of Mariska, Calmar, US93UC-10, W66336A, CGN9311, and PI491204 were 12, 11, 16, 21, 14, and 9, respectively (Figure 2A). The majority of bands in the MSATE6 profile were represented in the RGC2 sequences amplified by primers Ex4b and Ex5r1; however, a few previously undetected bands were identified (Table 2). There was a strong correlation between the number of MSATE6 fragments and number of fragments detected by DNA gel blot hybridization (r = 0.70, P < 0.05). The MSATE6 profile was therefore consistent with, but more informative than, DNA gel blot analysis. Therefore, these three approaches consistently confirmed significant variation in the copy number of RGC2 genes that was not correlated with species.

The minimum number of RGC2 genes in each genotype was calculated from the combined analysis of MSATE6 and the sequenced RGC2 fragments (Table 2). PI491204 (L. saligna) and cv Mariska had the lowest minimum copy numbers of 12 and 13, respectively; the minimum number of RGC2 genes in cv Calmar and US93UC-10 (L. saligna) was 20. However, the minimum number of RGC2 genes in cv Diana was 32.

Although W66336A (L. serriola) initially appeared to have more RGC2 genes than Diana (at least 42), we determined that W66336A is heterozygous, comprising two distinct haplotypes. To check if W66336A was heterozygous at the RGC2 locus, 10 individuals each of W66336A and Diana were analyzed using microsatellite MSATE6. Segregation of two distinct MSATE6 profiles was observed in W66336A, including heterozygous individuals with a combination of the two profiles (Figure 2B). Therefore, the plants representing W66336A must have been segregating for the RGC2 locus, and the 39 RGC2 genes amplified from W66336A were derived from two L. serriola haplotypes (designated as A and B). Haplotype A has 14 bands in its MSATE6 profile, six of which were not present in haplotype B. Sequence analysis of the cloned fragments indicated that these six MSATE6 bands were derived from at least 12 RGC2 genes; therefore, these genes specific to haplotype A could be designated RA1 to RA12. On the other hand, haplotype B has 13 bands in its MSATE6 profile. There are five bands and five genes specific to haplotype B, which could be designated RB1 to RB5. The other 22 genes produced MSATE6 fragments common to both haplotypes and were therefore designated RAB1 to RAB22. Assuming that the 22 RAB genes and at least three unsequenced genes were equally distributed in the two haplotypes, the minimum copy numbers of RGC2 genes in haplotypes A and B within W66336A were estimated to be ~25 and 18, respectively. By contrast, no segregation at the RGC2 locus was observed in cv Diana. Therefore, the copy number of RGC2 genes varied from ~12 (PI491204, L. saligna) to >32 (cv Diana).
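The haplotype estimates above can be reproduced with simple arithmetic. The Python sketch below is one reading of that calculation, assuming that the 22 shared (RAB) genes and the three unsequenced genes are split equally between the two haplotypes and that fractional results are rounded up. The variable names are illustrative only.

```python
import math

# Counts stated in the text for L. serriola W66336A.
specific_a = 12        # RA1 to RA12, specific to haplotype A
specific_b = 5         # RB1 to RB5, specific to haplotype B
shared_sequenced = 22  # RAB1 to RAB22, bands common to both haplotypes
unsequenced = 3        # "at least three unsequenced genes"

# Total distinct sequences amplified from W66336A.
total_sequenced = specific_a + specific_b + shared_sequenced

# Assumption: shared and unsequenced genes are divided equally between A and B.
shared_per_haplotype = (shared_sequenced + unsequenced) / 2

# Assumption: fractional estimates are rounded up to whole genes.
min_a = math.ceil(specific_a + shared_per_haplotype)
min_b = math.ceil(specific_b + shared_per_haplotype)

print(total_sequenced, shared_per_haplotype, min_a, min_b)
```

Under these assumptions the totals agree with the 39 sequenced genes and with the estimates of ~25 and 18 genes for haplotypes A and B.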

Large Insertions Encoding Multiple LRRs Resulted from Unequal Crossing-Over
The 126 distinct 3' RGC2 sequences from the seven genotypes were aligned using ClustalX. The intron (intron 5) in this region was highly dissimilar between some genes and therefore could not be included in the alignment. Large insertions (>237 bp) had occurred in the exons of 32 fragments. Fragments with identical insertion sites and very similar inserted sequences (>97%) were considered to be derived from the same event. This indicated that the 32 large insertions were the results of 13 independent events (Table 3). Seven events were found in more than one gene; six of these were detected in multiple species and at least four events had occurred before speciation of L. serriola and L. saligna (Table 3).
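The rule used above to assign insertions to events (identical insertion site and >97% similarity of the inserted sequence) can be sketched as a simple greedy grouping. The following Python example is only an illustration under stated assumptions: inserted sequences are taken to be already aligned to equal length, each new insertion is compared with the first member of each existing event, and the names, sites, and sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_events(insertions, threshold=0.97):
    """Group insertions into events.

    An insertion joins an existing event when it has the same insertion site
    as the event's first member and its identity to that member exceeds the
    threshold; otherwise it starts a new event.
    """
    events = []  # each event is a list of (name, site, sequence) tuples
    for name, site, seq in insertions:
        for event in events:
            _, ref_site, ref_seq = event[0]
            if (site == ref_site and len(seq) == len(ref_seq)
                    and identity(seq, ref_seq) > threshold):
                event.append((name, site, seq))
                break
        else:
            events.append([(name, site, seq)])
    return events


# Hypothetical toy data: 100-bp inserts at made-up alignment positions.
base = "ACGT" * 25
toy_insertions = [
    ("ins1", 120, base),
    ("ins2", 120, "T" + base[1:]),        # 99% identical to ins1, same site
    ("ins3", 120, "T" * 10 + base[10:]),  # 92% identical to ins1, same site
    ("ins4", 300, base),                  # identical sequence, different site
]
events = group_events(toy_insertions)
print([[name for name, _, _ in event] for event in events])
```

Comparing only with the first member keeps the sketch short; a single-linkage or centroid-based clustering would be less sensitive to the order in which insertions are examined.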


Table 3. Large Indels Predicted to Result from Unequal Crossing-Over

 
All of the 13 inserted sequences were direct repeats of LRR-encoding regions. This structure indicated that these large insertions were the result of unequal crossing-over within or between RGC2 genes. To investigate whether the unequal crossing-over had been intragenic or intergenic, sequence alignments were made for each of the 13 events. For each event, the fragments were divided at the insertion site, and the two sequences were independently aligned with all other RGC2 fragments. The two ends of the alignments were then trimmed and the duplicated regions analyzed. The duplicated sequences in nine of the 13 duplication events were more similar to other sequences than to each other (e.g., Figure 3A); therefore, these duplications had resulted from unequal crossing-over between two different genes. For the other four duplication events, the duplicated sequences were most similar to each other (e.g., Figure 3B) but were not nearly identical, ranging from 89.7 to 95.8% nucleotide identity. These duplications could have been the result of an ancient intragenic unequal crossover event or the result of unequal crossing-over between closely related paralogs that were not identified in our analysis (Table 3).



Figure 3. Distance Trees Indicating the Origin of Duplications within the LRR.

Each duplicated region with large insertions was divided into two parts at the insertion sites (-L and -R, indicated by arrows). Then, the two fragments from all duplications were aligned with the same region of all RGC2 genes. Only parts of the distance tree indicating the origin of two duplications are shown.

(A) Duplications in fragment TM7 indicating a paralogous origin.

(B) Duplications in fragment RA5 indicating a potentially allelic origin.

 
The positions of the duplications were nonrandom. Four of the 13 duplication breakpoints occurred within a 150-bp region in exon 5. All of the other nine breakpoints occurred in exon 6 within a 35-bp region (Table 3). The duplications in genes TM7, RA5, RA1, LC4, and TC2 involved different sequences but occurred at exactly the same position. Therefore, the middle of exon 5 and the end of exon 6 have been hot spots for unequal crossing-over relative to the rest of the region analyzed (Figure 4).



Figure 4. Unequal Distribution of Point Mutation, Insertion/Deletion, and Recombination Events in RGC2 Genes from Lactuca spp.

Positions of these different events detected in all RGC2 sequences are indicated on the amino acid sequence of the 12 LRRs from TDB (RGC2B and Dm3). Aliphatic residues in the LRR consensus are shown in red. Position of intron 5 in LRR 6 is shown by a diamond. Hypervariable sites showing significant posterior probabilities using PAML indicative of diversifying selection are shaded in green. Hot spots for unequal crossing-over that resulted in duplications of one or more LRRs are underlined with a red bar. The positions of small, mostly in-frame, indels (<64 bp) in the core sections of each LRR are shown with asterisks; multiple occurrences of indels at the same position are indicated by numbers. The positions of indels in the extended sections of the LRR were too numerous to display and could not be positioned accurately because of sequence divergence between RGC2 sequences. The complex hyperpolymorphic microsatellite sequence in LRR 3 is shown by a blue bar.

 
The size of the inserted sequences varied from 237 to 501 bp, preserved the open reading frame, and encoded two to five LRRs. Most of the inserted fragments encoded complete LRRs and were only one or two amino acids shorter than LRRs in TDB (Table 3). However, the inserted sequence in BI6-3 was 14 amino acids shorter and was missing 12 amino acids in the LRR consensus motif. It is unknown if sequences with a shortened LRR are functional. The majority of duplications preserved the predicted structure of the LRR with putative solvent-exposed β-sheets. The breakpoints of insertions were located outside the LRR consensus motif and in most cases resulted in the duplication of complete rather than truncated LRRs (Table 3).

Patterns of Sequence Diversity Identified Two Types of RGC2 Genes
A neighbor-joining distance tree was constructed for the 126 fragments using SUN7, an RGC2 fragment from sunflower (Helianthus annuus), as the outgroup (Figure 5). Two distinct types of genes (designated Type I and Type II) were recognizable based on the patterns of sequence diversity between the genes. The two types were separated by a node with a bootstrap value of 97%. A parsimony tree for the same set of fragments had similar topology to the neighbor-joining tree but a lower bootstrap value for the node separating Type I and Type II genes (data not shown).



Figure 5. Distance Tree Derived from the 3' Region of 126 RGC2 Genes from Seven Genotypes.

Genes from different genotypes are marked in different colors: L. sativa is shown as shades of blue, L. saligna as shades of red, and L. serriola as shades of green. SUN7 (an RGC2 homolog from sunflower) was used as the outgroup. Type I and Type II genes were distinguished by different tree topologies. The different Type II clades are designated at right.

 
The Type I group contained 48 genes (38%) and usually had medium length branches (5 to 10% nucleotide substitutions) in the distance tree. Most bootstrap values between Type I genes were <90% with the exception of 10 nodes. Their pairwise sequence identities varied from 86.9 to 100%, with the majority (87.9%) varying from 90 to 95%. Only four of the 1128 pairwise nucleotide identities within Type I genes were >99%. There was no clear relationship between taxonomy and position in the distance tree, and it was impossible to identify potentially allelic or orthologous relationships between genes in the Type I group except in a few cases, such as TC12 and TDAB.
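The pairwise comparisons summarized above can be illustrated with a short Python sketch. It assumes sequences that are already aligned (gaps as "-"), skips alignment columns where either sequence has a gap, and uses hypothetical toy sequences; it is not the software used in this study. The count of 1128 comparisons follows from the 48 Type I genes, since 48 × 47 / 2 = 1128.

```python
from itertools import combinations
from math import comb


def identity(a: str, b: str) -> float:
    """Nucleotide identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x == y for x, y in pairs) / len(pairs)


def pairwise_identities(aligned):
    """Map each unordered pair of sequence names to its identity."""
    return {
        (n1, n2): identity(s1, s2)
        for (n1, s1), (n2, s2) in combinations(aligned.items(), 2)
    }


# Number of pairwise comparisons among the 48 Type I genes.
n_type1_pairs = comb(48, 2)

# Hypothetical toy alignment (not RGC2 data).
toy_alignment = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",
    "s3": "ACGT-CGTAC",
}
result = pairwise_identities(toy_alignment)
print(n_type1_pairs, result[("s1", "s2")])
```

Whether gapped columns are skipped or counted as mismatches changes the identities slightly; the choice made here is an assumption of the sketch.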

In contrast with Type I, the branches linking the 78 Type II genes were either long (>10% nucleotide substitutions) or short (<5% nucleotide substitutions) (Figures 5 and 6). The bootstrap values between each tight clade in Type II were without exception 100%. Most (83.9%) pairwise sequence identities between genes in different Type II clades were <85%, whereas those between genes within each tight clade were >97%. Sequence identities of 90 to 95%, which were prevalent between Type I genes, were rare between Type II genes (Figure 6). Several of the tight clades had a single representative of each of the genotypes; six tight clades had two representatives from W66336A that contained two RGC2 haplotypes (see above). This distribution suggested that genes within each tight clade are alleles or orthologs.



Figure 6. The Distribution of Pairwise Nucleotide Identities among 48 Type I and 78 Type II Genes.

The majority of Type I identities are between 90 and 95%; conversely, identities between Type II genes are either greater or less than those of Type I genes.

 
Sequence Exchanges Are Frequent between Type I Genes but Rare between Type II Genes
Besides differences in nucleotide similarity, another major difference between Type I and Type II RGC2 genes was the frequencies of sequence exchanges. A total of 79 independent sequence exchanges were detected between all 126 RGC2 fragments using Geneconv (P < 0.05). Seventy-six of them occurred between Type I genes; only two occurred between a Type I gene and a Type II gene and one between two Type II genes even though more Type II genes than Type I genes were represented in the 126 sequences. On average, each Type I gene had three sequence exchanges with other Type I genes in the 1.2-kb LRR-encoding region studied. The length of the sequence exchanged varied from 60 to 528 bp, with an average exchange length of 201 bp.

Sequence exchange among Type I genes was investigated in greater detail by comparing distance trees for 12 different sections of these sequences. First, the ~1.2-kb sequences were divided into 12 LRR-encoding sections. Each section contained ~100 bp, with one LxxLxxLxLxxCxx motif located in the middle. Trees were constructed for each of the 12 sections in the Type I genes. The topologies of these trees were considerably different from the tree constructed from the whole sequence of the Type I genes (Figure 7). Unlike the tree for the whole region that had medium branch lengths for Type I genes (Figure 5), these trees had several tight clades of identical or nearly identical sequences. The distribution of sequences within the trees varied; genes with high sequence similarity at the first LRR were not always similar at the last LRR (Figure 7). For example, one tight clade had eight (RA8, RA6, TDG, TM6, RA2, LC10, RA4, and RA1) sequences that were identical in the first 100 bp. However, these sequences varied by as much as 10% in the last 100 bp and were distributed throughout the tree for the last 100 bp (Figure 7).
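The section-by-section comparison described above can be approximated by splitting each aligned sequence into 12 equal sections and grouping genes whose sequences are identical within a given section. The Python sketch below is a simplification: it groups by exact identity rather than building distance trees as was done in this study, and the gene names and sequences are hypothetical (2 bp per section instead of ~100 bp).

```python
from collections import defaultdict


def split_sections(seq: str, n_sections: int = 12):
    """Split an aligned sequence into n_sections equal-length sections."""
    size = len(seq) // n_sections
    return [seq[i * size:(i + 1) * size] for i in range(n_sections)]


def groups_by_section(genes, index: int, n_sections: int = 12):
    """Group gene names that share an identical sequence in one section."""
    groups = defaultdict(list)
    for name, seq in genes.items():
        groups[split_sections(seq, n_sections)[index]].append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy genes: g1 and g2 share the first section, whereas
# g1 and g3 share the last section, mimicking a chimeric pattern.
toy_genes = {
    "g1": "AA" + "CC" * 10 + "GG",
    "g2": "AA" + "CC" * 10 + "TT",
    "g3": "TT" + "CC" * 10 + "GG",
}
first = groups_by_section(toy_genes, 0)
last = groups_by_section(toy_genes, 11)
print(first, last)
```

Groupings that differ between the first and last sections, as in this toy case, are the pattern expected when blocks of sequence have been exchanged between genes.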



Figure 7. Distance Trees for LRR 1 (Left Tree) and LRR 12 (Right Tree) of Type I Genes Showing the Lack of Correlation between LRRs Derived from the Same Gene.

For illustration, genes with identical LRR 1 (RA8, RA6, TDG, TM6, RA2, LC10, RA4, and RA1) are joined by lines to their diverse positions in the tree for LRR 12.

 
Fragments encoding individual LRRs within each tight clade had >98% nucleotide identities and were often identical. Some tight clades had as many as 12 members representing all three species. The presence of identical fragments in the different species indicates that the rate of point mutation in Type I genes has been very low, just as it has been in Type II genes (see below). The diversity between Type I genes is therefore mainly attributable to exchanges of blocks of sequence encoding one or more LRRs rather than the accumulation of novel point mutations at the hypervariable sites within the consensus motif of LRR.

Individual Type I Sequences Are Rare in Natural Populations Because of Frequent Sequence Exchanges
To determine the prevalence of Type I sequences, either whole or in part, primers specific to a variety of Type I genes were used either together or in combination with conserved primers to amplify sequences from a panel of 40 additional wild accessions. These included 33 accessions of L. serriola, six accessions of L. saligna, and one accession of L. perennis. Oligonucleotide primers were designed to the hypervariable region within the consensus motif of first and last LRRs of Type I genes (Figure 1C, Tables 1 and 4). Combinations of one primer specific to a Type I gene with one conserved primer (either Ex4b or Ex5r1) were used to determine the frequency of each primer site in the panel of 40 accessions. The frequency of the most frequent specific primer sites ranged from 25 to >50% and was not correlated with their frequency in the initial seven accessions (Table 4).


Table 4. The Relative Frequencies of Fragments Amplified Using Gene-Specific Primers from the Initial Seven Genotypes and the Frequencies Amplified from an Additional 40 Genotypes

 
Amplification products were only rarely obtained using pairs of Type I specific primers (Table 4). On average, the frequency of detecting amplification products from the 40 additional accessions was 8%. Primers specific to individual Type I genes could amplify products when combined with primers specific to different Type I genes (Table 4). Combinations of gene-specific primers that resulted in amplification products were as likely to have been derived from different Type I genes as from individual Type I genes present in the original seven genotypes (Table 4). The low frequencies of combinations of any two Type I specific sequences provided further evidence that Type I genes have undergone extensive shuffling, and many RGC2 genes encoding different combinations of individual LRR sequences may exist in nature. No PCR amplifications were observed when a Type I specific primer was used in combination with a Type II specific primer; this is consistent with minimal sequence exchange between Type I and Type II genes.

Type I Genes Comprise Diverse Chimeras in Natural Populations
To provide additional information on the evolution of Type I genes and to test whether the original amplification of the seven genotypes using the two universal primers (Ex4b and Ex5r1) had resulted in a biased sample, 72 of the fragments amplified using 10 different gene-specific primer combinations were cloned and sequenced (Table 4). These 72 additional sequences were aligned with the 48 Type I genes characterized previously from the seven genotypes, and a distance tree was constructed using RAB5, the Type II gene most similar to Type I, as the outgroup (data not shown).

Nine fragments, which were amplified by primer combination Ex4b and 5B, formed a tight clade with DNA identities of 99.1 to 100% and apparently belonged to a Type II clade that had not been detected in the initial seven genotypes. One of the nine fragments was from L. saligna; the other eight were from L. serriola. They were dissimilar to any other Type I or Type II gene (<85% nucleotide identities). These new Type II genes were 1.3 kb between primers Ex4b and 5B, whereas the Type I genes amplified by the same primer combination were 1.7 kb. They had a frequency of 25% in the 40 additional accessions but were apparently absent from the initial seven genotypes.

All of the other 63 sequences, together with the 48 Type I genes characterized earlier, exhibited similar sequence identities with each other. The distance tree for all 111 Type I genes had a similar topology to that for the Type I genes from the seven initial genotypes (data not shown). Sequences amplified by the same primer combinations were scattered throughout the tree, indicating that the original pair of primers (Ex4b and Ex5r1) had not provided a biased sample of sequences. Also, sequences amplified by the same primers were usually dispersed within the tree; conversely, genes that grouped together had often not been amplified using the same primer combination. Numerous sequence exchanges were observed between these 111 Type I genes (data not shown). These data provided further evidence of extensive shuffling of Type I sequences.

Highly Conserved Type II Alleles/Orthologs Occur at Different Frequencies in Natural Populations
The 78 Type II sequences were distributed in 18 tight clades (Figure 5). In addition, eight putative Type II genes had no obvious alleles or orthologs. The numbers of alleles/orthologs detected in each of the 18 Type II clades varied from two to eight. To investigate whether these differences in the frequencies of alleles/orthologs were due to sampling bias or reflected their actual prevalence in the seven haplotypes studied, the panel of 40 additional wild accessions was analyzed using 14 pairs of oligonucleotide primers specific to individual tight clades (Tables 1 and 4). In contrast with the results for Type I genes, RGC2 fragments were amplified from similar numbers of accessions by pairs of primers that were both specific to the clade and by pairs of primers consisting of a clade-specific and a conserved primer (e.g., K-for + K-rev compared with K-for + Ex5r1; Table 4). This is consistent with a low frequency of chimeric sequences. Subsequent sequencing of these PCR fragments showed that only alleles/orthologs were amplified using each primer combination. Fragments amplified using primers specific to clade K had 97.3 to 100% nucleotide identities. Similarly, fragments amplified using primers specific to clade L had 98.3 to 100% nucleotide identities with each other and were obvious alleles or orthologs of TDL (see below).

Some Type II clades were detected in many genotypes of different Lactuca species, whereas other clades were rare. The frequencies of K, L, and M alleles/orthologs were 38/40, 34/40, and 24/40, respectively, which was consistent with their prevalence in the initial seven genotypes (Table 4). Twenty-four of the 33 L. serriola genotypes but none of the six additional L. saligna genotypes had an F allele/ortholog; this was again consistent with the taxonomic distribution observed within the initial seven genotypes. Three primers (RA7, LC7, and TC1) specific to three Type II genes that were detected only once in the seven genotypes amplified products from only 2, 3, and 3 of the 40 additional genotypes, respectively (Table 4). Therefore, the variability in the prevalence of Type II alleles/orthologs in the initial seven genotypes was not an experimental artifact, and the different Type II clades varied in frequency from rare to almost ubiquitous.

Type II Orthologs Exhibit Trans-Specific Polymorphism
Fragments of six single-copy genes that were not resistance genes were chosen to serve as reference sequences for the analyses of the RGC2 genes and to confirm the taxonomic relationships of the seven Lactuca genotypes from which the RGC2 genes were cloned. Distance trees were constructed for the sequences of these six genes, and similar topologies were obtained for each tree (data not shown). Sequences from the same species always grouped tightly together and were often identical; intraspecific polymorphisms were rare (Table 5). Average KA:KS (nonsynonymous:synonymous nucleotide changes) ratios ranged from 0 to 0.23 for the six genes, consistent with purifying selection acting on these genes.


Table 5. Single-Copy Nonresistance Genes from Lactuca spp

 
In contrast with the nonresistance genes, the distribution of RGC2 genes in the distance tree did not reflect the taxonomic relationships of their genotypes of origin. Different genotypes were dispersed throughout the tree rather than clustered in species-specific clades. A Templeton test (Templeton, 1983) indicated that the trees for the K and L Type II clades of RGC2 genes were significantly different from the trees based on nonresistance gene sequences (P < 0.0001). The relationships of alleles/orthologs from each accession varied within each Type II clade. Alleles from the same species were often not the most similar to each other; for example, in clade M, TM3 (an allele from cv Mariska) was the most divergent allele/ortholog, whereas the other two alleles from L. sativa (TDM and TC8) were identical or nearly identical to orthologs from L. saligna (LA8 and LB3). Such trans-specific polymorphisms may be indicative of balancing selection maintaining ancient polymorphism.

Point Mutations as the Most Common Polymorphism between Type II Alleles/Orthologs
Some RGC2 clades exhibited low frequencies of nucleotide polymorphisms, even lower than single-copy nonresistance genes (Tables 5 and 6). The levels of nucleotide polymorphism observed between alleles/orthologs were clearly not consistent with markedly elevated rates of point mutation in the 3' regions of RGC2 genes. Point mutations between alleles/orthologs were distributed throughout the LRR-encoding region rather than being concentrated in the hypervariable sites within each LRR, in contrast with polymorphisms between paralogs (see below). Point mutations were more prevalent than insertion/deletions (indels) and sequence exchanges between alleles/orthologs within individual Type II clades (Table 6). The total number of indels per clade was low, and only 23 indel polymorphisms were observed within clades (Table 6). The pairwise nucleotide identities between alleles/orthologs within a clade varied from 95 to 100% (mean = 97.8%). This was only slightly lower than nucleotide identities for the single-copy, nonresistance genes, which varied from 97.3 to 100% (mean = 98.9%).


Table 6. Point Mutations and Sequence Exchanges Detected between Orthologs

 
Characterization of K and L Alleles/Orthologs from Additional Wild Accessions
To investigate allelic/orthologous variation and its genetic basis, K and L alleles/orthologs from the panel of 40 accessions were analyzed in detail. Primers that were specific to members in clade K were used in combination with the conserved primers (Ex4b or Ex5r1) to amplify K alleles/orthologs. Products were obtained from 38 of these genotypes using at least one of the primer combinations. We sequenced 35 of these products derived from 29 accessions of L. serriola plus five from L. saligna and one from L. perennis.

Sequences from 43 K alleles/orthologs, including the eight fragments from the seven genotypes characterized previously, were compared and a distance tree constructed using the closest paralog RA7 as the outgroup (Figure 8). The pairwise nucleotide identities between Kperennis (from L. perennis), the most divergent ortholog, and the other alleles/orthologs varied from 94.7 to 95.8%. The nucleotide identities between the other K alleles/orthologs were between 97.3 and 100%. These nucleotide identities were much higher than those between a K allele/ortholog and any paralog (<84.5%). K alleles/orthologs are therefore highly conserved in these four Lactuca spp, and they are clearly distinguishable from their closest paralog. Even the most divergent fragment from the sexually incompatible species, L. perennis, retained an obvious orthologous relationship with other alleles/orthologs in clade K.



Figure 8. Neighbor-Joining Distance Tree of 43 K Alleles/Orthologs from 42 Genotypes of Lactuca spp.

Fragment RA7, the closest nonorthologous sequence, was used as outgroup; only half of the branch linking to RA7 is shown. Alleles from L. saligna are shaded. The values on the branches indicate the percentage of 1000 bootstrap replicates; values <60 are not shown.

 
Several subclades of K alleles/orthologs could be distinguished (Figure 8). Seven were separated by nodes with bootstrap values >67%; four of these nodes had bootstrap values >90%. These subclades may represent several allelic lineages. Each subclade frequently had representatives from multiple species. Subclade K3 includes two fragments from L. saligna and three fragments from L. serriola; K2351 from L. saligna differs from KT181 from L. serriola by only one base. There are three fragments in subclade K5, one from L. saligna and two from L. serriola that differed by only one or two bases. Such trans-specific polymorphism was consistent with previous observations and provides additional evidence that nucleotide differences were ancient and that balancing selection has maintained polymorphism within each Lactuca spp.

Only three, nine, and 14 sequence exchanges were detected among K alleles/orthologs using Geneconv, visual inspection, and the four-gamete method, respectively (Table 6). Fragments K289 and K639 in subclade K6 were extensive chimeras consisting of sequences from several different K alleles/orthologs (data not shown). K5013 was close to subclade K5 (Figure 8) but differed from K5 genes because of a sequence exchange in the first 700 bp between K5013 and KT10/KT46. The variation between KT125 and LC8 was caused by an ~400 bp sequence exchange between KT125 and genes in subclade K3. The variation between TDC7 and the remaining subclade K3 genes was because of gene conversion with a gene from subclade K1. Therefore, detectable sequence exchanges between alleles/orthologs within clade K have occurred and may be important in generating diversity.
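The four-gamete criterion mentioned above can be sketched as follows: for two biallelic sites, observing all four two-site haplotypes implies at least one recombination or exchange event between them, provided recurrent mutation is excluded. The function name and toy haplotypes are hypothetical, and this is a minimal illustration of the criterion rather than the software the authors used.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return pairs of biallelic site indices at which all four
    two-site haplotypes ("gametes") are observed."""
    n_sites = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]
    # Only sites with exactly two alleles are informative for the test.
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    hits = []
    for i, j in combinations(biallelic, 2):
        gametes = set(zip(columns[i], columns[j]))
        if len(gametes) == 4:
            hits.append((i, j))
    return hits


# Hypothetical aligned haplotypes (no gaps).
recombinant = ["AACG", "AATG", "GACG", "GATG"]
non_recombinant = ["AACG", "AACG", "GATG", "GATG"]

print(four_gamete_pairs(recombinant))      # sites 0 and 2 show all four gametes
print(four_gamete_pairs(non_recombinant))  # only two gametes, so no pairs
```

Because the criterion flags any site pair with four gametes, it tends to report more events than methods that require a statistically significant tract, which is in line with the differing counts reported for the three approaches.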

Similar results were obtained for the L clade. A total of 27 new sequences were obtained from the panel of 40 genotypes using primers specific to L alleles/orthologs. Excluding the MSATE6 sequence and fragment Lperennis from L. perennis, the 34 sequences (including seven from the initial seven genotypes) exhibited an average of 99.0% nucleotide identity and an obvious allelic/orthologous relationship.

Highly Conserved Introns in 3' of Type I Genes but Not Type II Genes
Analysis of intron 5 sequences revealed very different characteristics for Type I and Type II genes. Four of the 111 Type I genes were excluded because three of them had a deletion of the entire intron 5 region and one had poor sequence. Of the remaining 107 Type I genes, the size of intron 5 ranged from 616 to 1037 bp, with an average of 679 ± 56 bp. The intron sequences could be readily aligned, and nucleotide identity varied from 94 to 100% with an average of 97%. This was greater than the nucleotide identities of the Type I flanking coding regions that ranged from 89 to 100% (average 92%). No hypervariable sites were evident within the intron. Intron variation was mainly attributable to evenly distributed point mutations and a few indels that were usually <20 bp. Insertions of 358 bp were observed in TDH and RB4. The inserted sequences were apparently derived from the same event because fragments TDH and RB4 were almost identical in the insertion and other regions. BLASTN analysis failed to detect significant sequence similarities to the inserted sequences, and there was no evidence that these sequences were related to retrotransposons or long terminal repeats.

Analysis of sequence exchanges in exons found that at least 22 sequence exchanges extended into the intron (see above). Sequence exchanges were also investigated by comparison of intron sequences of all Type I RGC2 genes. Analysis using Geneconv, visual inspection, and DnaSP detected 0, 6, and 18 exchanges, respectively, within the intron sequences. These were distributed evenly throughout the intron sequence. No elevated frequency of sequence exchange was observed relative to the flanking exon sequences.

To determine whether the high nucleotide identities between intron 5 sequences of Type I genes were unusual or whether this was typical of all introns in Type I RGC2 genes, nucleotide identities were calculated for all introns of sequenced Type I RGC2 genes, with the exception of intron 3, which varied dramatically in size (372 bp to >6 kb) (Table 7). High nucleotide identities were observed only in introns 4, 5, and 6 in the 3' half of the gene; the nucleotide identities of these introns were higher than those of their flanking coding regions. By contrast, the nucleotide identities in introns 1 and 2 were lower than in their flanking coding regions. This variation in intron similarity in different regions may reflect the variation in frequency of sequence exchanges: sequence exchanges were frequent in the 3' LRR region but rare in the 5' half.


Table 7. Average Nucleotide Identities (%) at Different Regions of RGC2 Genes in Diana and Mariska

 
By contrast, intron 5 of Type II genes varied greatly. Intron 5 of 78 Type II genes ranged in size from 133 to 788 bp, with an average of 317 bp. All intron 5 sequences except that in LC11 were <412 bp and were considerably shorter than intron 5 in Type I genes (679 ± 56 bp). Intron 5 in LC11 was 788 bp, but it had no sequence similarity with other RGC2 genes. BLAST analysis of the intron sequences did not reveal significant sequence similarity to transposon sequences. Introns from the different Type II clades were difficult to align because of low sequence identity. When alignments were possible for subsets of sequences, the highest nucleotide identity between intron 5 sequence of Type II genes from different clades was only 79%, and some were <50%, which was much lower than the average nucleotide identities between the exons of different Type II clades (~80%).

Small Indels Occurred Mainly in Regions of Type II Genes Encoding Extended LRRs
The position and frequency of indel events were determined for the 126 RGC2 fragments from the seven genotypes. Indels at the same position and with the same size were considered as being potentially derived from the same event and, therefore, only counted once. The region containing the hypervariable microsatellite MSATE6 was considered separately.
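The counting rule described here (indels with the same position and size are treated as one event) can be sketched as below. The sketch treats every run of gap characters in an aligned sequence as one indel keyed by its start column and length; the function names and toy alignment are hypothetical, and a real analysis would also need to decide whether a gap reflects a deletion in one sequence or an insertion in the others.

```python
import re
from collections import Counter


def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' in one aligned sequence."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"-+", aligned_seq)]


def count_indel_events(alignment):
    """Map each distinct (start, length) gap to the number of sequences
    carrying it, so that identical gaps count as a single event."""
    events = Counter()
    for seq in alignment.values():
        for run in gap_runs(seq):
            events[run] += 1
    return events


# Hypothetical toy alignment of four fragments.
toy = {
    "g1": "ACG---TTACGT",
    "g2": "ACG---TTACGT",
    "g3": "ACGAAATT--GT",
    "g4": "ACGAAATTACGT",
}
events = count_indel_events(toy)
print(len(events), dict(events))
```

Here the 3-bp gap shared by g1 and g2 is counted as one event with a frequency of two, which mirrors the distinction in the text between the number of independent indels and the frequency of each indel among the fragments.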

A total of 141 independent indels were identified, excluding those involving the microsatellite MSATE6. Thirteen of them were large duplications resulting from unequal crossing-over as described above. There were also three large deletions, a 1.4-kb deletion in gene TDD, a 186-bp deletion in TDAD, and a 99-bp deletion in TM7. All of these 16 indels retained the open reading frame (ORF), except for the deletion in TDD, and changed the number of encoded LRRs by at least one. All of the other 125 indels were <64 bp, with an average of 8 ± 6 bp, and did not change the number of LRRs. The majority of these small indels also retained the ORF; only five resulted in frame-shift mutations.

The frequency of each small indel varied from 1 to 48 in the 126 RGC2 fragments. All 125 small indels detected were present in the 78 Type II genes, but only 14 were present in the 48 Type I genes (Table 8). The majority of the 141 indels were present in more than one species; five indels were present in more than one fragment but only from one species, and only 23 of the 141 indels were detected just once. The existence of frequent trans-specific polymorphism for these indel polymorphisms suggested that they may be under balancing selection.


Table 8. Distribution of Indels in 12 LRRs of Type I and Type II RGC2 Genes

 
Diversifying Selection Acting on the LRR Region
The 12 conserved (unduplicated) LRRs in the 3' fragments amplified from the seven genotypes were analyzed using multiple models within PAML (Yang, 1997; Yang et al., 2000) to examine selective forces acting on individual sites in the LRR-encoding region. The putative solvent-exposed residues of the LRR consensus motif had previously been shown to be under significant diversifying selection in RGC2 and other resistance genes (see references above). However, most of these earlier analyses had involved pooling of sequences encoding these sites along the gene and the analysis of KA:KS ratios between paralogs. In contrast with previous studies, the availability of many sequences allowed us to study individual sites within the LRR.

The M7 and M8 models of PAML were compared to investigate whether there were sites under diversifying selection. The likelihood-ratio test of M7 versus M8 was consistent with sites being under diversifying selection (χ² = 1990, df = 2; P < 0.001). A total of 43 out of ~493 sites were identified by model M8 as being under diversifying selection (P < 0.01; Figure 9). The 10th and 11th residues in most LRRs were clearly under diversifying selection (Figures 4 and 9). The 8th, 13th, and 14th residues in some LRRs, but not all, were also under diversifying selection. These inferences of diversifying selection have the caveat that recombination and gene conversion may have resulted in overestimations of positive selection (Anisimova et al., 2003).
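The P-value attached to the likelihood-ratio statistic can be reproduced without a statistics library, because the upper-tail probability of a chi-square distribution with 2 degrees of freedom is exactly exp(-x/2). The sketch below assumes the reported statistic (1990) is twice the log-likelihood difference between M8 and M7; the function names are illustrative.

```python
import math


def lrt_pvalue_df2(statistic):
    """Upper-tail chi-square probability for 2 degrees of freedom,
    which has the closed form exp(-x / 2)."""
    return math.exp(-statistic / 2.0)


def lrt_log10_pvalue_df2(statistic):
    """log10 of the same P-value; useful when exp() underflows to 0.0."""
    return -statistic / (2.0 * math.log(10))


stat = 1990.0  # statistic reported in the text, df = 2
p = lrt_pvalue_df2(stat)
log10_p = lrt_log10_pvalue_df2(stat)
print(p < 0.001, round(log10_p, 1))

# Sanity check against the familiar 5% critical value for df = 2 (about 5.991).
print(round(lrt_pvalue_df2(5.991), 3))
```

The statistic is far beyond any conventional critical value, so the test itself is not in doubt; the caveat in the text concerns whether recombination inflates the signal, which this calculation does not address.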



Figure 9. Distribution of Sites Identified as Being under Diversifying Selection within the 3' LRR Region.

The positions of 12 LRRs amplified from the 3' end of RGC2 genes from Lactuca spp are indicated below. The ticks in LRRs 1, 3, 6, 8, and 11 separate the core sections and extended sections. The positions of the hypervariable, putative solvent-exposed regions are indicated by asterisks.

 
The 3' Half of Type I Genes Has Evolved Differently from the 5' Half
To determine whether the 3' half had undergone more frequent sequence exchanges relative to other regions of the gene, sequence exchanges and their distribution were analyzed between 26 RGC2 genes from cv Diana and four genes from Mariska for which we had full-length or near-full-length sequences (this article; T. Wroblewski and R. Michelmore, unpublished data). Of these, 17 (57%) were Type I, and 13 (43%) were Type II genes, based on the criteria described above. Because Geneconv ignores sites with indels, several alignments were examined to maximize the detection of sequence exchanges. Initially, 22 genes with no large indels were analyzed, and then genes with progressively larger indels were included to detect additional sequence exchanges. A total of 98 sequence exchange events were detected (P < 0.05). Ninety-one of the 98 (93%) sequence exchange events were between two Type I genes.

Eighty-two of the 98 sequence exchanges occurred in the 3' half of the gene. The length of sequence exchange in the 3' half was usually small, ranging from 72 to 392 bp with an average of 194 ± 68 bp, which is consistent with the average exchange length (203 bp) detected in the 126 RGC2 3' fragments. By contrast, only 16 sequence exchanges were detected within the 5' half of the gene, and most of them were large. Eight of the 16 sequence exchanges extended from the beginning of the gene to exon 4, and two of them were greater than 3 kb. If these eight events are considered to have resolved in the 3' half, only eight (8%) sequence exchanges resolved within the 5' half of the gene. Sequence exchange within the first 3 kb of seven genes (TDAB, TDAF, TDG, TDH, TDI, TDJ, and TDV) made these Type I genes highly conserved in the 5' region, with nucleotide identities >96%. There were 3.7-kb sequence exchanges among genes TDA, TDE, and TDN at the 5' end. Sequence exchanges between genes TDB and TDS and between genes TDC and TDD were 8.0 and 3.9 kb, respectively. Considering that these sequence exchanges are large and extend to the 5' end, they might be the results of intergenic recombination. Alternatively, these genes might have been created by duplication followed by rapid divergence of their 3' halves as a result of extensive gene conversions, whereas their 5' halves evolved slowly and remained highly conserved.


    DISCUSSION
 
The distribution of variation in a genome is the result of an intricate interplay between mutation, recombination, selection, and demography and is influenced by the reproductive system and ecological constraints. Because of the complexity of their genetic structure and the potential diversity of selective forces acting on them, clusters of resistance genes represent both an opportunity and a challenge to determine the relative importance of each of these factors in the evolution of disease resistance.

This article describes the most comprehensive analysis to date of diversity at a cluster of resistance gene candidates in plants. Sequence analysis of 126 LRR sequences revealed a wide variety of genetic events that have resulted in a large number of diverse RGC2 genes. Variation occurred in gene copy number, LRR repeat number, and sequence. Point mutation, insertion, and recombination have all been important, but the position at which they occur in the gene and their impact has been different at different times in the evolution of RGC2 genes. Two distinct subsets of resistance gene candidates (designated Type I and Type II) within Lactuca spp exhibited different patterns of evolution in several respects (summarized in Table 9).


Table 9. Summary of Differences between Type I and Type II RGC2 Genes

 
Methodological Considerations
The initial use of universal followed by more specific oligonucleotide primers resulted in the amplification and effective sampling of most but not all genes in a haplotype. Although not all RGC2 genes were amplified from all genotypes, there was no evidence of bias in our sampling. Multiple independent amplifications from each genotype provided robust data that identified single nucleotide differences with confidence. The isolation and comparison of large numbers of genes from multiple genotypes allowed more detailed analyses than were possible in previous comparisons that were usually between paralogs from one or a small number of haplotypes (e.g., Meyers et al., 1998b; Noël et al., 1999). In addition, previous studies have involved smaller clusters of R-genes; the large number of RGC2 genes within a haplotype made the different patterns of variation obvious.

There are several challenges to analyzing highly variable data from an inbreeding species. The reality analyzed here (a large locus of variable size with multiple alleles and paralogs potentially under selection, some of which exhibit high rates of gene conversion) is much more complicated than the situations modeled to date (Wiuf and Hein, 2000; Innan, 2002). Therefore, some of the specific values must be treated with caution. However, phylogenetic analysis was a useful tool to look for patterns of variation that revealed differences between two sets of sequences and allowed us to erect the hypothesis of two types of RGC2 genes that was subsequently validated by independent approaches.

Variation in Gene Copy Number
The numbers of RGC2 paralogs can vary greatly within and among these three species of Lactuca. Minimum estimated copy numbers varied from 12 to 32, with most genotypes having ~20 RGC2 genes. The majority of copy number changes involved Type I genes. This is consistent with the involvement of Type I genes in frequent interparalog recombination. Resolution of such recombination complexes as a crossover would have resulted in changes of copy number, whereas noncrossover events would have resulted in gene conversions.

Lettuce cv Diana has at least 32 RGC2 genes. The large number of resistance gene candidates in Diana was most likely created by unequal crossing-over resulting in the amplification of only Type I genes in the center of the cluster because only one Type II clade (Q clade) has more than one copy from Diana. There are no obviously close pairs of Type I genes in Diana, implying that the Type I genes originated from different haplotypes or that the expansion and variation of Type I genes in cv Diana were created by unequal crossing-over followed by frequent gene conversions in the LRR region. Interestingly, the experimentally selected spontaneous losses of Dm3 specificity (Chin et al., 2001) involved deletions of Type I genes but not Type II genes, possibly indicating a reversal of the expansion process.

There are few data on the variation of R-gene copy number in other species. In Arabidopsis, deletions of resistance gene sequences have been documented at complex and single copy loci (McDowell et al., 1998; Stahl et al., 1999; Tian et al., 2002). Copy number variation (one to five copies) of Hcr9 homologs was observed at the Milky Way locus in three species of Lycopersicon (Parniske and Jones, 1999). The copy numbers at the Rp1 locus varied from 1 to 15 in different maize cultivars (Collins et al., 1999; Sun et al., 2001; Ramakrishna et al., 2002). However, little variation of copy number has been observed in limited studies of several R-gene loci, such as RPP5 (Noël et al., 1999), N, and P in flax (Dodds et al., 2001a, 2001b).

Our observations of long-term variation in copy number are consistent with short-term evolutionary studies. Four of 167 recombination events at the RGC2 locus selected on the basis of flanking marker exchange occurred within the cluster of RGC2 genes and therefore resulted in haplotypes with nonparental copy numbers of RGC2 genes (Chin et al., 2001). Eleven out of 12 spontaneous mutants selected for the loss of Dm3 specificity carried deletions at the RGC2 locus (Chin et al., 2001). These were isolated from screens of 11,000 homozygous S2 families and 16,500 heterozygous F1 plants. Each lettuce plant can produce several thousand seeds. Therefore, it is likely that a few variant haplotypes are generated by every plant in each generation.

The observed variation in copy number is an integral component of the birth-and-death model of resistance gene evolution and is similar to that observed in vertebrate multigene families (Nei et al., 1997; Michelmore and Meyers, 1998; Piontkivska and Nei, 2003). Unequal crossing-over will generate variant haplotypes with deletions and duplications. The lack of representation of all genotypes in each of the Type II clades may be the consequence of stochastic processes or a lack of selective advantage. Changes in the number of R-genes maintained within a species may be caused by life histories that alter Ne, the effective population size, or by nonrandom loss of paralogs because of the heterogeneity of the fitness values of individual paralogs. Random duplications and losses of paralogs will be favored by situations that reduce Ne, such as those resulting from short-lived inbreeding populations inhabiting disturbed sites that are characteristic of Lactuca spp. Species that do not undergo genetic bottlenecks and retain a high Ne may be less susceptible to stochastic losses of individual genes. In such species, maintenance of individual genes will be more influenced by selective advantage or proximity to a paralog under selection.

Variation in LRR Number
Eighteen of the 28 full-length (or near-full-length) RGC2 genes from Diana and Mariska had 42 LRRs. This is a much larger number of repeats than present in R-genes in other species. The number of LRRs in both TIR and non-TIR R-genes in Arabidopsis averaged 14 with a range of 8 to 25 (Meyers et al., 2003). Most other R-genes are similar in size to the Arabidopsis genes; for example, Mi and I2 in tomato (14 and 17 LRR repeats; Milligan et al., 1998; Simons et al., 1998), P in flax (16 LRR repeats; Dodds et al., 2001b), and Rp1