First published online November 20, 2002; 10.1105/tpc.004630 American Society of Plant Biologists
A High-Throughput Arabidopsis Reverse Genetics System
a Torrey Mesa Research Institute, Syngenta, 3115 Merryfield Row, San Diego, California 92121 1 To whom correspondence should be addressed. E-mail allen.sessions{at}syngenta.com; fax 858-812-1337
A collection of Arabidopsis lines with T-DNA insertions in known sites was generated to increase the efficiency of functional genomics. A high-throughput modified thermal asymmetric interlaced (TAIL)-PCR protocol was developed and used to amplify DNA fragments flanking the T-DNA left borders from 100,000 transformed lines. A total of 85,108 TAIL-PCR products from 52,964 T-DNA lines were sequenced and compared with the Arabidopsis genome to determine the positions of T-DNAs in each line. Predicted T-DNA insertion sites, when mapped, showed a bias against predicted coding sequences. Predicted insertion mutations in genes of interest can be identified using Arabidopsis Gene Index name searches or by BLAST (Basic Local Alignment Search Tool) search. Insertions can be confirmed by simple PCR assays on individual lines. Predicted insertions were confirmed in 257 of 340 lines tested (76%). This resource has been named SAIL (Syngenta Arabidopsis Insertion Library) and is available to the scientific community at www.tmri.org.
The sequenced genome of Arabidopsis contains an estimated 26,000 genes (Arabidopsis Genome Initiative, 2000). Sequence homology can be helpful in predicting gene function; however, there are many genes without homology to functionally characterized genes. Furthermore, although sequence homology can reveal general functions, the precise function performed by a specific gene product cannot necessarily be determined from sequence homology alone.
Reverse genetics is a strategy to determine a particular gene's function by studying the phenotypes of individuals with alterations in the gene of interest. Efficient reverse genetics is an essential component of functional genomics programs aimed at the functional characterization of large numbers of genes. Arabidopsis reverse genetics has been aided by the establishment of large insertion mutant collections (Azpiroz-Leehan and Feldmann, 1997).
An alternative method is to sequence regions flanking insertion sites in individual plants from large insertion mutant populations, thereby determining large numbers of insertion sites in advance (Parinov et al., 1999).
Tissue and Seed Resource
Arabidopsis plants of the Columbia ecotype were transformed using Agrobacterium containing either pCSA110 or pDAP101 by the method of McElver et al. (2001). Of these plants, 5% did not produce seeds as a result of sterility, lethality, insect damage, or underwatering. This resource has been named SAIL (Syngenta Arabidopsis Insertion Library).
Amplification and Sequencing of T-DNA Borders
To increase the efficiency of TAIL-PCR, the six AD primers (Liu et al., 1995
To improve the efficiency of sequencing, all products produced in an individual mTAIL-PCR procedure were sequenced together. Analysis of the resulting sequences revealed that multiple T-DNA left border mTAIL-PCR products could be sequenced together in a single reaction and that gel purification of individual PCR products before sequencing was not necessary. Basic Local Alignment Search Tool (BLAST) comparison of a typical mTAIL-PCR sequence with GenBank identified similarities of distinct regions of the mTAIL-PCR sequencing read to separate and unique regions of the Arabidopsis genome, as shown in Figure 2A. The beginning of the electropherogram of the sequence used for BLAST analysis in Figure 2A is composed of high-amplitude peaks superimposed over low-amplitude peaks, suggesting that it contains more than one mTAIL-PCR product, as shown in Figure 2B. The base-calling software used (Phred; Ewing et al., 1998
Tandem or complex T-DNA insertions result in adjacent T-DNA borders (Jones et al., 1987 100,000 primary transformants using mTAIL-PCR. The mTAIL-PCR products were sequenced with a distal T-DNA left border primer, analyzed for quality, and in some cases resequenced to generate higher quality reads. These sequences are referred to collectively as SAIL sequences.
Assigning Insertion Sites in the Collection
The TIGR Arabidopsis annotation (release date, August 10, 2001; www.tigr.org) then was used to characterize the distribution of insertions in regions of the genome defined as "transcribed regions," "promoter," and "intergenic." Transcribed regions in this analysis were defined as TIGR annotated transcription units, promoters were defined as the lesser of 2 kb upstream or the distance to the next transcription unit, and the remainder of the genome was classified as intergenic. Figure 3 shows the distribution of the three insertion classes in a whole-genome view using 50-kb windows. In this analysis, 85,108 predicted insertions mapped to the genome: 25,710 (30%) mapped to transcribed regions, 37,156 (44%) mapped to promoters, and 22,242 (26%) mapped to intergenic regions. Of the total number, 4064 transcribed region, 6466 promoter, and 6555 intergenic associations were the result of a SAIL sequence segment having more than a single top hit with the highest BLAST score. Thus, in this analysis, the total number of insertions into transcribed, promoter, and intergenic regions was overestimated by 15, 17, and 30%, respectively.
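The three-way classification described above can be sketched in Python. This is only an illustration under simplifying assumptions that are not in the original analysis: transcription units are supplied as sorted, non-overlapping (start, end) intervals on one chromosome, and all are treated as lying on the forward strand, so that "upstream" means lower coordinates. The function and variable names are hypothetical.

```python
from bisect import bisect_right

PROMOTER_MAX = 2000  # 2-kb upstream limit stated in the text


def classify_insertion(pos, units):
    """Return 'transcribed', 'promoter', or 'intergenic' for a position.

    units: sorted, non-overlapping (start, end) transcription units,
    assumed here to be on the forward strand (a simplification).
    """
    starts = [start for start, _ in units]
    i = bisect_right(starts, pos) - 1  # last unit starting at or before pos
    if i >= 0 and pos <= units[i][1]:
        return "transcribed"
    nxt = i + 1  # first unit starting after pos
    if nxt < len(units):
        next_start = units[nxt][0]
        prev_end = units[i][1] if i >= 0 else 0
        # Promoter: the lesser of 2 kb or the gap to the neighbouring unit.
        promoter_len = min(PROMOTER_MAX, next_start - prev_end - 1)
        if next_start - pos <= promoter_len:
            return "promoter"
    return "intergenic"


# Hypothetical coordinates, for illustration only.
example_units = [(1000, 3000), (3500, 5000), (10000, 12000)]
for position in (2000, 3200, 6000, 8500):
    print(position, classify_insertion(position, example_units))
```

A strand-aware version would measure the promoter from the 5' end of each unit; that detail is omitted here to keep the sketch short.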
Applying the definitions given above, the genome is composed of 48% transcribed regions, 28% promoter, and 24% intergenic. Therefore, the data suggest that there is a bias for insertions outside of the transcribed regions. These 85,108 predicted insertions occur in 52,964 SAIL lines, or 53% of the collection. T-DNA-only sequences were produced in 11.4% of the lines, fewer than predicted from the sample test of 96 lines. The remaining 36% almost certainly contain useful insertions. Presumably, insertion sites were not identified in these lines because of mTAIL-PCR failure and/or low-quality sequence from mTAIL-PCR products.
To create a queryable resource to identify all possible insertions in a gene of interest, high-scoring segment pairs that identified SAIL lines containing predicted insertions in promoters and/or transcribed regions were collected. To maximize the identification of all predicted insertions that might disrupt the function of a particular gene, transcribed regions were defined as TIGR annotated transcription units plus 150 bp downstream of the predicted translation stop, and promoters were defined as 2 kb of sequences upstream from a predicted transcription unit regardless of whether the 2 kb extended into an adjacent transcription unit. This analysis was an attempt to detect all possible insertions in promoters and transcribed sequences and differed from that shown in Figure 3 in that once an insertion had been mapped to a transcribed region, it was not excluded from being mapped to a promoter of an adjacent gene. These data were assembled into a table that associates genes and their promoters, as named by their Arabidopsis Gene Index number (www.tigr.org), with mapped insertions detected in individual mutant lines. It also lists sequences of primers useful for the confirmation of insertion sites, as explained below. A sample of this SAIL table is shown in Table 1.
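The bias against transcribed regions can be checked by comparing the observed share of insertions in each class with the share of the genome that class occupies. The short Python sketch below uses only the counts and genome fractions quoted in the text; the dictionary and function names are illustrative, and the calculation does not correct for the multiple-top-hit overestimation described earlier.

```python
# Counts of predicted insertions per class, as quoted in the text.
observed_counts = {"transcribed": 25710, "promoter": 37156, "intergenic": 22242}
# Fraction of the genome assigned to each class, as quoted in the text.
genome_fraction = {"transcribed": 0.48, "promoter": 0.28, "intergenic": 0.24}


def enrichment(counts, fractions):
    """Observed share of insertions divided by the expected (genome) share."""
    total = sum(counts.values())
    return {name: (counts[name] / total) / fractions[name] for name in counts}


ratios = enrichment(observed_counts, genome_fraction)
for name, ratio in ratios.items():
    # A ratio below 1 means fewer insertions than expected by size alone.
    print(f"{name}: {ratio:.2f}")
```

A ratio below 1 for transcribed regions and above 1 for promoters is what the stated bias amounts to in these terms.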
The SAIL table includes all duplicate entries from resequenced samples (e.g., line SAIL_504C11 in At5g27100; Table 1) to give the user more confidence in choosing an insertion line. Additionally, the SAIL table lists some predicted insertions as occurring in the promoter and the transcribed region of two adjacent genes when they are separated by less than 2 kb (e.g., SAIL_439B09 in At5g27100 and At5g27110; Table 1).
To count the number of genes predicted to be disrupted in the collection, the SAIL table was filtered to remove predicted insertions from resequencing entries and predicted insertions in promoters that also were mapped to an adjacent transcribed region. The filtered SAIL table lists 28,774 insertions within the transcribed regions of 13,878 genes that were detected in 25,909 lines. Additionally, depending on the definition of promoter length, there are between 16,000 and 36,000 insertions in promoter sequences, as listed in Table 2. In summary, there are between 44,000 and 65,000 predicted insertions in the transcribed regions and/or promoters of 18,000 to 22,000 genes, depending on the defined length of the promoter. These numbers are likely an overestimation of the identified insertions, given that 15 to 17% of the predicted promoter and transcribed region insertions result from a SAIL sequence segment having more than a single top hit with the highest BLAST score (see above). Subtracting 17% from the listed figures above gives a revised estimate of 15,000 to 18,000 genes with disruptions in the transcribed regions and/or promoters.
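The revised gene estimate is simple arithmetic, shown here as a small Python sketch for readers who want to try other correction factors. The 17% figure and the 18,000 to 22,000 range come from the text; the function name is hypothetical, and applying a single flat correction to the gene count is the text's own simplification.

```python
def revised_estimate(count, overestimate=0.17):
    """Scale a count down by the stated overestimation fraction."""
    return count * (1 - overestimate)


low = revised_estimate(18_000)
high = revised_estimate(22_000)
# Round to the nearest thousand, as the text does.
print(round(low, -3), round(high, -3))
```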
Inspection of the SAIL table suggests that there are genes in the Arabidopsis genome in which insertions occur at high frequencies, as listed in Table 3. Twenty-two loci appear to have 20 or more insertions each, suggesting that there are "hot spots" for T-DNA integration. Gene-specific primers were designed to test for the presence of a subset of these insertions using PCR amplification with a T-DNA left border primer. Of the predicted insertion hot spots tested, none appeared to be a true hot spot (Table 3). A subset of insertions was confirmed in two members of a highly conserved multigene family. Because of the high degree of sequence similarity among these genes, the SAIL table analysis could not distinguish them, so an insertion in any one of them was mapped to all of them. None of the genes listed in Table 3 is predicted to contain multiple insertions at the same genomic position. Together, these data suggest that insertion hot spots in coding regions or their associated upstream sequences do not exist.
To determine the frequency of left border read-through during excision of the T-DNA from the binary vector within Agrobacterium and the resultant transfer of vector backbone into the plant genome (Martineau et al., 1994
Most of the Insertion Sites Predicted in the SAIL Collection Can Be Confirmed Experimentally
Segregating populations from 340 SAIL lines were tested to determine if the insertions were in the positions predicted in the SAIL table and to isolate plant lines homozygous for T-DNA insertion mutations. As described in Methods, primers flanking each predicted insertion in the SAIL table were designed and listed in the SAIL table. Two PCR procedures were used to test each predicted insertion site. One used the forward and reverse primers to amplify a product from chromosomes without the insertion. The other used the reverse primer and the T-DNA left border primer to amplify the product from a chromosome carrying the insertion. For each SAIL line studied, DNA from each of 20 to 30 plants was used for each of these two reactions. From the results, it was possible to determine if plants were homozygous for the insertion mutation, hemizygous, or lacked the insertion completely. Figure 4 shows an example of the use of this method to confirm an insertion in At4g39030, which encodes the disease resistance gene EDS5 (Nawrath et al., 2002).
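The logic of the two-reaction assay can be written down as a small decision table. The Python sketch below is an illustration only: it assumes each reaction is scored as a simple presence or absence of a band, and the function names and the "no call" outcome for plants in which both reactions fail are assumptions added here, not part of the published protocol.

```python
from collections import Counter


def call_genotype(wild_type_band, insertion_band):
    """Infer the genotype of one plant from the two PCR results.

    wild_type_band: product from forward + reverse gene-specific primers.
    insertion_band: product from reverse primer + T-DNA left border primer.
    """
    if wild_type_band and insertion_band:
        return "hemizygous"
    if insertion_band:
        return "homozygous"
    if wild_type_band:
        return "no insertion"
    return "no call"  # both reactions failed; the plant is uninformative


def summarize_family(results):
    """Tally genotype calls for a segregating family of plants."""
    return Counter(call_genotype(wt, ins) for wt, ins in results)


# Hypothetical band scores for five plants: (wild-type band, insertion band).
family = [(True, True), (False, True), (True, False), (True, True), (False, False)]
print(summarize_family(family))
```

Under this scheme a predicted insertion would count as confirmed if at least one plant in the family yields the insertion band.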
To obtain 99% confidence of finding a T-DNA insertion in any Arabidopsis gene, 280,000 T-DNA inserts would be required (Krysan et al., 1999).
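Estimates of this kind are usually derived from a random-insertion model in which the probability of hitting a gene at least once is P = 1 - (1 - x/G)^n, where x is the gene length, G the genome size, and n the number of inserts. The Python sketch below solves this for n. The gene length of 2.1 kb and genome size of 125 Mb are assumptions chosen for illustration; they are not stated in this article, and the exact parameters behind the 280,000 figure may differ.

```python
import math


def inserts_required(confidence, gene_bp, genome_bp):
    """Number of random inserts needed to hit a gene with the given confidence.

    Solves P = 1 - (1 - x/G)**n for n, assuming insertions are
    uniformly distributed across the genome.
    """
    hit_fraction = gene_bp / genome_bp
    return math.ceil(math.log(1 - confidence) / math.log(1 - hit_fraction))


# Assumed values: 2.1-kb average gene, 125-Mb genome.
n = inserts_required(0.99, 2_100, 125_000_000)
print(n)
```

With these assumed values the result is of the same order as the 280,000 quoted above. Because the model assumes uniform insertion, the observed bias against transcribed regions would push the real requirement higher.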
Steps taken to increase efficiency likely decreased the number of sequence tags identified for the following reasons: (1) only sequences flanking the T-DNA left borders were amplified, and not all T-DNA insertions include an intact left border (Zambryski et al., 1982
Only T-DNA left border flanking sequences were chosen for rescue because initial tests showed a lower frequency of T-DNA-only sequences on the left borders compared with the right borders. Previous reports on the characterization of T-DNA integration patterns in various species have found that between 30 and 90% of T-DNAs are arranged in direct or inverted repeats (Jones et al., 1987).
Approximately 50,000 lines, or 50% of the collection, contain insertions likely to affect gene function and therefore are useful for functional genomics. In addition, there are many insertions in intergenic regions that also may prove useful for elucidating the biological functions of sequences currently thought to be nonfunctional. SAIL sequences from
Twenty-two genes were predicted to have disproportionately high numbers of insertions, although only a small fraction of predicted insertions in these genes could be confirmed. The repeated amplification of these sequences from independent samples using mTAIL-PCR may represent common artifacts that occur with the mTAIL-PCR protocol. The explanation that seems most likely is that recombination events occurred during some of the TAIL-PCR procedures, resulting in chimeric TAIL products. Taq polymerase is known to switch templates after reaching an end or a damaged region (Paabo et al., 1990).
Although most of the insertions tested experimentally could be confirmed, 24% of the predicted T-DNA insertions could not. One possible source of these false-positive results is cases in which a single SAIL sequence segment had more than a single top hit with the highest BLAST score. In these cases, the insertions predicted but not confirmed could be in a closely related gene. Another likely explanation is template switching during mTAIL-PCR, as described above. Additionally, sample contamination during DNA extraction and mTAIL-PCR, as well as sample tracking errors, could have contributed to the false-positive rate. Finally, because tissue from primary transformants was used for TAIL-PCR and sequencing, it is possible that agrobacteria persisted in the tissues of the primary transformants and created somatic insertion events. There are examples in the collection of genes for which more than one allele was identified but only one could be confirmed. Thus, it does not appear that all unconfirmable mutations are gametophytic lethal.
The SAIL collection contains numerous insertions in the transcribed regions and promoters of
Binary Vectors, Transformation, and Plant Growth
The T-DNA vectors, Agrobacterium tumefaciens strain (GV3101), transformation methods, selection of transformants, and plant growth were as described by McElver et al. (2001) 100,000 transgenic Arabidopsis thaliana plants, which were collected onto 1045 96-well plates. Lines on plates 1 to 456, 1052 to 1057, 1142 to 1205, and 1206 (rows A to D) are homozygous for a qrt mutation (Preuss et al., 1994 β-glucuronidase fusion. Lines on plates 500 to 918 and 1206 (rows E to H) to 1307 were transformed using pDAP101. The T-DNA in pDAP101 is 4763 bp and contains from left to right border a BASTA resistance cassette and pBluescript SKII+. Both T-DNAs have the same left border but different right borders. Vector maps can be found at www.tmri.org. Primary transformants were grown in greenhouses in Gilroy, CA, and Research Triangle Park, NC. On the Torrey Mesa Research Institute World Wide Web site and in conference presentations, this resource was referred to previously as the GARLIC collection, for Gilroy Arabidopsis Reverse Lethal Insertion Collection.
DNA Extraction
Modified TAIL-PCR Procedure
First and second rounds of mTAIL-PCR cycling were performed on 384-well plates in Applied Biosystems 9700 thermocyclers. Cycling parameters for the first round were as follows: (1) 94°C for 2 min and 95°C for 1 min; (2) 5 cycles of 94°C for 30 s, 62°C for 1 min, and 72°C for 2.5 min; (3) 2 cycles of 94°C for 30 s, 25°C for 3 min (50% ramp), and 72°C for 2.5 min (32% ramp); (4) 15 cycles of 94°C for 10 s, 68°C for 1 min, 72°C for 2.5 min, 94°C for 10 s, 68°C for 1 min, 72°C for 2.5 min, 94°C for 10 s, 44°C for 1 min, and 72°C for 2.5 min; and (5) 72°C for 7 min. Cycling parameters for the second round were as follows: (1) 94°C for 3 min; (2) 5 cycles of 94°C for 10 s, 64°C for 1 min, and 72°C for 2.5 min; (3) 15 cycles of 94°C for 10 s, 64°C for 1 min, 72°C for 2.5 min, 94°C for 10 s, 64°C for 1 min, 72°C for 2.5 min, 94°C for 10 s, 44°C for 1 min, and 72°C for 2.5 min; (4) 5 cycles of 94°C for 10 s, 44°C for 1 min, and 72°C for 3 min; and (5) 72°C for 7 min. Platinum Taq polymerase (Life Technologies, Rockville, MD) was used for all amplifications. The primary mTAIL reaction contained 5 ng of genomic DNA.
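For readers who want to program a thermocycler or check run lengths, the two cycling programs above can be encoded as data. The Python sketch below is one possible representation, with each stage written as (number of cycles, list of (temperature in °C, hold time in seconds)). The structure and names are illustrative assumptions, and the time calculation counts hold times only: ramp times, including the 50% and 32% ramps in the first round, are ignored.

```python
HIGH_68 = [(94, 10), (68, 60), (72, 150)]  # high-stringency cycle, round 1
HIGH_64 = [(94, 10), (64, 60), (72, 150)]  # high-stringency cycle, round 2
LOW_44 = [(94, 10), (44, 60), (72, 150)]   # reduced-stringency cycle

FIRST_ROUND = [
    (1, [(94, 120), (95, 60)]),
    (5, [(94, 30), (62, 60), (72, 150)]),
    (2, [(94, 30), (25, 180), (72, 150)]),  # ramp settings not modeled
    (15, HIGH_68 * 2 + LOW_44),             # two high, then one low stringency
    (1, [(72, 420)]),
]

SECOND_ROUND = [
    (1, [(94, 180)]),
    (5, HIGH_64),
    (15, HIGH_64 * 2 + LOW_44),
    (5, [(94, 10), (44, 60), (72, 180)]),
    (1, [(72, 420)]),
]


def hold_seconds(program):
    """Total programmed hold time in seconds, excluding ramps."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


for name, program in (("first", FIRST_ROUND), ("second", SECOND_ROUND)):
    print(name, hold_seconds(program) / 60, "min")
```

Actual run times on an instrument will be longer because of ramping between temperatures.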
Sequencing
Design of Primers for Testing SAIL Insertion Sites
Upon request, all novel materials described in this article will be made available in a timely manner for noncommercial research purposes.
We thank University of California at Berkeley, Scripps Research Institute, and Salk Institute professors, post-docs, graduate students, and technicians, and numerous Syngenta scientists, for their generous assistance with leaf sample and seed collection. This work could not have been accomplished without their help.
Article, publication date, and citation information can be found at www.plantcell.org/cgi/doi/10.1105/tpc.004630.
2 Current address: Ceres Incorporated, 3007 Malibu Canyon Road, Malibu, CA 90265. Received May 15, 2002; accepted September 10, 2002.
Arabidopsis Genome Initiative. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815.
Azpiroz-Leehan, R., and Feldmann, K.A. (1997). T-DNA insertion mutagenesis in Arabidopsis: Going back and forth. Trends Genet. 13, 152–159.
Chory, J., et al. (2000). Functional genomics and the virtual plant: A blueprint for understanding how plants are built and how to improve them. Plant Physiol. 123, 423–425.
Cluster, P., O'Dell, M., Metzlaff, M., and Flavell, R. (1996). Details of T-DNA structural organization from a transgenic Petunia population exhibiting co-suppression. Plant Mol. Biol. 32, 1197–1203.
De Buck, S., Jacobs, A., Van Montagu, M., and Depicker, A. (1999). The DNA sequences of T-DNA junctions suggest that complex T-DNA loci are formed by a recombination process resembling T-DNA integration. Plant J. 20, 295–304.
De Neve, M., De Buck, S., Jacobs, A., Van Montagu, M., and Depicker, A. (1997). T-DNA integration patterns in co-transformed plant cells suggest that T-DNA repeats originate from co-integration of separate T-DNAs. Plant J. 11, 15–29.
Deroles, S., and Gardner, R.C. (1988). Analysis of the T-DNA structure in a large number of transgenic petunias generated by Agrobacterium-mediated transformation. Plant Mol. Biol. 11, 365–377.
Ewing, B., Hillier, L., Wendl, M., and Green, P. (1998). Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
Gheysen, G., Villarroel, R., and Van Montagu, M. (1991). Illegitimate recombination in plants: A model for T-DNA integration. Genes Dev. 5, 287–297.
Grevelding, C., Fantes, V., Kemper, E., Schell, J., and Masterson, R. (1993). Single-copy T-DNA insertions in Arabidopsis are the predominant form of integration in root-derived transgenics, whereas multiple insertions are found in leaf discs. Plant Mol. Biol. 23, 847–860.
Jones, J., Gilbert, D., Grady, K., and Jorgensen, R. (1987). T-DNA structure and gene expression in petunia plants transformed by Agrobacterium tumefaciens C58 derivatives. Mol. Gen. Genet. 207, 478–485.
Jorgensen, R., Snyder, C., and Jones, J. (1987). T-DNA is organized predominantly in inverted repeat structures in plants transformed with Agrobacterium tumefaciens C58 derivatives. Mol. Gen. Genet. 207, 471–477.
Krizkova, L., and Hrouda, M. (1998). Direct repeats of T-DNA integrated in tobacco chromosome: Characterization of junction regions. Plant J. 16, 673–680.
Krysan, P.J., Young, J.C., and Sussman, M.R. (1999). T-DNA as an insertional mutagen in Arabidopsis. Plant Cell 11, 2283–2290.
Krysan, P.J., Young, J.C., Tax, F., and Sussman, M.R. (1996). Identification of transferred DNA insertions within Arabidopsis genes involved in signal transduction and ion transport. Proc. Natl. Acad. Sci. USA 93, 8145–8150.
Liu, Y.G., Mitsukawa, N., Oosumi, T., and Whittier, R.F. (1995). Efficient isolation and mapping of Arabidopsis thaliana T-DNA insert junctions by thermal asymmetric interlaced PCR. Plant J. 8, 457–463.
Liu, Y.G., and Whittier, R.F. (1995). Thermal asymmetric interlaced PCR: Automatable amplification and sequencing of insert end fragments from P1 and YAC clones for chromosome walking. Genomics 25, 674–681.
Martineau, B., Voelker, T., and Sanders, R. (1994). On defining T-DNA. Plant Cell 6, 1032–1033.
Mayerhofer, R., Koncz-Kalman, Z., Nawrath, C., Bakkeren, G., Crameri, A., Angelis, K., Redei, G., Schell, J., Hohn, B., and Koncz, C. (1991). T-DNA integration: A mode of illegitimate recombination in plants. EMBO J. 10, 697–704.
McElver, J., et al. (2001). Insertional mutagenesis of genes required for seed development in Arabidopsis thaliana. Genetics 159, 1751–1763.
Nawrath, C., Heck, S., Parinthawong, N., and Metraux, J.-P. (2002). EDS5, an essential component of salicylic acid-dependent signaling for disease resistance in Arabidopsis, is a member of the MATE transporter family. Plant Cell 14, 275–286.
Paabo, S., Irwin, D.M., and Wilson, A.C. (1990). DNA damage promotes jumping between templates during enzymatic amplification. J. Biol. Chem. 265, 4718–4721.
Parinov, S., Sevugan, M., Ye, D., Yang, W., Kumaran, M., and Sundaresan, V. (1999). Analysis of flanking sequences from Dissociation insertion lines: A database for reverse genetics in Arabidopsis. Plant Cell 11, 2263–2270.
Parinov, S., and Sundaresan, V. (2000). Functional genomics in Arabidopsis: Large-scale insertional mutagenesis complements the genome sequencing project. Curr. Opin. Biotechnol. 11, 157–161.
Patel, R., Lin, C., Laney, M., Kurn, N., Rose, S., and Ullman, E.F. (1996). Formation of chimeric DNA primer extension products by template switching onto an annealed downstream oligonucleotide. Proc. Natl. Acad. Sci. USA 93, 2969–2974.
Preuss, D., Rhee, S.Y., and Davis, R.W. (1994). Tetrad analysis possible in Arabidopsis with mutation of the QUARTET (QRT) genes. Science 264, 1458–1460.
Ramanathan, V., and Veluthambi, K. (1995). Transfer of non-T-DNA portions of the Agrobacterium tumefaciens Ti plasmid pTiA6 from the left terminus of the Tl-DNA. Plant Mol. Biol. 28, 1149–1154.
Speulman, E., Metz, P., van Arkel, G., te Lintel Hekkert, B., Stiekema, W.J., and Pereira, A. (1999). A two component Enhancer-Inhibitor transposon mutagenesis system for functional analysis of the Arabidopsis genome. Plant Cell 11, 1853–1866.
Sussman, M.R., Amasino, R.M., Young, J.C., Krysan, P.J., and Austin-Phillips, S. (2000). The Arabidopsis knockout facility at the University of Wisconsin-Madison. Plant Physiol. 124, 1465–1467.
Tissier, A.F., Marillonnet, S., Klimyuk, V., Patel, K., Torres, M.A., Murphy, G., and Jones, J.D. (1999). Multiple independent defective Suppressor-mutator transposon insertions in Arabidopsis: A tool for functional genomics. Plant Cell 11, 1841–1852.
van der Graaff, E., Dulk-Ras, A., and Hooykaas, P. (1996). Deviating T-DNA transfer from Agrobacterium tumefaciens to plants. Plant Mol. Biol. 31, 677–681.
Winkler, R., Frank, M., Galbraith, D., Feyereisen, R., and Feldmann, K. (1998). Systematic reverse genetics of transfer-DNA-tagged lines of Arabidopsis. Plant Physiol. 118, 743–750.
Zambryski, P., Depicker, A., Kruger, K., and Goodman, H. (1982). Tumor induction of Agrobacterium tumefaciens: Analysis of the boundaries of T-DNA. J. Mol. Appl. Genet. 1, 361–370.