The Plant Cell 18:1100-1104 (2006) © 2006 American Society of Plant Biologists
Comparative Sequencing of Plant Genomes: Choices to Make
Department of Agronomy, Purdue University, West Lafayette, IN 47907
Broad Institute of MIT and Harvard, Cambridge, MA 02141
Department of Genetics, North Carolina State University, Raleigh, NC 27695 michaelp{at}unity.ncsu.edu
The first sequenced genome of a plant, Arabidopsis thaliana, was published <6 years ago (Arabidopsis Genome Initiative, 2000).
WHAT SPECIES DO WE SEQUENCE?
In a world with >260,000 known plant species and finite sequencing resources, it is crucial that we make careful choices as to which genomes will be sequenced. It is clear that the selection of sequencing targets will critically affect what we can learn from comparative plant genome sequencing. There are three areas that we feel will benefit substantially from future genome sequencing efforts. First, comparative genome sequences present opportunities to study the evolution of plant genome structure and the dynamics of molecular evolutionary processes. Second, they offer an approach to identify genes and other functional elements and provide critical data for annotation of completed plant genomes. Third, plant genome sequences provide the community with an important tool to pursue gene isolation in new target species. Given these scientific benefits, several key aspects of species choice should be considered.
Phylogenetic Placement of Target Species
We feel that one goal of plant genome sequencing efforts should be to select genomes in a more systematic fashion with respect to the relationships of plant groups. Ideally, the number and distribution of sequenced genomes should be optimized to allow investigators to examine the evolution of genome structure and function within the context of a robust, well-defined phylogeny. Moreover, it would be desirable to have genomes sequenced in sufficient numbers to allow investigators working on any plant species to have access to information from the sequenced genome of an evolutionarily proximate species.
We can calculate the number of plant taxon lineages as a function of divergence times (Figure 2) based on a comprehensive phylogeny of 374 plant families whose branch points have been dated through molecular clock methods (Davies et al., 2004).
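As a rough illustration of how a lineage count can be read off a dated phylogeny, the sketch below counts lineages at a given time before present from a list of branch-point ages. It assumes a rooted, strictly bifurcating tree in which every tip survives to the present, so the number of lineages at time t is one plus the number of branch points older than t. The function name `lineages_at` and the ages in `toy_split_ages` are hypothetical and are not taken from Davies et al. (2004).

```python
from bisect import bisect_right


def lineages_at(split_ages_mya, t_mya):
    """Count lineages alive t_mya million years ago.

    Assumes a rooted, bifurcating tree whose tips are all extant;
    split_ages_mya holds the age of every internal branch point.
    """
    ages = sorted(split_ages_mya)
    # Branch points strictly older than t_mya have already occurred
    # at that time; each one added a single lineage to the initial stem.
    older_splits = len(ages) - bisect_right(ages, t_mya)
    return 1 + older_splits


# Hypothetical branch-point ages (million years ago) for a six-tip tree.
toy_split_ages = [150, 120, 90, 60, 30]

for t in (200, 100, 50, 0):
    print(t, lineages_at(toy_split_ages, t))
```

Under these assumptions the count rises from a single stem lineage before the oldest branch point to the full number of tips at the present. A real supertree with unresolved nodes (polytomies) would need each node weighted by the number of extra lineages it produces.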
Even smaller numbers of phylogenetically well-chosen species would have greater impact on comparative plant genomic studies than haphazard sampling across plant groups. For example, one can start by sampling species that belong to supraordinal groups that are yet to be covered by sequencing projects, such as the euasterid II group or other asterids (Figure 1). Sequencing basal angiosperm species as well as monocot species outside of the commelinids would also be appealing. Representatives of other land plant groups, including ferns, gymnosperms, and other bryophyte groups, such as the liverworts, will also be valuable in evolutionary comparisons. Moreover, sequence information from green algal groups that are sister to the land plants, the Charales and Coleochaetales, may help illuminate the evolutionary process that led to the transition of plants onto land.
Closely Related Species
There are already several efforts underway along these lines. The sequencing of the Arabidopsis lyrata and Capsella rubella genomes, as well as the Brassica oleracea and Brassica rapa genomes, provides good coverage in the Brassicaceae. Other groups will also benefit from sequencing of closely related species. The most obvious is the relatives of O. sativa (rice) in the genus Oryza, including the 12 species for which physical maps and BAC library resources have already been developed and for which genome sizes are relatively modest (Ammiraju et al., 2006).
Resequencing Old Genomes
With this in mind, resequencing efforts of 20 individuals are already underway for A. thaliana and O. sativa using chip sequencing technology. Resequencing efforts will add value to the reference sequence by uncovering the wealth of genomic variation. Even resequencing spaced genes or gene fragments, which has been done in A. thaliana at the whole-genome level (Nordborg et al., 2005
Economic Considerations
First, genomes of non-crop species can have an enormous impact on the plant scientific community, thus indirectly benefitting crop plant research. For example, the A. thaliana genome sequence has been invaluable in plant molecular genetic and developmental studies. Second, wild relatives of crops are a well-known source of allelic variation for desirable agronomic traits. Thus, there is an opportunity to exploit genes not only from single crops but also from species complexes that include both crop plants and close wild relatives. Sequencing wild relatives of crop species would afford a unique ability to dissect the process of crop domestication, which is probably the most significant technological innovation in human history. The availability of genome sequences for wild relatives of crop species also would provide a resource for ecological and evolutionary researchers interested in natural (as opposed to agricultural) plant systems.
Finally, crop species under consideration for future sequencing efforts can be expanded beyond the traditional commodity crops. There are 6000 plant taxa that are considered crops by various cultures, and a large number of these are found in developing countries. While most of these species may not be commodity crops, a large number of people depend on these alternative crops for food and other necessities, particularly in resource-poor environments. An example is cassava, which feeds
HOW DO WE SEQUENCE?
Beyond species selection, future genome projects must also select strategies and technologies that are appropriate for their goals and budgets. Greater choices in sequencing technologies will play a major role in enabling progress in comparative plant sequencing. While extremely successful, Sanger sequencing does present some significant challenges for obtaining large increases in throughput and speed beyond those seen over the last decade. It seems unlikely with current sequencing technologies that many more plant genomes will be sequenced to the level of completeness found in the O. sativa and A. thaliana genomes, although we should not discount the rapid advances in sequencing technologies that may allow for facile generation of assembled, completed genome sequences. This should not necessarily be a problem, since the incremental benefit between a draft and complete genome sequence lessens as more genomes are sequenced.
Several new sequencing technologies are emerging that have the potential to provide increases in throughput and reductions in cost (Metzker, 2005). Currently, the major issues these technologies face are short read lengths, inability to generate paired-end sequence reads, and uncertain quality metrics; the former two problems are serious limitations in generating a completely assembled sequence for large genomes. Combining traditional sequencing approaches with one of these new technologies can potentially offer a valuable middle ground. By having sufficient paired-end reads of cloned genomic fragments using Sanger sequencing, a sparse scaffold of the genome could be created that could subsequently be filled in with deeper coverage from short, unpaired reads. Moreover, some of the concerns of these present-day new technologies will fade as the technologies mature. There are also certain applications where these new approaches truly excel, such as in resequencing a previously sequenced reference genome (e.g., sequencing different rice varieties or sequencing several closely related Arabidopsis species). We should not forget that comparative projects require a previously completed reference genome and that with each new plant genome that is fully sequenced (whether finished quality as in Arabidopsis or rice, or draft quality as in poplar), the opportunities for comparative analyses increase. While Arabidopsis, rice, and poplar provide a great starting point, the careful selection of future fully sequenced genomes is critical. These may turn out to be large, complex genomes that are unsuitable for the new technologies, but committing to each one will continue to increase the number of foci around which comparative analyses can cluster.
STARTING THE DEBATE
We have outlined possible criteria that we feel should drive the choice of target genomes to be sequenced.
It is not our intention to be comprehensive in this discussion, but rather to provide a framework for the community to begin thoughtful debate on what needs to be done. The list of candidate species will no doubt be long, and the resources will be limited. Understanding how we can intelligently make these choices is vital if the field is to move forward and the promise of plant genomics is to be fulfilled.
Acknowledgments
We thank Pam Soltis and Richard Michelmore for interesting and provoking discussions and Jennifer Modliszewski for technical assistance. This commentary is largely based on discussions at an informal workshop on comparative plant genome sequencing held at the Plant Genetics 2005 meeting sponsored by the American Society of Plant Biologists in October, 2005. However, we are solely responsible for the views expressed in this commentary. This work was funded in part by grants from the National Science Foundation Plant Genome Research Program to S.J. and M.P. and by the National Science Foundation Frontiers in Integrative Biological Research and Department of Defense to M.P.
REFERENCES
Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., and Donnelly, P.; International HapMap Consortium. (2005). A haplotype map of the human genome. Nature 437, 1299-1320.
Ammiraju, J.S., et al. (2006). The Oryza bacterial artificial chromosome library resource: Construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res. 16, 140-147.
Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.
Bennett, S.T., Barnes, C., Cox, A., Davies, L., and Brown, C. (2005). Toward the $1000 human genome. Pharmacogenomics 6, 373-382.
Chase, M.W. (2004). Monocot relationships: An overview. Am. J. Bot. 91, 1645-1655.
Cork, J.M., and Purugganan, M.D. (2005). High-diversity genes in the Arabidopsis genome. Genetics 170, 1897-1911.
Davies, T.J., Barraclough, T.G., Chase, M.W., Soltis, P.S., Soltis, D.E., and Savolainen, V. (2004). Darwin's abominable mystery: Insights from a supertree of the angiosperms. Proc. Natl. Acad. Sci. USA 101, 1904-1909.
Goff, S.A., et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92-100.
International Rice Genome Sequencing Project (2005). The map-based sequence of the rice genome. Nature 436, 793-800.
Margulies, M., et al. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 326-327.
Metzker, M. (2005). Emerging technologies in DNA sequencing. Genome Res. 15, 1767-1776.
Nickrent, D.L., Parkinson, C.L., Palmer, J.D., and Duff, R.J. (2000). Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol. Biol. Evol. 17, 1885-1895.
Nordborg, M., et al. (2005). The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, 1289-1299.
Raven, P., Fauquet, C., Swaminathan, M.S., Borlaug, N., and Samper, C. (2005). Where next for genome sequencing? Science 311, 468.
Soltis, P.S., and Soltis, D.E. (2004). The origin and diversification of angiosperms. Am. J. Bot. 91, 1614-1626.
Yu, J., et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92.