First published online April 10, 2003; 10.1105/tpc.010181
American Society of Plant Biologists
Conserved Noncoding Sequences among Cultivated Cereal Genomes Identify Candidate Regulatory Sequence Elements and Patterns of Promoter Evolution
Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
1 To whom correspondence should be addressed. E-mail smoose{at}uiuc.edu; fax 217-333-4582
Surveys for conserved noncoding sequences (CNS) among genes from monocot cereal species were conducted to assess the general properties of CNS in grass genomes and their correlation with known promoter regulatory elements. Initial comparisons of 11 orthologous maize-rice gene pairs found that previously defined regulatory motifs could be identified within short CNS but could not be distinguished reliably from random sequence matches. Among the different phylogenetic footprinting algorithms tested, the VISTA tool yielded the most informative alignments of noncoding sequence. VISTA was used to survey for CNS among all publicly available genomic sequences from maize, rice, wheat, barley, and sorghum, representing >300 gene comparisons. Comparisons of orthologous maize-rice and maize-sorghum gene pairs identified 20 bp as a minimal length criterion for a significant CNS among grass genes, with few such CNS found to be conserved across rice, maize, sorghum, and barley. The frequency and length of cereal CNS as well as nucleotide substitution rates within CNS were consistent with the known phylogenetic distances among the species compared. The implications of these findings for the evolution of cereal gene promoter sequences and the utility of using the nearly completed rice genome sequence to predict candidate regulatory elements in other cereal genes by phylogenetic footprinting are discussed.
Comparative analyses of noncoding DNA sequences, also known as phylogenetic footprinting (Tagle et al., 1988
The power of phylogenetic footprinting is enhanced significantly when genomic sequences are available from a number of related species that have diverged sufficiently so that conserved functional elements can be distinguished from nonconserved sequences (Dubchak et al., 2000
Despite the relatively low overall sequence identity among promoter sequences from cereal genes compared with their coding regions, numerous expression analyses and transgenic experiments involving promoter-reporter gene constructs have demonstrated that orthologous genes from different cereal species often share highly regulated patterns of gene expression. Thus, these conserved patterns of gene expression might be programmed by regulatory sequences associated with CNS. We tested this hypothesis by surveying for CNS among orthologous genes from maize, rice, barley, wheat, and sorghum. The genomic sequences from 652 genes of these species were gleaned from public databases, annotated, and compared with orthologous gene sets. Using this data set, we performed analyses to determine the general properties of CNS among cereal genes, particularly their frequency, length, and distribution within genes and across cereal taxa. We found that CNS can be identified in the vast majority of comparisons involving orthologous cereal genes, but their frequency and length are lower than those of CNS in other species. For the 31 known regulatory motifs examined, 28 were identified within CNS by one or more of the phylogenetic footprinting algorithms tested, but only under relaxed definitions for CNS length that did not reliably distinguish regulatory elements above background noise. Thus, although phylogenetic footprinting comparisons can predict functional regulatory elements within cereal promoter sequences, the relatively short length of cereal CNS limits the utility of this approach to identify new unknown motifs, particularly when sequences from only two species are compared.
Together, these analyses suggest that the promoter sequences among cereal genomes are evolving rapidly and show little sequence conservation even among orthologous genes. We also discuss the prospects for phylogenetic footprinting approaches with the rice genome sequence to help predict promoter elements that define regulatory networks of gene expression in maize and other grass species.
Compilation of Genomic Sequences from Orthologous Cereal Genes
A prerequisite for evaluating phylogenetic footprinting between maize and other grasses on a genome scale is the availability of large numbers of genomic sequences from grass species. Rice genome sequencing projects have generated large amounts of sequence available for comparison, but promoter sequences from other grass genomes are limited. Among other cereals, searches of GenBank identified 652 annotated genomic sequences from maize, barley, wheat, and sorghum that contained at least 200 bp of 5' upstream noncoding sequences, representing 2.165 Mbp. For gene families with many highly related members, such as those that encode seed storage proteins, only one representative sequence was included in the data set. The list of annotated maize genomic sequences can be found in the supplemental data at http://www.cropsci.uiuc.edu/faculty/moose/PromoterComparisons.htm and is summarized in Table 1.
The largest group of genomic sequences was from maize and represented 1.12 Mbp. Approximately 40% of the maize, barley, and wheat sequences were annotated previously for promoter regions, which totaled 291 kb from 288 genes. The vast majority of sorghum genomic sequences were obtained from recently sequenced BAC clones and thus were not annotated for transcription start sites. Within BAC clones, the 5' boundary of upstream sequences was defined as either 3000 bp 5' to the start codon or the 3' end of the closest upstream gene or repetitive DNA sequence (e.g., retrotransposons). The entire set of 652 genes contained on average 3958 bp of genomic sequence and 1441 bp of 5' upstream sequence. For those genes with annotated promoters, the average promoter length was 1450 bp. The maize genomic sequences gleaned from GenBank entries were used in sequence similarity searches to identify putative orthologous genes from publicly available rice genome sequences (see Methods). From these searches, 78 likely pairs of orthologous maize and rice genes were identified based on high nucleotide identity (>80%) throughout their coding regions and additional criteria, such as expression and proposed gene function based on mutant analyses or biochemical assays. A complete listing of these likely orthologous gene sets is available as supplemental data at http://www.cropsci.uiuc.edu/faculty/moose/PromoterComparisons.htm. The set of orthologous gene pairs is representative of most classes of gene functions, including housekeeping proteins, structural genes, enzymes that participate in specific metabolic or physiological processes such as starch synthesis, and genes that encode photosynthetic proteins. The average nucleotide identity within coding regions was 90.1% between maize and rice orthologous gene pairs. 
The total sequence space in the orthologous gene pair data set was 103 kb for maize and 150 kb for rice, with 73.1 and 98.8 kb representing 5' upstream sequences from maize and rice, respectively.
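The rule used above to set the 5' boundary of upstream sequence within BAC clones can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the forward-strand coordinate convention, and the assumption that whichever boundary lies closer to the start codon is used are ours and are not stated in the text.

```python
def upstream_boundary(start_codon, upstream_feature_ends, max_upstream=3000):
    """Return the 5' boundary (0-based coordinate) of the upstream region.

    start_codon: position of the start codon on the forward strand.
    upstream_feature_ends: 3' end coordinates of upstream genes or
        repetitive elements (hypothetical input, e.g. retrotransposons).
    Assumption: the boundary closest to the start codon wins.
    """
    # Default limit: max_upstream bp 5' of the start codon, clipped at 0.
    limit = max(0, start_codon - max_upstream)
    # Closest upstream feature end, if any feature lies 5' of the start codon.
    nearest = max((e for e in upstream_feature_ends if e <= start_codon), default=0)
    return max(limit, nearest)
```

For a start codon at position 5000, a retrotransposon ending at position 4000 would truncate the upstream region to 1000 bp, whereas a feature ending at position 1000 would leave the full 3000 bp.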
Evaluation of Phylogenetic Footprinting for Promoter Element Identification in Maize and Rice
Each of these algorithms is designed to rapidly align long genomic sequences, but via different approaches (Frazer et al., 2003
The orthologous maize and rice gene pairs were compared for CNS using each of the BL2SEQ, PipMaker, DBA, DIALIGN, BBA, and VISTA algorithms. Similar to the findings reported by Kaplinsky et al. (2002)
The six phylogenetic footprinting algorithms tested performed differently in their ability to predict known promoter regulatory motifs. PipMaker and DBA detected <10 of the 31 elements under any of the evaluated length parameters (data not shown). Under the relaxed criteria to define CNS, approximately half of the known promoter motifs were found within CNS identified by BL2SEQ, DIALIGN, or BBA, but these tools also produced a relatively high background in which >36% of all nucleotides within the promoter sequences were defined as CNS. The CNS identified by VISTA contained 19 of the 30 promoter elements, but only 12% of the total promoter sequences were contained within CNS. Thus, VISTA appeared to offer the best combination of predictive success for known regulatory elements (63.3%) within the smallest fraction of promoter sequence defined as CNS.
Figure 1B shows the nucleotide alignments of the CNS identified by VISTA for six maize-rice orthologous gene pairs in the regions surrounding 11 of the known regulatory sequence elements. What is striking in these alignments is that the CNS often localize over the regulatory sequence elements, with the flanking DNA sequences diverging sufficiently to fail to be defined as CNS. This is clearly evident for CNS observed in the maize Sbe1 promoter, in which each of the three positive regulatory sequence elements defined by Kim and Guiltinan (1999)
Surveys for CNS in Orthologous Maize and Rice Genes
The frequency of CNS blocks among the 78 gene pairs ranged from 1 to 31 per 1000 bp of noncoding sequence compared, with the mean being 11.98 blocks per 1000 bp. The total fraction of noncoding sequence contained within CNS ranged from 1 to 34%, with a mean of 15.2%. The mean block length for CNS identified between maize and rice promoter sequences was 11.9 bp, with 88 bp being the largest CNS in this data set. The vast majority (67%) of maize-rice CNS are between 8 and 11 bp, which is consistent with the observation that most of these CNS are not identified with larger window sizes. Among the 78 gene pairs compared, the five genes with the highest proportions of CNS (>30%) relative to the entire data set (ferritin1, alcohol dehydrogenase2, hsp70, proliferating cell nuclear antigen, and sbe1) tended to represent genes that perform housekeeping or metabolic functions and whose expression patterns might be expected to be more highly conserved between maize and rice. Conversely, five gene pairs showed <3% CNS (nitrate reductase1, rf2 and rf2B aldehyde dehydrogenases, sugary1, and Mha1 proton-transporting ATPase). These genes could represent gene pairs whose noncoding sequences have diverged more rapidly than those of most other genes. Alternatively, they may represent paralogous rather than orthologous gene pairs whose noncoding sequences have diverged. CNS were not distributed equally among different genic regions (Table 3). CNS at least 10 bp in length were most prevalent in the 5' untranslated leader sequences of mRNAs upstream of the start codon, followed by transcribed sequences downstream of stop codons and then intron sequences. CNS were least frequent (8.7%) in promoter sequences upstream of defined or predicted transcription start sites. Within each of these noncoding sequences, CNS tend to be found more often closest to coding sequences. 
This distribution may reflect the strong selection pressure on exonic coding sequences, which indirectly preserves adjacent untranslated sequences, or could indicate functional conservation of sequences surrounding transcription start sites, exon-intron splice junctions, or polyadenylation signals. The frequency of CNS within promoter sequences decreases as distance 5' to the transcription start site or start codon increases. Very few CNS are identified >1000 bp upstream of transcription start sites. This distribution is consistent with the compact nature of plant gene promoters, in which <1000 bp of promoter sequence often is sufficient to drive proper regulated patterns of transcription and regulatory elements tend to be clustered near the transcription start site.
Maize-Rice CNS Are Not Significantly Enriched for Known Regulatory Elements
One test for the regulatory significance of short CNS identified in maize-rice gene comparisons is to determine whether known functional promoter regulatory elements appear at a higher frequency in CNS compared with the entire maize and rice promoter sequence data set, which has been observed in comparisons of orthologous human and mouse genes (Levy et al., 2001
The heptad sequences within maize and rice gene promoters showed a relatively normal distribution, with an observed mean identical to that expected (1/4^7 × 10^6 bp = 61 per Mbp). However, this distribution was biased by a small group of 300 elements that appeared with very high frequencies, which shifted the median peak occurrence frequency in the cereal promoter sequence data set to 51 per Mbp. Only 3283 of the possible 16,384 heptad sequences appeared in the smaller VISTA CNS data set; thus, the occurrence frequencies for individual heptad repeats were inflated relative to the total cereal promoter sequence data set, preventing a direct comparison of these two distributions. However, it was possible to compare whether the distribution of the 3283 heptad sequences common to the two data sets differed from that of the total cereal promoter sequence data set. The log likelihood goodness-of-fit test (G = 865.11, P < 0.001) showed that the distribution of heptad sequences identified by VISTA as CNS was not the same as that of the total data set, with the most common heptad sequences from the total promoter data set occurring at even higher frequencies within CNS (Figure 2). The 5'-AAAAAAA-3' heptad and its complement were the two most frequent heptads in both data sets, with similar AT-rich and CT-rich heptads also being very common. The high occurrence frequencies of AT-rich sequences in maize and rice promoters and their enrichment in maize-rice CNS are consistent with the recent observation that human-mouse CNS also are enriched significantly in AT-rich sequences associated with nuclear matrix attachment regions (Glazko et al., 2003
The occurrence frequencies for each of the 19 elements identified by VISTA in Table 2 were compared between the total cereal promoter and VISTA CNS data sets. For elements >7 bp, each 7-bp window within the total element was compared, and the results were averaged to derive a mean frequency for the whole element.
Only 5 of the 19 elements (Sbe1 -284 to -255, rab28 ABRE, rab28 DRE, Fer1 IDRS, and Fer1 G-box) showed a greater than twofold increase in their occurrences per megabase pair for the CNS data set compared with the total promoter sequence data set. Thus, although the distribution of heptad sequences differed between CNS and the total promoter data set, these differences do not appear to be attributable to an enrichment of the 19 known regulatory elements considered here.
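The heptad bookkeeping described in this section can be illustrated with a short Python sketch. It is not the code used in this study: the function names are hypothetical, only the given strand is counted, and windows containing ambiguous bases are skipped, all of which are our assumptions.

```python
from collections import Counter

HEPTAD = 7
VALID = set("ACGT")

def expected_per_mbp(k=HEPTAD):
    """Expected occurrences per Mbp of one k-mer under equal base frequencies."""
    return 1e6 / 4 ** k

def count_heptads(seqs, k=HEPTAD):
    """Count every overlapping k-mer in a list of sequences (single strand)."""
    counts = Counter()
    total_bp = 0
    for seq in seqs:
        s = seq.upper()
        total_bp += len(s)
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= VALID:  # skip windows with N or other symbols
                counts[kmer] += 1
    return counts, total_bp

def per_mbp(count, total_bp):
    """Convert a raw count into occurrences per megabase pair."""
    return count * 1e6 / total_bp

def element_mean_per_mbp(element, counts, total_bp, k=HEPTAD):
    """Average the per-Mbp frequency over every k-bp window of an element."""
    e = element.upper()
    windows = [e[i:i + k] for i in range(len(e) - k + 1)]
    if not windows:
        raise ValueError("element is shorter than the window size")
    return sum(per_mbp(counts[w], total_bp) for w in windows) / len(windows)
```

Under these assumptions, `expected_per_mbp()` gives approximately 61, matching the expected mean quoted above, and the same `element_mean_per_mbp` call can be applied to counts from the total promoter data set and from the CNS data set to obtain the fold difference for an element.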
Twenty Base Pairs Is the Minimum Length for a Significant CNS among Orthologous Cereal Genes
To assess whether or not the distribution of CNS block lengths identified in comparisons of orthologous maize-rice gene pairs was significantly different from that of sequence blocks observed in random comparisons, a sequence data set was constructed in which the promoter sequences from the 50 maize genes were fused to the maize sbe1 coding region and the 50 rice gene promoters were fused to the rice sbe1 coding region. These gene constructs then were compared in all possible pair-wise combinations using VISTA, and all noncoding sequence blocks that fit the same criteria as CNS (>70% identity in a 10-bp window) were included in the random comparison data set. The distribution of the sequence block lengths from the random maize-rice gene comparisons (Figure 3A, white bars) was essentially the same as that observed for the orthologous gene comparisons, except that the random comparisons did not contain any sequence blocks of >20 bp. Thus, for maize-rice gene comparisons using VISTA, only those CNS of >20 bp showed frequencies that were higher than would be expected at random.
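The CNS criterion used here (70% identity in a 10-bp window) and the 20-bp length cutoff can be made concrete with a small sketch that operates on two pre-aligned, gapped sequences. This is a simplified stand-in and not the VISTA implementation: the merging of overlapping windows, the treatment of gap characters as mismatches, and all function names are our assumptions.

```python
def conserved_blocks(a, b, window=10, min_identity_pct=70):
    """Return (start, end) blocks, end exclusive, built from windows that
    reach at least min_identity_pct identity. a and b are aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    for start in range(len(a) - window + 1):
        # Count matching columns; a gap character never counts as a match.
        matches = sum(
            1 for x, y in zip(a[start:start + window], b[start:start + window])
            if x == y and x != "-"
        )
        if matches * 100 >= min_identity_pct * window:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                blocks[-1] = (blocks[-1][0], end)  # extend the previous block
            else:
                blocks.append((start, end))
    return blocks

def significant_blocks(blocks, min_length=20):
    """Keep only blocks that reach the minimum length criterion."""
    return [(s, e) for s, e in blocks if e - s >= min_length]
```

Because blocks are assembled from passing windows, a block can extend a few columns into diverged flanking sequence; a production tool would trim or score these edges more carefully.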
The frequency distribution of CNS lengths observed between maize and rice genes was only slightly different from what might be expected at random, suggesting that the evolutionary distance between maize and rice is on average too great to identify CNS. Thus, sorghum was investigated as an alternative to compare with maize sequences because it diverged from maize
CNS in Genes from Three or More Cereal Species
The inclusion of promoter sequences from additional species at intermediate levels of evolutionary divergence often improves the identification of functionally important sites by phylogenetic footprinting (Boffelli et al., 2003
Five Adh1 CNS were conserved across all four of the cereal species (Figure 4B). Two of these CNS reside in the promoters of these genes, the most distal of which contains sequences similar to the DRE element and G-box motifs known to interact with DNA binding proteins (Menkens et al., 1995
Inspection of the outputs from the 17 other orthologous gene sets for which three-way species comparisons were possible (Table 4) identified few CNS in the promoter regions of genes from all three species. However, Figure 4C shows that four CNS of 68, 50, 41, and 32 bp were found for the maize, rice, and sorghum genes encoding the large subunit of ADP-glucose pyrophosphorylase (maize Shrunken2 [Sh2]). Each of these promoter CNS contains sequences with similarity to at least one known promoter regulatory element. Maize Sh2 and its rice ortholog are expressed in seed endosperm tissue (Bhave et al., 1990
It is evident from visual inspection of the alignments that the maize and sorghum sequences show the highest degree of conservation, with rice and barley being somewhat divergent from each other as well as from the maize and sorghum sequences. These relationships are consistent with the known phylogeny of these grasses. Figure 4D provides estimates of nucleotide substitution rates based on the Adh1 and Sh2 CNS alignments. The estimates for nonsynonymous and synonymous substitution rates within the Adh1 coding sequence are very close to the mean estimates of Gaut et al. (1996)
We have surveyed for CNS from a large number of genes from the major cereal crop species to determine the general properties and distribution of CNS among grass genomes. Our findings have practical implications for comparative genomics within this important plant family, suggest strategies and limitations for the identification of promoter regulatory elements in cereal genes by phylogenetic footprinting, and provide insights into the evolution of regulatory sequences in the grasses.
An Annotated Database for Cereal Gene Promoter Sequences
Strong candidate orthologous rice genes were identified for only 78 of the 288 maize genes tested (27.1%), which is much less than the estimated coverage of phase-2 rice genome sequences in GenBank (currently
Among the available sequence comparisons cited above, two examples exist in which CNS provides supporting evidence for orthology. Chen et al. (1998)
Utility of Phylogenetic Footprinting to Predict Regulatory Motifs in Cereal Gene Promoters
VISTA was clearly superior to the other alignment methods used, at least with this set of genes. The three other algorithms all share the property of discarding short blocks, which constitute the vast majority of the CNS contained in this data set. VISTA also exhibits biases, because it tends to omit small blocks that flank longer, strongly conserved blocks or to omit small blocks that lie between two larger blocks. The net result of these biases is to subdivide larger CNS blocks into multiple smaller blocks, which is one contributing factor to the high frequency of small CNS blocks identified by VISTA. It is important to note that each of the regulatory elements identified by the different tools evaluated here (Figure 1) was part of a regulatory "module" containing more than one adjacent transcription factor binding site. Levy et al. (2001)
Our primary goal in surveying for CNS among a large number of cereal genes was to develop criteria with which to assess their biological significance, particularly for comparisons with the nearly completed rice genome. We found for maize-rice comparisons using VISTA that the majority of CNS are between 8 and 15 bp (Figure 3), which offers potential advantages in precisely defining candidate regulatory sequences. However, our analyses of the occurrence frequencies for all possible combinations of heptad sequences in the CNS compared with total promoter sequence data sets did not provide any evidence that CNS are enriched significantly in known regulatory elements. Thus, the short CNS identified among cereal gene promoters are unlikely to be useful in identifying functionally significant promoter regulatory sequences.
CNS among grass genomes are much smaller than those observed in mammals (Kaplinsky et al., 2002
The small size of the CNS identified in the maize-rice comparisons raises the possibility that they represent background noise generated by the alignment tool. Indeed, we found that the frequency distributions of CNS for orthologous maize-rice gene comparisons did not deviate significantly from those obtained in the random comparisons (Figure 3), whereas those from the orthologous maize-sorghum comparisons did differ from random. In both sets, only blocks of >20 bp were found at frequencies greater than those observed in the random gene comparison data set; hence, 20 bp would seem to be a lower boundary by which to define CNS among grass genomes. In their comparisons of two promoter sequences from 22 cruciferous species spanning 45 million years of evolution, Koch et al. (2001)
A second criterion that we used to assess the significance of CNS was to determine if they could be identified in comparisons of orthologous genes from three or more cereal species. This analysis could be performed for only 18 gene sets, but the results demonstrate that this strategy greatly reduced the number of CNS identified. Of course, such a criterion is biased for regulatory elements that contribute to highly conserved patterns of gene expression among the grasses. Interestingly, maize, sorghum, barley, and wheat each showed <10% CNS compared with rice, suggesting that the problem of background noise found in the maize-rice comparisons also applies when using the rice genome for comparisons with other economically important cereals. However, the Adh1 and Sh2 examples (Figure 4) as well as the results of Kaplinsky et al. (2002)
Together, our results suggest that stringent criteria for defining CNS in the grasses would be a length of at least 20 bp and the presence of the CNS in at least one additional grass species with an intermediate level of divergence, such as rice-sorghum-maize. Although additional comparisons need to be made to better assess the utility of phylogenetic footprinting for a larger set of cereal genes, the low frequencies of CNS observed here between the rice, barley-wheat, and sorghum-maize clades (Table 4) suggest that shorter evolutionary distances may be required for effective promoter comparisons. Thus, as has been demonstrated recently for primate genomes (Boffelli et al., 2003
CNS and the Evolution of Cereal Gene Promoters
Song et al. (2002)
Our surveys for CNS among orthologous cereal genes generally found low proportions of CNS within promoter regions. A simple interpretation of this observation is that expression patterns have diverged significantly among apparently orthologous genes in the grasses. Certainly, this will be the case for some genes, especially those that have been subjected to one or more rounds of gene duplication. However, studies that have analyzed endogenous expression patterns and characterized promoter-reporter constructs in transgenic cereals have found that regulated patterns of gene expression often are highly conserved across different cereal species. Therefore, the few CNS that are identified among cereal gene promoters may be the primary sequence determinants for conserved patterns of gene expression. For example, a short region of the maize Adh1 promoter confers proper gene expression in transgenic rice (Kyozuka et al., 1994
Certainly, the regulation of gene expression in the grasses and other plant species is complex. The use of comparative genomics approaches that reveal patterns of noncoding sequence evolution, particularly in promoter regions, offers important insights into this problem. Promoter CNS often are likely to correspond to transcription factor binding sites, but they also could mediate sites of interaction with chromatin complexes that influence gene expression. The strategies described here for identifying CNS among grass genes can be applied readily to the analysis of other plant genes, with CNS helping guide laboratory experiments to determine their functional roles in the regulation of gene expression. We expect that as the amount of plant genomic sequence information expands to include more species and taxa, appropriate comparisons of noncoding sequences that consider the significance tests presented here will become an increasingly powerful approach to elucidating mechanisms of plant gene regulation.
Compilation of Genomic Sequences from Orthologous Cereal Genes
The Entrez nucleotide database from GenBank (http://www.ncbi.nlm.nih.gov) was searched for all maize (Zea mays) sequence entries by setting the following limits parameters: genomic DNA/RNA sequences, exclude ESTs, exclude sequence tagged sites, exclude genomic survey sequences, but include patents. Approximately 5000 sequence entries were returned that were inspected manually for the presence and annotation of genomic DNA sequences with features such as transcription start sites, the TATA-box promoter element, coding sequence, intron-exon boundaries, and poly(A) signals. Among sequences contained in patent applications, genomic annotations frequently were absent but often could be ascertained by comparisons of genomic and cDNA sequences for the same gene when available. Because many sequences contained annotation only of the start codon, only those sequences that contained at least 200 bp of 5' upstream sequence were included in the final database. This process was repeated in a search for all sorghum (Sorghum bicolor), wheat (Triticum aestivum), and barley (Hordeum vulgare) genomic sequence entries that contained at least 200 bp of 5' upstream sequence. When multiple sequences from the same gene were available, only the longest sequence was included, and preference also was given to sequences from wild-type functional alleles rather than mutant alleles. All entries that represented unique gene sequences were downloaded as annotated sequence files into a local database within the Vector NTI Suite 7.0 (Informax, Bethesda, MD) software. The final database, containing 652 annotated entries, then was imported into a Microsoft Access (Redmond, WA) relational database, including html links to National Center for Biotechnology Information GenBank entries, which can be viewed as supplemental data at http://www.cropsci.uiuc.edu/faculty/moose/PromoterComparisons.htm.
The sequence entries in this database were used to extract putative orthologous gene sequences as follows. First, GenBank nonredundant and high-throughput genomic sequence databases were searched with coding region sequences using TBLASTX. The top maize, rice (Oryza sativa), barley, sorghum, or wheat hits for each sequence from these searches then were compared with the genomic DNA sequences using BLASTN. All searches were performed on a local server (Keck Center for Comparative and Functional Genomics, University of Illinois at Urbana-Champaign). Only those genomic sequences that showed probability scores of <1 × 10^-5 and >80% nucleotide identity within target coding regions were considered to be candidate orthologous genes. Members of known multigene families that showed similarity only in highly conserved protein domains were excluded to minimize the chance of comparing paralogs rather than orthologs. Only those gene pairs that contained at least 200 bp of 5' upstream sequence were included in the final data set. The majority of rice and sorghum genomic sequences were identified within BAC or P1 artificial chromosome clones; thus, the genomic sequences containing conserved coding sequences as well as 3000 bp of 5' flanking sequence and 1000 bp of 3' flanking sequence were selected manually from the larger sequence and saved as annotated files in a Vector NTI Explorer local database. A listing of the orthologous genes identified in this manner is included as a table in the relational database housed at http://www.cropsci.uiuc.edu/faculty/moose/PromoterComparisons.htm.
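The filtering criteria for candidate orthologs listed above can be summarized as a simple predicate. The sketch below is illustrative only: the `Hit` record and its field names are hypothetical, and the probability score is treated as a BLAST E-value for which smaller values are more significant, which is our assumption.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Hypothetical summary of one BLASTN comparison against a genomic sequence."""
    query: str
    subject: str
    evalue: float                 # assumed BLAST E-value (smaller is better)
    coding_identity_pct: float    # nucleotide identity within coding regions
    upstream_bp: int              # length of available 5' upstream sequence
    conserved_domain_only: bool = False  # similarity limited to a conserved domain

def is_candidate_ortholog(hit, max_evalue=1e-5, min_identity_pct=80.0,
                          min_upstream_bp=200):
    """Apply the thresholds described in the text to one hit."""
    return (
        hit.evalue < max_evalue
        and hit.coding_identity_pct > min_identity_pct
        and hit.upstream_bp >= min_upstream_bp
        and not hit.conserved_domain_only
    )
```

In practice, the additional criteria mentioned in Results, such as expression and proposed gene function, required manual judgment and are not captured by a threshold-based filter of this kind.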
Phylogenetic Footprinting Comparisons
The surveys for CNS among multiple-species comparisons involved orthologous genes from maize, rice, sorghum, barley, and wheat. The gene sequences compared and their accession numbers are listed at the end of Methods.
Estimation of Heptad Frequency Distributions in Sequence Data Sets
Analyses of CNS
The frequency distribution of block lengths from comparisons of orthologous maize-rice or maize-sorghum gene pairs (Figure 3) was determined using the Frequency function in Microsoft Excel. To generate a data set that assessed the frequency with which sequences that met the criteria for CNS (70% identity in a 10-bp window) occurred in random comparisons of noncoding sequences, each of the 50 maize upstream noncoding sequences was fused to the maize sbe1 coding region and each of the 50 rice upstream noncoding sequences was fused to the rice sbe1 coding region. Thus, each of these gene constructs contained the same pair of orthologous coding regions, which facilitated proper alignment of the entire sequence by VISTA. Each of the 50 maize constructs then was compared with the 49 nonorthologous rice constructs using VISTA and criteria of 70% identity in a 10-bp window. The VISTA output then was used to generate the frequency distribution for random gene comparisons, as described above for orthologous gene comparisons.
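The design of the random comparison data set and the tabulation of block lengths can be sketched as follows. The identifiers and the ortholog mapping are hypothetical placeholders, and the tally stands in for the spreadsheet step described above; with 50 constructs per species, the pairing yields 50 × 49 = 2450 nonorthologous comparisons.

```python
from collections import Counter

def nonorthologous_pairs(maize_ids, rice_ids, orthologs):
    """Pair every maize construct with every rice construct except its ortholog.

    orthologs: dict mapping each maize identifier to its rice ortholog.
    """
    return [(m, r) for m in maize_ids for r in rice_ids if orthologs.get(m) != r]

def length_distribution(blocks):
    """Tally block lengths from (start, end) coordinates, end exclusive."""
    return Counter(end - start for start, end in blocks)

# Hypothetical identifiers for the 50 maize and 50 rice promoter constructs.
maize = [f"maize_{i}" for i in range(50)]
rice = [f"rice_{i}" for i in range(50)]
pairs = nonorthologous_pairs(maize, rice, dict(zip(maize, rice)))
```

Each pair would then be aligned and scanned for blocks meeting the CNS criteria, and the resulting length distribution compared with that from the orthologous pairs.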
Estimation of Nucleotide Substitution Rates
The alignments of CNS from Adh1 and Sh2 (Figure 4) were pooled into single alignments of 137 bp (Adh1) and 192 bp (Sh2) and used as inputs into the baseml program within PAML. Nucleotide distance measures were estimated under the model of Tamura and Nei (1993)
Nucleotide substitution rates were estimated using these computed distance measures and the formula described by Gaut et al. (1996)
Upon request, all novel materials described in this article will be made available in a timely manner for noncommercial research purposes.
Accession Numbers
We thank the Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for access to its batch BLAST server as well as for server space to perform phylogenetic footprinting comparisons. We thank Nick Lauter and an anonymous reviewer for helpful suggestions regarding the statistical evaluation of CNS. We also acknowledge access to draft rice genome sequences from Monsanto (rice-research.org) and Syngenta's Torrey Mesa Research Institute, which allowed confirmation of orthologous maize-rice gene sequences. Funding for this research was provided by the College of Agriculture and Consumer Economics at the University of Illinois and by grants to S.P.M. from the University of Illinois Research Board and the U.S. Department of Agriculture (Award 2002-35301-12157).
Online version contains Web-only data. Article, publication date, and citation information can be found at www.plantcell.org/cgi/doi/10.1105/tpc.010181. Received December 21, 2002; accepted March 7, 2003.
Anderson, J.M., Larsen, R., Laudencia, D., Kim, W.T., Morrow, D., Okita, T.W., and Preiss, J. (1991). Molecular characterization of the gene encoding a rice endosperm-specific ADP-glucose pyrophosphorylase subunit and its developmental pattern of transcription. Gene 97, 199–205.
Avramova, Z., and Bennetzen, J.L. (1993). Isolation of matrices from maize leaf nuclei: Identification of a matrix-binding site adjacent to the Adh1 gene. Plant Mol. Biol. 22, 1135–1143.
Bennetzen, J.L. (2000). Comparative sequence analysis of plant nuclear genomes: Microcolinearity and its many exceptions. Plant Cell 12, 1021–1030.
Bergman, C., and Kreitman, M. (2001). Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 11, 1335–1345.
Bhave, M.R., Lawrence, S., Barton, C., and Hannah, L.C. (1990). Identification and molecular characterization of shrunken-2 cDNA clones of maize. Plant Cell 2, 581–588.
Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. (2003). Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299, 1391–1394.
Brignon, P., and Chaubet, N. (1993). Constitutive and cell-division-inducible protein-DNA interactions in two maize histone gene promoters. Plant J. 4, 445–457.
Buell, C.R. (2002). Obtaining the sequence of the rice genome and lessons learned along the way. Trends Plant Sci. 7, 521–565.
Busk, P.K., Pujal, J., Jessop, A., Lumbreras, V., and Pages, M. (1999). Constitutive protein-DNA interactions on the abscisic acid-responsive element before and after developmental activation of the rab28 gene. Plant Mol. Biol. 41, 529–536.
Chandler, V.L., and Wessler, S. (2001). Grasses: A collective model genetic system. Plant Physiol. 125, 1155–1156.
Chen, M., SanMiguel, P., and Bennetzen, J.L. (1998). Sequence organization and conservation in sh2/a1-homologous regions of sorghum and rice. Genetics 148, 435–443.
Clark, A.G. (2001). The search for meaning in noncoding DNA. Genome Res. 11, 1319–1320.
Cliften, P.F., Hillier, L.W., Fulton, L., Graves, T., Miner, T., Gish, W.R., Waterston, R.H., and Johnston, M. (2001). Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res. 11, 1175–1186.
Colinas, J., Birnbaum, K., and Benfey, P.N. (2002). Using cauliflower to find conserved non-coding regions in Arabidopsis. Plant Physiol. 129, 451–454.
Devos, K.M., and Gale, M.D. (2000). Genome relationships: The grass model in current research. Plant Cell 12, 637–646.
Dubchak, I., Brudno, M., Loots, G.G., Mayor, C., Pachter, L., Rubin, E.M., and Frazer, K.A. (2000). Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 10, 1304–1306.
Ferl, R.J., and Nick, H.S. (1987). In vivo detection of regulatory factor binding sites in the 5' flanking region of maize Adh1. J. Biol. Chem. 262, 7947–7950.
Forde, B.G., Heyworth, A., Pywell, J., and Kreis, M. (1985). Nucleotide sequence of a B1 hordein gene and the identification of possible upstream regulatory elements in endosperm storage protein genes from barley, wheat, and maize. Nucleic Acids Res. 13, 7327–7339.
Frazer, K.A., Elnitski, L., Church, D.M., Dubchak, I., and Hardison, R.C. (2003). Cross-species sequence comparisons: A review of methods and available resources. Genome Res. 13, 1–12.
Freeling, M., and Bennett, D.C. (1985). Maize Adh1. Annu. Rev. Genet. 19, 297–323.
Gaut, B.S., and Doebley, J.F. (1997). DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA 94, 6809–6814.
Gaut, B.S., Morton, B.R., McCaig, B.C., and Clegg, M.T. (1996). Substitution rate comparisons between grasses and palms: Synonymous rate differences at the nuclear Adh1 gene parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93, 10274–10279.
Glazko, G.V., Koonin, E.V., Rogozin, I.B., and Shabalina, S.A. (2003). A significant fraction of conserved noncoding DNA in human and mouse consists of predicted matrix attachment regions. Trends Genet. 19, 119–124.
Goff, S.A., et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92–100.
Gruenbaum, Y., Naveh-Many, T., Cedar, H., and Razin, A. (1981). Sequence specificity of methylation in higher plant DNA. Nature 292, 860–862.
Guan, L.M., Zhao, J., and Scandalios, J.G. (2000). Cis-elements and trans-factors that regulate expression of the maize Cat1 antioxidant gene in response to ABA and osmotic stress: H2O2 is the likely intermediary signaling molecule for the response. Plant J. 22, 87–95.
Jareborg, N., Birney, E., and Durbin, R. (1999). Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res. 9, 815–824.
Kaplinsky, N.J., Braun, D.M., Penterman, J., Goff, S.A., and Freeling, M. (2002). Utility and distribution of conserved noncoding sequences in the grasses. Proc. Natl. Acad. Sci. USA 99, 6147–6151.
Kellogg, E.A. (2001). Evolutionary history of the grasses. Plant Physiol. 125, 1198–1205.
Kim, K.N., and Guiltinan, M.J. (1999). Identification of cis-acting elements important for expression of the starch-branching enzyme I gene in maize endosperm. Plant Physiol. 121, 225–236.
Kizis, D., and Pages, M. (2002). Maize DRE binding proteins DBF1 and DBF2 are involved in rab17 regulation through the drought-responsive element in an ABA-dependent pathway. Plant J. 30, 679–689.
Koch, M.A., Weisshaar, B., Kroymann, J., Haubold, B., and Mitchell-Olds, T. (2001). Comparative genomics and regulatory evolution: Conservation and function of the Chs and Apetala3 promoters. Mol. Biol. Evol. 18, 1882–1891.
Kyozuka, J., Olive, M., Peacock, W.J., Dennis, E.S., and Shimamoto, K. (1994). Promoter elements required for developmental expression of the maize Adh1 gene in transgenic rice. Plant Cell 6, 799–810.
Lescot, M., Déhais, P., Thijs, G., Marchal, K., Moreau, Y., Van de Peer, Y., Rouzé, P., and Rombauts, S. (2002). PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res. 30, 325–327.
Lesnick, M.L., and Chandler, V.L. (1998). Activation of the maize anthocyanin gene a2 is mediated by an element conserved in many anthocyanin promoters. Plant Physiol. 117, 437–445.
Levy, S., Hannenhalli, S., and Workman, C. (2001). Enrichment of regulatory signals in conserved non-coding genomic sequence. Bioinformatics 17, 871–877.
Li, W.-H., and Graur, D. (1991). Fundamentals of Molecular Evolution. (Sunderland, MA: Sinauer Associates).
Loots, G.G., Locksley, R.M., Blankenspoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M., and Frazer, K.A. (2000). Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288, 136–140.
Martínez-Hernández, A., López-Ochoa, L., Argüello-Astorga, G., and Herrera-Estrella, L. (2002). Functional properties and regulatory complexity of a minimal RBCS light-responsive unit activated by phytochrome, cryptochrome, and plastid signals. Plant Physiol. 128, 1223–1233.
Mayor, C., Brudno, M., Schwartz, J.R., Poliakov, A., Rubin, E.M., Frazer, K.A., Pachter, L.S., and Dubchak, I. (2000). VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16, 1046–1047.
McCue, L.A., Thompson, W., Carmack, C.S., and Lawrence, C.E. (2002). Factors influencing the identification of transcription factor binding sites by cross-species comparison. Genome Res. 12, 1523–1532.
Menkens, A.E., Schindler, U., and Cashmore, A.R. (1995). The G-box: A ubiquitous regulatory DNA element in plants bound by the GBF family of bZIP proteins. Trends Biochem. Sci. 20, 506–510.
Morgenstern, B. (1999). DIALIGN 2: Improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218.
Morishige, D.T., Childs, K.L., Moore, L.D., and Mullet, J.E. (2002). Targeted analysis of orthologous phytochrome A regions of the sorghum, maize, and rice genomes using comparative gene-island sequencing. Plant Physiol. 130, 1614–1625.
Okagaki, R.J., and Wessler, S.R. (1988). Comparison of mutant and non-mutant waxy genes in rice and maize. Genetics 120, 1137–1143.
Olive, M.R., Peacock, W.J., and Dennis, E.S. (1991). The anaerobic responsive element contains two GC-rich sequences essential for binding a nuclear protein and hypoxic activation of the maize Adh1 promoter. Nucleic Acids Res. 19, 7053–7060.
Paul, A.L., and Ferl, R.J. (1991). In vivo footprinting reveals unique cis-elements and different modes of hypoxic induction in maize Adh1 and Adh2. Plant Cell 3, 159–168.
Paul, A.L., and Ferl, R.J. (1994). In vivo footprinting identifies an activating element of the maize Adh2 promoter specific for root and vascular tissues. Plant J. 5, 523–533.
Petit, J.-M., van Wuytswinkel, O., Briat, J.-F., and Lobreaux, S. (2001). Characterization of an iron-dependent regulatory sequence involved in the transcriptional control of AtFer1 and ZmFer1 plant ferritin genes by iron. J. Biol. Chem. 276, 5584–5590.
Praz, V., Perier, R., Bonnard, C., and Bucher, P. (2002). The Eukaryotic Promoter Database, EPD: New entry types and links to gene expression data. Nucleic Acids Res. 30, 322–324.
Quiros, C.F., Grellet, F., Sadowski, J., Suzuki, T., Li, G., and Wroblewski, T. (2001). Arabidopsis and Brassica comparative genomics: Sequence, structure and gene content in the ABI1-Rps2-Ck1 chromosomal segment and related regions. Genetics 157, 1321–1330.
Ramakrishna, W., Dubcovsky, J., Park, Y.-J., Busso, C., Emberton, J., SanMiguel, P., and Bennetzen, J.L. (2002a). Different types and rates of genome evolution detected by comparative sequence analysis of orthologous segments from four cereal genomes. Genetics 162, 1389–1400.
Ramakrishna, W., Emberton, J., SanMiguel, P., Ogden, M., Llaca, V., Messing, J., and Bennetzen, J.L. (2002b). Comparative sequence analysis of the sorghum Rph region and the maize Rp1 resistance gene complex. Plant Physiol. 130, 1728–1738.
Sainz, M.B., Grotewold, E., and Chandler, V.L. (1997). Evidence for direct activation of an anthocyanin promoter by the maize C1 protein and comparison of DNA binding by related Myb domain proteins. Plant Cell 9, 611–625.
Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, A., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. (2000). PipMaker: A web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Shimamoto, K., and Kyozuka, J. (2002). Rice as a model for comparative genomics of plants. Annu. Rev. Plant Biol. 53, 399–419.
Shure, M., Wessler, S., and Fedoroff, N. (1983). Molecular identification and isolation of the waxy locus in maize. Cell 35, 225–235.
Song, R., Llaca, V., and Messing, J. (2002). Mosaic organization of orthologous sequences in grass genomes. Genome Res. 12, 1549–1555.
Tagle, D.A., Koop, B.F., Goodman, M., Slightom, J.L., Hess, D.L., and Jones, R.T. (1988). Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus): Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203, 439–455.
Tamura, K., and Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526.
Tarchini, R., Biddle, P., Wineland, R., Tingey, S., and Rafalski, A. (2000). The complete sequence of the 340 kb of DNA around the maize Adh1-Adh2 region reveals interrupted colinearity with maize chromosome 4. Plant Cell 12, 381–391.
Tatusova, T.A., and Madden, T.L. (1999). BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174, 247–250.
Thacker, C., Marra, M.A., Jones, A., Baillie, D.L., and Rose, A.M. (1999). Functional genomics in Caenorhabditis elegans: An approach involving comparisons of sequences from related nematodes. Genome Res. 9, 348–359.
Tikhonov, A.P., SanMiguel, P.J., Nakajima, Y., Gorenstein, N.M., Bennetzen, J.L., and Avramova, Z. (1999). Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc. Natl. Acad. Sci. USA 96, 7409–7414.
Vicente-Carbajosa, J., Moose, S.P., Parson, R.L., and Schmidt, R.J. (1997). A maize zinc-finger protein binds the prolamin box in zein gene promoters and interacts with the basic leucine zipper transcriptional activator Opaque2. Proc. Natl. Acad. Sci. USA 94, 7685–7690.
Walker, J.C., Howard, E.A., Dennis, E.S., and Peacock, W.J. (1987). DNA sequences required for anaerobic expression of the maize alcohol dehydrogenase1 gene. Proc. Natl. Acad. Sci. USA 84, 6624–6628.
Wolfe, K.H., Gouy, M., Yang, Y.-W., Sharp, P.M., and Li, W.-H. (1989). Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86, 6201–6205.
Yang, Z. (1997). PAML: A program package for phylogenetic analysis by maximum likelihood. CABIOS 13, 555–556.
Yu, J., et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92.
Zhu, J., Liu, J.S., and Lawrence, C.E. (1998). Bayesian adaptive sequence alignment algorithms. Bioinformatics 14, 25–39.