|
|
||||||||
|
First published online April 29, 2005; 10.1105/tpc.105.031575 © 2005 American Society of Plant Biologists
A Tiling Microarray Expression Analysis of Rice Chromosome 4 Suggests a Chromosome-Level Regulation of Transcription
|
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
As a first attempt to decipher the rice genome, computational annotation has been successful, although improvements are needed (Yuan et al., 2003
). Recent efforts to verify experimentally the gene model structure by sequencing cDNAs and ESTs have provided valuable information toward our understanding of gene structure and genome-coding capacity (Wu et al., 2002
; Rice Full-Length cDNA Consortium, 2003
; Rensink and Buell, 2004
). However, to date, only approximately half of the predicted genome-coding capacity has had any cDNA or EST expression support. Massively parallel signature sequencing provides another tool to analyze the transcriptional activity of complex genomes (Meyers et al., 2004
) and is just being applied to rice. Recently, tiling path DNA microarrays have made it possible to detect expression within almost any desired portion of the genome in an unbiased and high-throughput manner. Studies in several model genomes using tiling path microarray analysis revealed an abundance of previously unpredicted transcribed regions in each of the chromosomes or genomes investigated (Shoemaker et al., 2001
; Kapranov et al., 2002
; Rinn et al., 2003
; Yamada et al., 2003
). Clearly, experimental approaches complementary to computation-based genome annotation are essential for an understanding of genome structures. Because of the presence of large amounts of unfinished sequence data, unusual compositional gradients in genes, and the large size of the rice genome (Wong et al., 2002
; Rensink and Buell, 2004
), there is even greater need for experimental approaches in rice genome annotation.
A chromosome-scale transcriptional analysis will also expand our knowledge of possible chromosome-level transcriptional regulation. One prominent feature of eukaryotic chromosomes is their organization into heterochromatic and euchromatic regions. Heterochromatin was first distinguished from euchromatin cytologically as more intensely staining nuclear material throughout the cell cycle in Bryophyta (Heitz, 1928
). For a long time, heterochromatin was considered a junkyard composed of only noncoding DNA and silent transposons. The cytological appearance of heterochromatin actually reflects a specific chromatin-packaging condition frequently associated with transcriptional dormancy (Hennig, 1999
). In most well-studied eukaryotes, heterochromatin is found near centromeres and telomeres. Sequencing of heterochromatic regions revealed the existence of tandem long repeats and transposons (Hennig, 1999
). On the other hand, the density of nonredundant protein-coding gene models and the recombination rate are low in heterochromatic regions (CSHL/WUGSC/PEB Arabidopsis Sequencing Consortium, 2000
).
The idea that heterochromatin influences the regulation of nearby genes began with the early observation of position effect variegation in Drosophila (Müller 1930
). It was further noted that a euchromatic site could become a heterochromatic site in nature under certain conditions (Henikoff and Comai, 1998
). Recent genetic studies in plants provide supporting evidence for the chromatin-level regulation of gene expression (Hoekenga et al., 2000
; Stam et al., 2002
; Scheid et al., 2003
). RNA interference has been suggested to be an underlying mechanism that acts by directing DNA and histone modifications to control gene expression (Bender, 2004
; Lippman and Martienssen, 2004
; Lippman et al., 2004
; Matzke and Birchler, 2005
). In fact, heterochromatin has emerged as a key regulator in the epigenetic control of gene expression, chromosome behavior, and evolution.
Heterochromatin was systematically investigated in maize (Zea mays) by chromosome staining (McClintock, 1929
). Recent cytological studies have revealed dynamic cytological characteristics of plant chromosomes, especially in heterochromatic regions in Arabidopsis thaliana and rice (Fransz et al., 1998
; Cheng et al., 2001
). However, the functions of these patterns are not fully understood (Avramova, 2002
). Different from Arabidopsis, cytologically defined heterochromatic regions of rice cover a significantly larger portion of the pericentric region in the majority of rice chromosomes. For example, approximately half of rice chromosome 4 is characterized as heterochromatic based on its dense staining pattern, including the entire short arm and another
9-Mb extension beyond the centromere into the long arm (Cheng et al., 2001
). Little if any information is available regarding chromosome-level transcriptional activity regulation in distinct chromatic regions or the functional significance of rice heterochromatin (Cheng et al., 2001
). Using tiling path microarray analysis as a tool, it is now possible to perform high-throughput profiling of the transcriptional activities along an entire sequenced chromosome to examine potential connections between transcription and cytologically defined chromatin organization.
In this study, we developed a tiling path DNA microarray consisting of overlapping PCR-amplified genomic fragments covering >33 Mb (95.5%) of japonica rice chromosome 4. Using this array, we analyzed the transcriptional activity of chromosome 4 in six representative organs or tissues. Chromosome-scale transcription patterns were analyzed and compared with cytologically observed chromatin organization and the distribution of transposon-related and various other gene model groups.
| RESULTS |
|---|
|
|
|---|
|
|
|
We mapped all currently annotated gene models along chromosome 4 to genomic DNA fragments in the minimal tiling path collection. In total, our tiling path array represents 5464 (96.2%) of the 5682 gene models (including transposon-related models). Expression for 82% (3296) of all 4025 nonredundant protein-coding gene models (excluding transposon-related models) was observed in at least one of the six organs or tissues (Figure 3B; see Supplemental Table 1 online). The expression detection rate in each individual organ or tissue ranged from 56% (panicle) to 73% (seedling root). Expression of
50% (2013) of nonredundant protein-coding gene models was detected in all six organs or tissues.
For the 1640 gene models matching available full-length cDNAs (Rice Full-Length cDNA Consortium, 2003
), we detected expression of 1383 (86%) gene models in at least one organ or tissue sample (Figure 3B). The panicle sample had the lowest detection rate (57%) of cDNA-supported gene models, whereas seedling shoot and root had the highest rates (80 and 79%, respectively). Again,
54% of the cDNA-supported gene models were commonly expressed in all organs or tissues.
The rice genome has a long history of duplication (Blanc and Wolfe, 2004
; Paterson et al., 2004
; Yu et al., 2005
). We used our expression data to estimate and compare the expression rates of duplicated genes and unique genes on chromosome 4. Approximately 19% of all gene models in this chromosome have >80% of their full length similar to another gene and were considered potential duplicated genes. The expression of 76% of those potential duplicated genes was detected in at least one sample. One the other hand, the expression detection rate for genes without any evidence of duplication, which account for 68% of all gene models, was 80%. In agreement with a recent classification of duplicated full-length cDNAs (Yu et al., 2005
), we found that 91% of tandem duplicated, 87% of segmental duplicated, and 75% of background duplicated full-length cDNAs had expression detected.
Detection of Transcriptional Activity in the Nonannotated Regions of Rice Chromosome 4
Differing from genomic DNA amplicon microarrays that only represent annotated genes (Jiao et al., 2003
; Kim et al., 2003
), tiling path microarrays are also useful for detecting novel transcription in nonannotated regions. To this end, we applied the same expression threshold to examine fragments within nonannotated regions in each sample set. These surveys resulted in the detection of transcriptional activity in 1643 nonannotated regions (see Supplemental Table 2 online). Among these 1643 nonannotated regions, 1076 (65%) were commonly detected in all organs or cultured cells. This is similar to the 61% (2013) of the 3296 expressed gene models that were commonly expressed in all six organs and cultured cells.
To provide independent experimental support for the transcriptional activity detected in these nonannotated regions, we randomly selected 21 such expressed nonannotated regions for RNA gel blot analysis using the same RNA samples used for microarray analysis. Specific RNA species in 16 of these 21 nonannotated regions were clearly detected in this RNA gel blot analysis (Figure 4). The RNA gel blot signal strengths for most bands were usually consistent with microarray hybridization intensities (data not shown).
|
|
|
0.5 to 2 Mb away from the bordering region of the heterochromatin and euchromatin of chromosome 4.
Interestingly, this defined separation of heterochromatin and euchromatin correlates with a general stronger transcriptional activity of juvenile-stage rice tissues in the euchromatin half (Figure 6A). Although the transcriptional profiles for root were quite distinct from those for shoot (Figure 5), they both had strong transcription in euchromatin. On the other hand, samples from reproductive-stage flag leaf and panicle had relatively weaker transcription in euchromatin but greater transcription in parts of the heterochromatic regions. These two reproductive-stage organs had strong transcription near the centromere region, which is located at
9.6 Mb (Zhang et al., 2004
) (Figure 6D). Compared with the other samples, cultured cells had more uniform transcriptional activities along the chromosome.
We also examined the percentage of transcribed fragments among fragments with annotated gene models in 100-kb windows along the chromosome (Figure 6B). We noted that reproductive-stage panicle and flag leaf had fewer gene models transcribed in euchromatic regions from 18 Mb to the distal end (Figure 6D). In heterochromatic regions, there was generally no obvious difference in the percentage of transcribed fragments among different organs or cultured cells. One distinct exception was the first 2-Mb region at the distal end of the short arm, where juvenile samples, together with cultured cells, had stronger transcription than reproductive samples. In fact, the transcription properties of this region were quite similar to those of euchromatic regions in the long arm. Several other domains within heterochromatin show similar transcriptional profiles that are distinct from those of their neighboring regions. These short domains are usually 1 to several Mb in size, as emphasized by the bars in Figure 6A. In the region flanking the centromere, slightly stronger average transcriptional activity was detected in the two reproductive-stage samples, whereas the percentages of transcribed gene models were close among all samples. These patterns suggest slightly higher transcriptional activity of gene models in the 1-Mb region toward the centromere in reproductive flag leaf and panicle. Chromosomal domains with similar transcriptional profiles between different organs and cultured cells are highlighted by bars in Figure 6A. It is interesting that this discontinuous pattern of transcriptional activity in the heterochromatin half of chromosome 4 is somewhat reflected in its uneven cytological staining pattern. Both previous studies (Cheng et al., 2001
) and the data in Figure 6D suggest that the heterochromatin half of chromosome 4 is composed of eight intensely stained knobs (four on each side of the centromere) with relatively weakly staining gaps.
We further examined the annotated gene model density along the chromosome and found that euchromatic regions generally have more nonredundant protein-coding gene models (excluding transposon-related models) than heterochromatic regions (Figure 6C). However, no general correlation between gene model density and transcription activity seems to exist in all samples examined.
Rice Chromosome 4 Gene Models with Significant Homology with Arabidopsis Genes Are Enriched in Euchromatic Regions
It has been reported that a large fraction of rice gene models are not significantly homologous at the sequence level with Arabidopsis genes (Rice Chromosome 10 Sequencing Consortium, 2003
; Rice Full-Length cDNA Consortium, 2003
). In principle, those rice gene models lacking significant homology with Arabidopsis genes may represent fast-evolving genes. Alternatively, the majority of this group of less conserved gene models may represent highly diverged transposon-related sequences and may not be real genes (Bennetzen et al., 2004
). Therefore, grouping rice gene models based on their homology with the Arabidopsis genome may also separate gene models with high confidence from potentially misannotated models.
To this end, we compared all gene models of rice chromosome 4 against the Arabidopsis genome. An expectation value cutoff of 107 was used for the homology search (see Methods for details). Based on these criteria, 3166 rice chromosome 4 gene models exhibited significant sequence homology with Arabidopsis genes and were named high-homology (HH) gene models. The rest of the 2516 gene models lacked significant homologous counterparts in the Arabidopsis genome and were defined as low- (or no-) homology (LH) gene models.
Figure 7 illustrates the chromosomal distribution of nonredundant HH and LH gene models along the chromosome by the density of each group. In the first half of the chromosome, which is mostly cytologically defined as heterochromatin, the densities of HH and LH gene models are quite similar. This part includes the entire short arm and the proximate portion of the long arm, for a total length of
19 Mb. In the rest of this chromosome (from 19 Mb toward the distal end of the chromosome), we found a conspicuous enrichment of HH gene models, with twice as many HH gene models present. The HH gene model density shows a dramatic increase from
80 gene models/Mb to
110 gene models/Mb, whereas the density for LH gene models exhibits a clear reduction of
25 gene models/Mb in euchromatic regions compared with heterochromatic regions.
|
|
|
To further dissect transposon-related gene models on chromosome 4, we separated out retrotransposon gene models and DNA transposon gene models (Mao et al., 2000
; Turcotte et al., 2001
). A total of 1147 transposon-related gene models were classified as retrotransposons, and 344 were classified as DNA transposons. Among the retrotransposons, the gypsy-like subclass was the dominant group, with 505 members. Another 124 retrotransposon-related models matched the copia-like subclass. A total of 285 DNA transposon-related models matched the En/Spm superfamily.
Although there are clearly more retrotransposon-related gene models in heterochromatin than in euchromatin, the DNA transposon gene models show less difference in distribution along the chromosome (Figure 10A). In euchromatic regions, retrotransposon gene models have a density of
15 gene models/Mb. The density of retrotransposon models has a threefold increase to
50 gene models/Mb in heterochromatin. The average density of DNA transposons increases from <10 gene models/Mb in euchromatin to
15 gene models/Mb in heterochromatin. Our result is consistent with the previously reported active transcription of retrotransposons in grass genomes by EST analysis (Vicient et al., 2001
).
|
| DISCUSSION |
|---|
|
|
|---|
A Minimal Tiling Path Microarray to Represent the Entire Chromosome
As an extension of such PCR-based genomic DNA amplicon microarrays (Kim et al., 2003
), which cover only annotated coding regions, a minimal tiling path microarray was used to survey global transcriptional activity. A minimal tiling path microarray has several advantages. First, its essentially full-chromosome coverage has the potential to detect novel sites of transcription, irrespective of available annotation information. Second, by using the same DNA fragments used for sequencing, it is possible to amplify the entire chromosome in an efficient and economical way using only universal primer pairs with high throughput and reproducibility. This tiling path microarray construction strategy was recently implemented to study transcription of the human genome at a small scale in a 1.7-Mb region (Li et al., 2004
). Third, tiling path microarray analysis also has the potential to be used in other applications on a genome scale. For example, global proteinDNA binding and DNA modification sites can be mapped by a tiling path microarray (Sun et al., 2003
).
However, there are also limitations to the minimal tiling path microarray. The resolution of this array is not adequate to examine the structure of a single gene model. Indeed, 19% of the gene models studied were represented by subclones that covered more than one gene model, although many of these were within clusters of short gene models. Cross-hybridization is a common and inherent problem of microarray analysis and for other DNA hybridization-based methods (Held et al., 2003
). Both of these situations clearly affected the accuracy of our assessment to some extent. However, comparisons of average transcriptional activity for each 100-kb window overcame this weakness to some extent (Figure 6). To test the potential cross-hybridization caused by such short repetitive sequences with a high copy number in the genome as miniature inverted repeat transposon-related elements (MITEs) (Feng et al., 2002
), we examined all nonannotated fragments used for RNA gel blot analysis and found nine of those fragments with MITEs. None of them showed a smear on the RNA gel blot, regardless of whether the main bands were detected (Figure 4). Among our 21 randomly selected nonannotated fragments, 7 of them also included simple repeats, such as (CAG)n. Some of these simple repeats have hundreds of copies in the genome. Again, RNA gel blot results did not suggest a cross-hybridization problem, because no smear was detected on RNA gel blots. Therefore, cross-hybridization may not be a major problem. However, considering the technical difference between RNA gel blot and microarray analyses, it is not feasible to estimate the exact extent of cross-hybridization in our microarray analysis.
Repetitive sequences without coding capacity did not significantly affect the hybridization of reverse-transcribed probes to the microarray or even to RNA directly (Figure 4). Although many transposons exist in copy numbers of only a few per genome and exhibit homology only to a low degree (Bennetzen et al., 2004
), we cannot rule out the potential cross-hybridization among certain transposon-related families, specifically transcribed retrotransposons, and DNA transposons. The separation of transposon-related gene modelcontaining fragments from all other fragments largely isolated this problem to only transposon-related gene models. A sequence similarity search suggests that
60% of the transposon-related gene models have at least one other such gene model with high similarity (>80% at the nucleotide level) in the entire rice genome. A similar problem could also happen to duplicated genes and to nonannotated regions. Close to 19% of all gene models are considered potential duplicated genes. Interestingly, these potential duplicated genes do not exhibit stronger transcription than unique genes. We believe that portions of these duplicates, especially background duplicates, are actually pseudogenes. It has been shown that DNA sequences with identity above 80% exhibit detectable cross-hybridization (Hughes et al., 2001
); therefore, when interpreting the hybridization results for transposon-related gene models, we should be cautious about this cross-hybridization effect.
Microarray Analysis Provides an Important Complement to Computational Annotation
We demonstrated that >81% of computationally annotated gene models have transcriptional activity. As a control, our detection rate of full-length cDNA-supported gene models was 86%. Recently, it was suggested that a significant portion of annotated rice-specific gene models might be transposon-related, and some of them may even be expressed (Bennetzen et al., 2004
; Jabbari et al., 2004
; Jiang et al., 2004
). If this is the case, our tiling microarray analysis will be able to detect the expression of these transposon-related gene models (Bennetzen et al., 2004
), although detection of expression would not directly imply a functional role of those gene models.
In addition to the annotated regions, we also identified transcriptional activities in 1643 nonannotated regions. Among them, 1076 had transcriptional activities in all samples. Using RNA gel blot analysis, we detected 16 of 21 hybridizing fragments. Those regions not detected by RNA gel blot analysis usually had a low abundance of transcripts based on microarray data. Importantly, in no case did we detect any smear or other minor bands, which would indicate cross-hybridization caused by possible repetitive sequences. However, we may still not be able to rule out cross-hybridization in the detected nonannotated transcription activities.
These novel nonannotated transcription activities may arise from a variety of situations. The first possibility is that some of the transcripts come from missed genes in nonannotated regions. These transcripts can include protein-coding genes and non-protein-coding RNAs with structural, catalytic, or regulatory capacity. These regions may also encode transposons that are transcriptionally active but lack homology with other transposons (Bennetzen et al., 2004
). Alternatively, some of these novel transcripts might be missing parts of a neighboring annotated gene model. It has been reported that although computational annotation algorithms have high reliability for locating a gene model in the chromosome region, they are less reliable in predicting the precise structure of that gene (Zhang, 2002
). For example, an exon can be missed or the 3' or 5' noncoding regions not as accurately identified as the coding regions. The resolution limitation of this microarray, however, prohibited us from further distinguishing these possibilities.
Chromosome-Level Regulation of Transcriptional Activity
The eukaryotic chromosome is organized into two forms, heterochromatin and euchromatin, which are defined by their cytological staining patterns. Recent sequencing of heterochromatin in Arabidopsis and rice successfully integrated cytological features and sequence data (CSHL/WUGSC/PEB Arabidopsis Sequencing Consortium, 2000
; Feng et al., 2002
; Sasaki et al., 2002
; Rice Chromosome 10 Sequencing Consortium, 2003
), providing new insights into this cytological structure. It has been suggested that Arabidopsis heterochromatic regions are determined by transposon-related elements and that their assembly is associated with chromatin remodeling, including DNA methylation, histone methylation, small interfering RNAs, and DNA replication (Soppe et al., 2002
; Lippman et al., 2004
).
Kim et al. (2003)
have observed an association between chromosome organization and expression using microarrays in Arabidopsis chromosome 2. However, limited cytologically defined heterochromatin in the Arabidopsis chromosome makes it difficult to extend the learned relationship between chromatin features and transcription in Arabidopsis to rice, because the latter is dramatically different from Arabidopsis in its chromosome organization.
Using rice chromosome 4 as a case study, we systematically explored the molecular features and transcriptional activities of heterochromatin in rice. Integration of the cytological map with sequence data and the physical map (Chen et al., 2002
; Feng et al., 2002
; Zhao et al., 2002
) suggests that an
17-Mb region starting from the telomere of the short arm is heterochromatic (Figure 6D). We found a high density of retrotransposon gene models in this heterochromatic region (Figure 10A). The density of DNA transposon gene models was also increased slightly in heterochromatin but not in euchromatin.
Profiling of the transcriptional activities of nonredundant protein-coding gene models showed that rice heterochromatin also has many transcriptionally active gene models embedded within it, although at a lower density. In spite of the fact that half of rice chromosome 4 is cytologically defined heterochromatin (Cheng et al., 2001
), it is quite possible that the transcriptionally repressed regions are scattered among transcriptionally active gene models (Lamond and Earnshaw, 1998
). Therefore, it is likely that the microscopically scattered, rather than continuous, heterochromatin regions on the short arm, and also part of the neighboring long arm region, form the cytological heterochromatin. Lippman et al. (2004)
recently showed in Arabidopsis that transcriptionally active regions can indeed lie within small islands of transcriptionally repressed domains in heterochromatin. Furthermore, the heterochromatic region has a similar density of HH and LH gene models, whereas euchromatin has a much higher density of HH gene models than LH gene models (Figure 7). However, the causal relationship between HH/LH gene model distribution and heterochromatin and euchromatin division is not clear. We speculate that chromatin structure may play a global regulatory role in the transcription of nonredundant protein-coding gene models during development (Figures 6 and 8). Euchromatin showed active transcription in all juvenile-stage samples but a generally weaker transcription in all reproductive-stage samples. Heterochromatin showed relatively weak transcription in juvenile samples, but the transcription of a domain in heterochromatin is noticeably stronger at the reproductive stage. In reproductive-stage samples, the transcriptional activity in most of the heterochromatin is similar to that in euchromatin. Cultured cells as undifferentiated cell types show relatively more uniform transcriptional activity along the entire chromosome (Figure 6). Previous studies have shown the effect of chromatin structures on gene transcription in a wide range of plants, including monocots and dicots (Mlynárová et al., 1994
; Tikhonov et al., 2000
; Rudd et al., 2004
), but our data extended this possible regulation to the chromosomal scale and tentatively associated this regulation with the heterochromatin and euchromatin structures. The underlying mechanisms of those regulations are not known, but modification at the DNA or histone level or RNA interference are possible routes (Henikoff and Comai, 1998
; Pandey et al., 2002
; Reyes et al., 2002
; Lippman and Martienssen, 2004
). Contrary to the nontransposon gene models, an increase in transcriptional activities of transposon-related elements in reproductive-stage organs, especially those located in heterochromatic regions, was observed (Figure 9). Cultured cells showed medium transcriptional activity between reproductive-stage and juvenile-stage organs. A very recent study has suggested that certain retrotransposon RNA may interact with chromatin structure to form complexes with the potential to maintain the chromatin functions in maize (Topp et al., 2004
). In Arabidopsis, Lippman and colleagues (2004)
demonstrated that DNA methylation and histone modification correlate with the activation of transposon-related gene models using a tiling path microarray for a heterochromatic region. It is quite possible that certain rice retrotransposon RNA species may also perform some functions to maintain the structure of chromatin, and rice also uses DNA methylation and histone modification to control the activities of transposon-related elements.
Thus, it is reasonable to speculate that the organization of the chromosome into heterochromatic and euchromatic regions may enable rice to have another level of regulation at the chromosome level. This regulation could be applied to both nonredundant protein-coding gene models and to transposon-related gene models. For both groups of gene models, it seems that the developmental phase, rather than the organ identity, may be the signal for the chromosome-level regulation of transcription. This type of developmental regulation at the chromosome level would be consistent with the reported developmental regulation of heterochromatin assembly and activity (Preuss, 1999
; Meyer, 2000
; Ahmad and Henikoff, 2001
).
An increasing body of evidence has shown that transposon-related elements contribute to the evolution of a genome (Feschotte et al., 2002
). Studies of the current genome sequence structure in maize have suggested that transposon-related elements could rapidly restructure a genome (SanMiguel et al., 1998
). Our observation of the active expression of transposon-related gene models in reproductive stages suggests that this process may occur more frequently at the mature stage, or the end of the life cycle, in a rice plant. Although the movement of transposon-related elements can provide resources for selection during evolution, they are more likely to cause malfunction of useful genes. Activation of transposon-related elements, especially retrotransposons, only at the mature stage seems to be a solution for balancing both survival and the need for new gene creation.
| METHODS |
|---|
|
|
|---|
A chromosome 4 pseudomolecule, computational gene models, and their predicted transcripts (version 2.0, April, 2004) were downloaded from the TIGR rice genome database (Yuan et al., 2003
; http://www.tigr.org/tdb/e2k1/osa1/). Full-length cDNA sequences were downloaded from the Knowledge-Based Oryza Molecular Biology Encyclopedia database (Rice Full-Length cDNA Consortium, 2003
; http://cdna01.dna.affrc.go.jp/cDNA/). Subclone fragments and gene model transcripts were remapped to the pseudomolecule using BLAT (Kent, 2002
). Fragments longer than 5 kb or containing sequence gaps were removed. In the final count, 14,742 fragments were successfully amplified and used in the tiling path microarray. The expression of each gene model was represented by the expression of the fragment with the longest overlap. Overlap calculation was based on BLAT results of gene models and fragment locations on the chromosome pseudomolecule. In general, fragments covering more than one gene model were not used unless they occurred in the following situations: (1) one gene model occupied the predominant portion of the sequence and it did not have any unique genomic fragment by itself; and (2) the other gene model took up only a small region of the fragment and its expression, as judged from the other overlapping fragment, was at a much lower level. In total, 845 fragments fit these criteria and were used for the calculation of expression.
Construction of the Tiling Path Microarray
Selected subclone fragments were amplified by PCR using subclone plasmid DNA as a template. We performed PCR using TaKaRa LA Taq and TaKaRa Ex Taq kits (TaKaRa, Dalian, China) with 0.02 nM of one of the following common primer pairs (LAS2, 5'-CCCAGTCACGACGTTGTAAAACGACGGCCAGTGCC-3', and LAS4, 5'-GAATTGTGAGCGGATAACAATTTCACACAGGAAAC-3'; LAF, 5'-CCCTCGAGGTCGACGGTATCGATAAGCTTGATATC-3', and LARb, 5'-GTAATACGACTCACTATAGGGCGAATTGGAGCTCC-3'; S2, 5'-CGTTGTAAAACGACGGCCAG-3', and S4, 5'-CGGATAACAATTTCACACAG-3'; and F, 5'-CCTCGAGGTCGACGGTATCG-3', and Rb, 5'-AATACGACTCACTATAGGGC-3') and 5 ng of plasmid DNA. PCR amplicons were purified by ethanol precipitation. We resuspended purified PCR products in water and ran an equal amount of sample from each fragment on an agarose gel for quality control. Fragments that migrated as a single band of the predicted size and also had a DNA concentration >100 ng/µL were used for microarray printing.
Arabidopsis Functional Genomics Consortium microarray control sets were used as negative controls. These 18 controls were selected as showing no cross-hybridization with Arabidopsis thaliana RNA (http://www.arabidopsis.org/links/microarrays.jsp). After sequence comparison with rice and pilot experiments, we removed two additional controls with potential cross-hybridization. Each of the remaining 16 distinct controls was repeated 16 times for array printing.
For microarray printing, we combined the resuspended PCR fragments with DMSO (1:1) and transferred 8 µL of each sample to 384-well printing source plates (Whatman, Clifton, NJ). All 256 negative control DNA fragments and 16 rice genomic DNA samples were also resuspended in printing solution and transferred onto printing plates. All DNA fragments were arrayed onto Corning (Corning, NY) GAPS slides using a VersArray ChipWriter Pro system (Bio-Rad, Hercules, CA). Printed slides were allowed to dry at room temperature and cross-linked at 150 mJ in a Stratalinker (Stratagene). Print quality was confirmed by staining for total DNA with POPO-3 (Molecular Probes, Eugene, OR).
Plant Materials
The rice strain used for all experiments was japonica cv Nipponbare. Seeds were baked at 42°C for 3 d to break seed dormancy and then spread in water at 26 to 28°C. We collected 7-d-old seedling shoots and roots (Figure 5A) from normal light-grown plants. Two-week-old seedlings were transferred into soil in controlled-environment chambers (26 to 28°C, 13-h-light/11-h-dark light cycle). Tillering-stage shoots (with four tillers), heading-stage flag leaves, and heading-stage panicles were collected from those plants with normal growth. Suspension cell cultures were also derived from the same rice strain using previously described protocols (Yu et al., 1991
).
Preparation of Labeled cDNA Probes and Microarray Hybridization Conditions
Plant materials were frozen in liquid nitrogen and ground to powder using a chilled mortar and pestle. Total RNA was isolated using TRIzol (Invitrogen, Carlsbad, CA) and purified using the RNeasy kit (Qiagen, Valencia, CA). Oligo(dT) was used to selectively synthesize and label cDNA from poly(A)+ mRNA. The probe-labeling protocols used for this study were modified from those used for EST microarrays (Ma et al., 2001
). Total RNA (100 µg) was labeled by direct incorporation of amino-allylmodified dUTP (Sigma-Aldrich, St. Louis, MO) during reverse transcription. After reverse transcription, template RNA was degraded. The amino-allylmodified dUTPlabeled cDNAs were purified using a Microcon YM-30 filter (Millipore, Billerica, MA) and resuspended in 0.1 M NaHCO3. The cDNA probe was further fluorescently labeled by conjugating the monofunctional Cy3 or Cy5 dye (Amersham, Piscataway, NJ) to the amino-allyl functional groups. After coupling at room temperature for 45 to 60 min, the labeling reaction was stopped by ethanolamine. The fluorescent dyelabeled probe was separated from unincorporated monofunctional dye and concentrated to a final volume of 7 µL for hybridization using a Microcon YM-30 filter. Microarray hybridization, microarray slide washing, and array scanning were performed as described previously (Ma et al., 2001
).
Microarray Experimental Design and Dye-Effect Assessment
We followed the reference design for microarray experiments (Clarke and Kempson, 1997
; Wu et al., 2003
). A suspension-cultured cell RNA sample was used as the reference sample, and all comparisons were made between an organ sample and the reference sample with the same direction of dye labeling. Each organ sample was collected from multiple plants, and three independent biological replicates were collected. Amplified cDNA was prepared twice from each RNA sample. Thus, six quality data sets from three independent RNA samples were obtained for each organ type. Pooled suspension-cultured cell RNA was used as a reference sample in all hybridizations. We selected suspension-cultured cells as the reference because it is an undifferentiated cell type and expresses a relatively high percentage of its genome.
We started with a set of pilot experiments to check the dye effect on our microarray data quality using seedling shoot and cultured cell RNA samples on this tiling array. Each RNA sample was labeled with Cy3 or Cy5. Seedling shoot RNA and cultured cell RNA samples labeled with different dyes were paired together and hybridized to two identical slides to obtain two repeats, with the only difference being the labeling direction. This step was repeated three times (with the three independent RNA samples) to provide three repeats labeled in one direction and three in the reverse direction. We calculated the correlation coefficient of logarithm-transformed raw intensities between each pair of intersample repeats before any correction or normalization. As shown in Supplemental Figure 1 online, we found that the correlations between repeats labeled with the same dye did not exhibit significant differences in data reproducibility compared with repeats labeled with different dyes. This relatively low dye effect is likely attributable to the fact that we used the amino-allyl dye-coupling method for labeling to reduce the incorporation bias of the two dyes (Lee et al., 2004
). Based on this initial study, we chose to use the Cy5 dye for all organ sample labeling, whereas the reference sample was labeled with Cy3 dye.
Microarray Data Collection and Initial Normalization
Hybridized microarray slides were scanned with a GenePix 4000B scanner (Axon, Union City, CA), and TIFF images were initially processed using GenePix Pro 3.0 software. Spots with unusual morphology or high background were manually removed from further analysis. All spots with foreground intensity less than two times the background SD in both channels were also removed. Median foreground minus median background intensities were used for subsequent steps.
To identify and remove systematic sources of variation, including dye effects and spatial effects, we used the lowess normalization method for each print-tip group on each slide with the MAANOVA package for R (Yang et al., 2002
; Wu et al., 2003
). Normalized log2-transformed ratios of signal intensities had a median of zero.
Differential Expression Analysis
To identify differentially expressed spots among five organs, we fitted normalized replicate intensities of all organs together with cultured cell controls into an analysis of variance model. For each data set, the spot intensities of both organ and reference cultured cells (Cy3 and Cy5 channels) were used. The model is given by yijkl = µ + Ai + Dj + ADij + Sl + VSkl + DSjl + ASil +
ijkl, where yijkl denotes the logarithm-transformed signal for spot l on slide i labeled with dye j of sample k. The overall mean effect was represented by µ; A, D, and S represent main effects from array, dye, and spot, respectively. The interaction terms AD, VS, DS, and AS represent array by dye, sample (variation) by spot, dye by spot, and array by spot, respectively. The random error is denoted by
ijkl. We were interested in the term VS. The analysis of variance described above was performed on each spot using MAANOVA for R with F statistics computed on the James-Stein shrinkage estimates of the error variance (Kerr et al., 2000
; Wu et al., 2003
). The false-discovery rate controlling method (Benjamini and Hochberg, 1995
) was used to control for multiple testing errors through a function included in MAANOVA. We selected spots with P < 0.001 in the F test after false-discovery rate adjustment as differentially expressed. To further examine these selected differentially expressed spots, the log2-transformed expression ratios between each organ sample and cultured cells were clustered using the CLUSTER program with an unsupervised centroid linkage hierarchical clustering algorithm (Eisen et al., 1998
).
Determination of the Expression Threshold
The threshold for the detection of expression was determined for each hybridization repeat by constructing a distribution of normalized intensities obtained by considering the negative control spots. We found an intensity cutoff with a false-positive rate of 1% using Student's t test on log10-transferred intensities. All positive control spots of genomic DNA on each slide had expression above this cutoff. We also selected 515 fragments containing cDNA to further estimate type II errors. We found
80% of them had intensities above this cutoff in most samples (Figure 2B). Recent studies in Arabidopsis suggest that the detection rate of cDNA in an organ sample ranges from 77 to 84% (Yamada et al., 2003
; Redman et al., 2004
). Thus, such a 1% false-positive rate will also give a reasonable false-rejection (type II error) rate base to the experimental information described above. Spots with an expression signal above the cutoff in four of six data sets for the same organ were considered expressed in that organ type.
Estimation of Average Intensities for Each Fragment (Spot)
For the display of transcriptional activities along the chromosome, intensities of individual replicates for all samples were further normalized via quantile normalization (Bolstad et al., 2003
). After this normalization, intensities of all replicates representing different organ or cultured cell samples reached a common median. All normalized intensities for each expressed spot were then averaged among all replicates of the same sample to obtain a single statistic, which was used to perform subsequent analyses of expression patterns along the chromosome.
Gene Model Classification
All annotated rice genes were divided into HH and LH genes based on their homology with Arabidopsis genes. Homology was based on a BLAST search of both annotated rice and Arabidopsis gene model sequences by means of tBLASTN (Altschul et al., 1997
). We used the TIGR version 4.0 Arabidopsis gene model annotation downloaded from The Arabidopsis Information Resource (ftp://ftp.arabidopsis.org). Specifically, all rice gene models were compared against each Arabidopsis gene model. For genes encoding proteins with 200 residues or more, an expectation cutoff value of 107 for sequences not <100 residues was used. For genes encoding proteins with fewer than 200 residues, the expectation cutoff value of 107 for >50% of the full length was used. A total of 3166 rice genes were identified with significant homologs in the Arabidopsis genome (HH gene models). The rest of the 2516 rice genes were classified as LH gene models.
Transposon-related gene identification followed the TIGR version 2.0 annotation. In total, 1501 such gene models on chromosome 4 were covered by this tiling path microarray. Further classification of retrotransposons and DNA transposons also followed the annotation from TIGR. Retrotransposons included 124 from the Ty1/copis subclass, 505 from the Ty3/gypsy subclass, 10 from the LINE subclass, 8 gag proteins, and 500 unclassified proteins. DNA transposons included 285 from the CACTA En/Spm subclass, 47 from the mariner subclass, 7 from the ping/pong/SNOOPY subclass, 2 from the Ac/Ds subclass, and 3 unclassified transposons. MITEs and simple repeats were identified directly from the chromosome 4 pseudomolecule using RepeatMasker (http://www.repeatmasker.org).
RNA Gel Blot Analysis
Total RNA from the same organ type was used for RNA gel blot analysis. Subcloned DNA fragments were labeled with [
-32P]dCTP using a random primers DNA labeling system (Invitrogen). RNA