- American Society of Plant Biologists
The advent of large-scale transcriptional profiling techniques signalled a new age in biology. Instead of studying the expression and action of single genes, transcriptomics allows whole-transcriptome changes to be examined across a variety of biological conditions. These techniques have produced a massive accumulation of gene expression data and, with it, the need to formulate biological questions more precisely when using such data. Instead of collecting data about changes in expression profile at the level of the organism or of particular organs, we can now focus on both the default and responsive transcriptional states of tissues and individual cells and generate novel hypotheses as to how these states collectively form a functioning organism. Transcriptional profiling has indeed become a well-used component of a biologist's toolbox. This review describes some of the unique biological insights into plant functions that have been revealed by this type of analysis. For the readers' convenience, references mentioned throughout this review are summarized in Table 1.
Table 1. Selected Plant Expression Profiling Studies
TECHNIQUES USED IN TRANSCRIPTIONAL PROFILING
The more commonly used large-scale techniques employ one of two strategies. In the first strategy, sequence tags from a given RNA sample are generated. In the second, mRNA populations of interest are hybridized with a large number of probes immobilized on a suitable substrate (e.g., various types of microarrays and BeadArrays). These strategies are complementary to each other.
Generation of tags is independent of knowledge of gene annotation but requires extensive sequencing and a reference genome to determine gene identity. Low-abundance transcripts tend to be underrepresented. Tags can include ESTs of ∼200 to 900 nucleotides, the 15-nucleotide tags used in serial analysis of gene expression (SAGE), and the 17- to 20-nucleotide tags used in massively parallel signature sequencing (MPSS). ESTs are created by sequencing the 5′ or 3′ ends of randomly selected cDNA clones. These tags are longer than SAGE and MPSS tags but nonetheless represent only partial gene sequences. In SAGE, cDNA molecules attached to oligo(dT)-coated beads are digested with a frequently cutting restriction enzyme (the anchoring enzyme, usually NlaIII), leaving the 3′-most fragment of each transcript bound to the bead. A linker containing a site for a type IIS restriction enzyme (the tagging enzyme, usually BsmFI) is then ligated to the bound fragment, and this enzyme is used to release a short 15-bp tag. The tags are ligated together into concatemers, which are cloned and sequenced. The abundance of each sequence signature reflects expression of the corresponding gene in the sampled tissue (Velculescu et al., 2000). MPSS tags are generated in a similar way. Instead of NlaIII, the bead-bound cDNA molecules are cut with DpnII, and an adapter containing an MmeI site is ligated to each transcript fragment, so that MmeI digestion generates a 20-bp tag. A major difference between the two methods is that MPSS uses a nonconventional, bead-based sequencing method that allows at least 5 × 10⁵ tags to be sequenced at one time (www.lynxgen.com). Both SAGE and MPSS give quantitative measures of transcript abundance. These techniques have been used to generate expression data across tissue types or developmental stages in higher plants (Ewing et al., 1999; Ogihara et al., 2003; Fizames et al., 2004).
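The link between tag counts and expression can be illustrated with a minimal in silico sketch. It assumes cDNA sequences written 5′→3′ with the poly(A) end at the 3′ end, treats the 3′-most NlaIII site (CATG) plus the downstream bases as the tag (the 15-nt length follows the text above; real BsmFI cleavage varies slightly), and scales counts to tags per million, a unit commonly used for MPSS data. The function names are hypothetical, and sequencing errors and tags shared by several genes are ignored.

```python
from collections import Counter
from typing import Iterable, Optional

NLAIII_SITE = "CATG"  # recognition site of the anchoring enzyme NlaIII


def virtual_sage_tag(cdna: str, tag_len: int = 15) -> Optional[str]:
    """Return the tag for a cDNA written 5'->3' (hypothetical helper).

    Assumes the tag starts at the 3'-most CATG and extends downstream,
    toward the poly(A) end, for tag_len nucleotides in total.
    """
    seq = cdna.upper()
    pos = seq.rfind(NLAIII_SITE)
    if pos == -1 or pos + tag_len > len(seq):
        return None  # no anchoring site, or too little sequence downstream
    return seq[pos:pos + tag_len]


def tag_counts(cdnas: Iterable[str], tag_len: int = 15) -> Counter:
    """Count how often each tag is observed; tag abundance tracks expression."""
    counts = Counter()
    for cdna in cdnas:
        tag = virtual_sage_tag(cdna, tag_len)
        if tag is not None:
            counts[tag] += 1
    return counts


def tags_per_million(counts: Counter) -> dict:
    """Scale raw counts so libraries sequenced to different depths are comparable."""
    total = sum(counts.values())
    return {tag: 1e6 * n / total for tag, n in counts.items()}


library = ["GG" + "CATG" + "A" * 11, "TT" + "CATG" + "A" * 11, "GGCATG" + "T" * 11]
print(tags_per_million(tag_counts(library)))
```

Two of the three transcripts share a tag, so that tag receives two-thirds of the normalized abundance.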
Alternative methods of gene expression profiling include microarrays (Schena et al., 1995; Stears et al., 2003) and the more recently developed BeadArrays (Kuhn et al., 2004). The solid substrate is a rectangular slide (chip) in the case of microarrays and beads arrayed in fiber optic bundles in the case of BeadArrays. Variability in chip arrays, both within and between arrays, is often high owing to factors that include probe (cDNA or oligomer) spotting and hybridization parameters. However, chip arrays allow thousands of probes, often representing the whole genome, to be screened in parallel. BeadArray technology, by contrast, has low variability. Each bead carries several hundred thousand copies of one oligomer, and bead types representing different oligomers are quantitatively pooled and then densely packed into etched fiber optic bundles arranged in a 96-array matrix (Kuhn et al., 2004). Because the pooled beads are arranged randomly, each array is unique, and because arrays generally contain at least five beads of each type, every sequence is represented multiple times. The random arrangement minimizes the effect of spatially localized artifacts, and the redundancy increases measurement precision and robustness. So far, this technique has been applied only to a study involving 587 human genes (Kuhn et al., 2004). Nevertheless, its potential for whole-genome analysis is promising.
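Why bead redundancy improves precision can be sketched by summarizing the replicate beads of one probe. The outlier rule below (discarding beads more than three median absolute deviations from the median) is an assumption chosen for illustration, not necessarily the procedure used by Kuhn et al. (2004); the point is that a bead sitting in a local artifact can be rejected, and the standard error of the mean shrinks roughly with the square root of the number of beads kept.

```python
import statistics
from math import sqrt


def summarize_beads(intensities, mad_cutoff=3.0):
    """Summarize replicate bead intensities for one probe (illustrative only).

    Beads farther than mad_cutoff median absolute deviations (MADs) from
    the median are treated as outliers, e.g. beads in a local artifact.
    Returns (mean, standard error, number of beads kept).
    """
    values = list(intensities)
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        kept = values  # no spread to judge outliers against; keep every bead
    else:
        kept = [x for x in values if abs(x - med) <= mad_cutoff * mad]
    mean = statistics.fmean(kept)
    se = statistics.stdev(kept) / sqrt(len(kept)) if len(kept) > 1 else float("nan")
    return mean, se, len(kept)


# Six hypothetical beads of one type; the last lies in a localized artifact.
print(summarize_beads([100, 102, 98, 101, 99, 500]))
```

Here the artifact bead is discarded and the mean of the five remaining beads is reported with its standard error.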
One limitation of array techniques is that only genes represented by probes immobilized on the substrate can be assayed. However, if probes for low-abundance transcripts are present, arrays circumvent one of the main limitations of tag-based methods, the underrepresentation of such transcripts. A primary caveat of array methods is that expression is measured as a hybridization ratio and therefore reflects relative transcript abundance. RNA gel blot analysis (Van Zhong and Burns, 2003; Parisi et al., 2004), quantitative RT-PCR (Leonhardt et al., 2004; Jiao et al., 2005), and even promoter–reporter experiments generally appear to confirm differential gene expression indicated by microarray analysis. Ribonuclease protection assays and in situ hybridization can also validate microarray results (Chuaqui et al., 2002). However, in a detailed study of 48 human genes, a substantial percentage (13 to 16%) showed poor correlation between quantitative PCR and normalized microarray expression data (Dallas et al., 2005). Caution is therefore required when interpreting microarray results, and data should ideally be verified with multiple methods.
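The relative nature of array measurements, and the kind of cross-platform check described by Dallas et al. (2005), can be sketched as follows. Each measurement is expressed as a log2 ratio against a reference, and genes whose profiles across samples correlate poorly between microarray and qPCR are flagged. The 0.5 correlation cutoff and the gene names are hypothetical choices for illustration.

```python
from math import log2, sqrt


def log2_ratio(sample, reference):
    """Relative abundance: log2 of the sample signal over the reference signal."""
    return log2(sample / reference)


def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either profile is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def discordant_genes(array_profiles, qpcr_profiles, r_min=0.5):
    """Genes whose array and qPCR log2 ratios across samples correlate below r_min."""
    return sorted(
        gene for gene in array_profiles
        if pearson(array_profiles[gene], qpcr_profiles[gene]) < r_min
    )


# Hypothetical log2 ratios for three samples measured on both platforms.
array = {"geneA": [1.0, 2.0, 3.0], "geneB": [0.5, 1.0, 1.5]}
qpcr = {"geneA": [1.1, 2.1, 2.9], "geneB": [1.4, 0.9, 0.6]}
print(discordant_genes(array, qpcr))  # ['geneB']
```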
MICROGENOMICS: A NEW ERA IN TRANSCRIPTIONAL PROFILING
Until recently, mRNA populations used in microarray analysis were derived from whole organs, such as leaves and roots. However, plant organs are extremely heterogeneous, and growth, development, and responses to environmental stimuli and developmental signals differ among cells within an organ (Brandt, 2005). Using mRNA extracted from bulk material therefore loses detailed information about tissue- and cell-specific genomic responses, and it is becoming more common to extract RNA from specific tissues and cells. Tissue- and cell-specific RNA populations have been derived from enriched populations of cells or tissues, such as dissected tissues and organs, in vitro tissue cultures, and mutants in which certain cells or tissues are enriched or absent (Demura et al., 2002; Brandt, 2005; Grimanelli et al., 2005). Because tissue preparation itself alters gene expression, in vitro cultures are of questionable biological significance, and mutations have pleiotropic effects, researchers are developing more biologically relevant techniques in which RNA is extracted from very small quantities of living or rapidly extracted tissue (for in-depth reviews, see Brandt, 2005; Lee et al., 2005). Examples of such techniques include laser capture microdissection (LCM), in which fixed tissue is cut out by laser dissection (Nakazono et al., 2003; Woll et al., 2005), protoplasting, and fluorescence-activated cell sorting of green fluorescent protein–marked cells (Birnbaum et al., 2003, 2005). Micropipetting can also be used to extract cell contents from targeted surface cells (Brandt, 2005). If enough RNA is extracted, it can be used directly for microarray analysis, but these methods generally yield insufficient RNA. In such cases, cDNA is generated from the RNA and then amplified either exponentially with PCR or, to avoid differential amplification, linearly with T7 RNA polymerase.
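Why linear amplification is preferred can be shown with a little arithmetic. If two templates amplify with per-cycle efficiencies e_a and e_b, PCR distorts their ratio by ((1 + e_a)/(1 + e_b))^n after n cycles, because every cycle multiplies the difference again. In linear amplification, copies are made only from the original templates, so a difference in per-template yield distorts the ratio once. The efficiencies below are hypothetical values chosen for illustration.

```python
def pcr_ratio_distortion(eff_a, eff_b, cycles):
    """Fold change in the A:B ratio after exponential (PCR) amplification.

    Each cycle multiplies template A by (1 + eff_a) and template B by
    (1 + eff_b), so any efficiency difference compounds every cycle.
    """
    return ((1 + eff_a) / (1 + eff_b)) ** cycles


def linear_ratio_distortion(yield_a, yield_b):
    """Fold change in the A:B ratio after linear amplification.

    Copies are made only from the original templates, so a difference in
    per-template yield distorts the ratio once rather than compounding.
    """
    return yield_a / yield_b


# Hypothetical 95% vs 90% efficiency over 25 PCR cycles: roughly 1.9-fold bias.
print(round(pcr_ratio_distortion(0.95, 0.90, 25), 2))
# The same 95:90 yield difference under linear amplification: about 1.06-fold.
print(round(linear_ratio_distortion(0.95, 0.90), 2))
```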
With these rapidly developing techniques, we are beginning to understand the tissue and cell specificity of plant transcriptional regulation, refining our picture of how plant genome expression changes at the cellular level.
EXPRESSION MAPS AND TRANSCRIPTIONAL SIGNATURES OF DISCRETE ORGANS AND CELL TYPES
Default Expression Maps
Expression maps generated by expression profiling are being constructed at increasingly high resolution as both profiling platforms and microdissection techniques evolve. Expression maps exist for plant organs such as leaves, roots, flowers, and seeds in many plant species, including Arabidopsis, maize, and soybean (Birnbaum et al., 2003; Honys and Twell, 2003; Nakazono et al., 2003; Vodkin et al., 2004; Grimanelli et al., 2005; Poroyko et al., 2005; Schmid et al., 2005; Tung et al., 2005). These maps detail the default developmental state of the organs, that is, their state during growth under optimal (often laboratory) conditions.
Superimposed on these organ-specific maps are developmental time series, which catalogue expression changes over developmental time, and tissue- or cell-type-specific expression profiles, which add spatial information. For example, the Arabidopsis AtGenExpress data set profiles Arabidopsis development in 79 diverse samples representing different organ types and developmental stages (Schmid et al., 2005). The Arabidopsis root digital in situ describes three developmental stages along the longitudinal axis and five radial cell layers, using dissection and fluorescence-activated cell sorting of green fluorescent protein–marked lines (Birnbaum et al., 2003). Comparing the transcripts detected in whole roots in the AtGenExpress data set with those present in the root digital in situ revealed 392 genes missing from the AtGenExpress data set, highlighting the heterogeneity of plant organs and the loss of signal when expression is averaged across an organ.
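The loss of signal on averaging can be sketched by modeling whole-organ signal as the average of cell-type signals, weighted by each cell type's share of the organ. A gene expressed strongly in one minor cell layer can then fall below the detection threshold in the whole organ. The cell-type names, proportions, expression values, and detection threshold below are hypothetical, and real arrays add noise and probe effects that this sketch ignores.

```python
def organ_signal(cell_expr, cell_fraction):
    """Whole-organ signal as the fraction-weighted average of cell-type signals."""
    return sum(cell_expr[cell] * cell_fraction[cell] for cell in cell_expr)


def missed_in_whole_organ(profiles, cell_fraction, threshold):
    """Genes detectable in at least one cell type but not after averaging."""
    missed = set()
    for gene, cell_expr in profiles.items():
        seen_in_some_cell = any(v >= threshold for v in cell_expr.values())
        if seen_in_some_cell and organ_signal(cell_expr, cell_fraction) < threshold:
            missed.add(gene)
    return missed


# Hypothetical root made of three cell layers (fractions sum to 1).
fractions = {"epidermis": 0.1, "cortex": 0.3, "stele": 0.6}
profiles = {
    "epidermis_gene": {"epidermis": 500, "cortex": 0, "stele": 0},
    "stele_gene": {"epidermis": 0, "cortex": 0, "stele": 200},
    "ubiquitous_gene": {"epidermis": 150, "cortex": 150, "stele": 150},
}
print(missed_in_whole_organ(profiles, fractions, threshold=100))  # {'epidermis_gene'}
```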
Organ Transcriptional Signatures
Many organs have distinct transcriptional signatures (Honys and Twell, 2003; Schmid et al., 2005). Principal component analysis of the AtGenExpress data set showed that overall morphological similarity by organ type is well reflected in distances between samples in principal component space, whereas developmental stage and environmental conditions are only minor components (Schmid et al., 2005). Consistent with this, functional gene assignment using Gene Ontology categories showed that morphologically similar organs contain similar functional gene categories. In soybean and Arabidopsis, genes whose products are involved in photosynthesis and carbon dioxide fixation are overrepresented in leaves and underrepresented in tissues such as root and pollen (Vodkin et al., 2004; Schmid et al., 2005). Pollen-enriched transcripts in Arabidopsis and rice include an abundance of genes involved in signaling, vesicle trafficking, the cytoskeleton, and membrane transport (Pina et al., 2005). Higher-resolution expression profiles of specific cell types can reveal further information about an organ. LCM and expression analysis of maize coleoptile tissues revealed that genes of the shikimate and phenylpropanoid pathways, which are required for secondary metabolites such as flavonoids, lignins, pigments, and UV light protectants, are epidermis specific, whereas aquaporins, metal binding proteins, and genes involved in carbon dioxide fixation are vascular tissue specific (Nakazono et al., 2003). Separation of Arabidopsis guard cells from mesophyll cells revealed that CER2, a gene involved in cuticular wax formation, and an inward-rectifying K⁺ channel are preferentially expressed in guard cells (Leonhardt et al., 2004). Studies such as these can suggest physiological mechanisms of specific cell types within an organ.
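Statements that a functional category is overrepresented in an organ usually rest on an enrichment test. A minimal sketch, assuming a one-sided hypergeometric test (one common choice; the cited studies may have used other statistics) and hypothetical counts: N genes on the array, K of them annotated to a category such as photosynthesis, n genes enriched in an organ, and k of those annotated to the category.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Fraction of category genes in the list divided by the genome-wide fraction."""
    return (k * N) / (n * K)


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N genes in total, K annotated to the category, n genes in the
    organ-enriched list, k of which are annotated.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical: 40 of 500 leaf-enriched genes are photosynthesis genes,
# versus 200 of 20,000 genes overall.
print(fold_enrichment(40, 500, 200, 20000))  # 8.0
print(enrichment_p(40, 500, 200, 20000))
```

Only about five category genes would be expected by chance in a list of 500, so observing 40 yields a very small P value.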
Expression along Chromosomes
One of the distinct advantages of microarray analysis is the capacity to perform chromosome-scale transcriptional profiling, which allows detection of regional alterations in transcriptional activity and of chromosomal clustering of coexpressed genes. In prokaryotes, coexpressed genes are clustered in operons because they are transcribed from a single promoter. However, the biological mechanism behind the chromosomal clustering observed in Caenorhabditis elegans, humans, yeast, and Drosophila (Cohen et al., 2000; Caron et al., 2001; Boutanaev et al., 2002; Roy et al., 2002; Spellman and Rubin, 2002) remains to be determined. Suggested mechanisms that coordinate clustered gene expression include histone modifications, sharing of common regulatory elements, and large-scale alterations in higher-order chromatin structure (Sproul et al., 2005). Tiling arrays indicate that rice and Arabidopsis chromosomes contain distinct organ-, stimulus-, and developmental stage–specific regions of transcriptional activity.
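One simple way to look for chromosomal clusters of coexpressed genes is to walk along a chromosome and join neighbors whose expression profiles are strongly correlated. The sketch below assumes genes are already sorted by position; the correlation cutoff, minimum run length, and gene names are hypothetical. Published analyses also test whether such runs occur more often than expected when gene order is shuffled, a step this sketch omits.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either profile is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpressed_runs(genes, min_r=0.8, min_len=3):
    """Find runs of neighboring, positively coexpressed genes.

    genes: list of (gene_id, expression_profile) sorted by chromosomal position.
    Returns runs of at least min_len consecutive genes in which every
    adjacent pair has Pearson r >= min_r.
    """
    if not genes:
        return []
    runs, current = [], [genes[0][0]]
    for (_, prev_profile), (gene, profile) in zip(genes, genes[1:]):
        if pearson(prev_profile, profile) >= min_r:
            current.append(gene)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [gene]
    if len(current) >= min_len:
        runs.append(current)
    return runs


chromosome = [
    ("g1", [1, 2, 3]), ("g2", [2, 4, 6]), ("g3", [1, 2, 3.5]),
    ("g4", [3, 2, 1]), ("g5", [1, 5, 2]),
]
print(coexpressed_runs(chromosome))  # [['g1', 'g2', 'g3']]
```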
Chromosome organization and gene expression have been characterized by both microarray and cytological analysis of rice chromosomes 4 and 10 (Jiao et al., 2005; Li et al., 2005). Chromosomes are divided into domains of open chromatin (euchromatin), where genes have the potential to be expressed, and domains of closed chromatin (heterochromatin), where genes are generally silent. The studies by Jiao et al. (2005) and Li et al. (2005) indicated that the long arms of chromosomes 4 and 10 contain more euchromatin than their short arms and are the predominant sites of transcriptional activity. Similar results were found in Arabidopsis (Birnbaum et al., 2003; Yamada et al., 2003; Schmid et al., 2005). However, under certain conditions, there may be significant transcriptional activity in the heterochromatic regions of these chromosomes. For example, mature plants at the reproductive stage display increased transcriptional activity near the centromere of chromosome 10 (Jiao et al., 2005). In addition, Li et al. (2005) reported that rice plants exposed to mineral and nutrient stress display increased transcriptional activity in the short heterochromatic arm of chromosome 10. Variations in transcriptional activity across chromosomes have also been shown to correlate with tissue and organ type, developmental stage, and light and cold stress (Yamada et al., 2003). These data suggest that heterochromatic regions are more transcriptionally active and dynamic than previously thought. Certain developmental and environmental stimuli may aid transcriptional activation of heterochromatic regions by opening the chromatin structure and propagating a state in which the genomic region in question is poised for transcriptional activation (Sproul et al., 2005).
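Regional differences such as higher activity on euchromatic arms, or increased activity near a centromere in one condition, are often summarized by binning expression calls along the chromosome. A minimal sketch, assuming each gene or tiling probe has a position in base pairs and a present/absent call per condition; the bin size, condition names, and data are hypothetical.

```python
from collections import defaultdict


def binned_activity(calls, bin_size):
    """Fraction of expressed features per chromosomal bin.

    calls: iterable of (position_bp, is_expressed).
    Returns {bin_start_bp: fraction_expressed}, ordered by position.
    """
    total = defaultdict(int)
    expressed = defaultdict(int)
    for pos, is_expressed in calls:
        start = (pos // bin_size) * bin_size
        total[start] += 1
        expressed[start] += int(is_expressed)
    return {start: expressed[start] / total[start] for start in sorted(total)}


def activity_shift(calls_a, calls_b, bin_size):
    """Per-bin change in activity from condition A to condition B (shared bins only)."""
    a = binned_activity(calls_a, bin_size)
    b = binned_activity(calls_b, bin_size)
    return {start: b[start] - a[start] for start in a if start in b}


# Hypothetical calls for four features in two conditions.
vegetative = [(100, False), (900, False), (1500, True), (1800, True)]
reproductive = [(100, True), (900, False), (1500, True), (1800, True)]
print(activity_shift(vegetative, reproductive, 1000))  # {0: 0.5, 1000: 0.0}
```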
One caveat of chromosome-scale transcriptional profiling is that it can be difficult to find subtle patterns of transcriptional activity that correlate with changes in plant physiology. Detailed characterization of genomic elements, in addition to cDNAs and putative open reading frames, may reveal novel associations that explain when, how, and why chromosomal clustering occurs in plants. For example, transposable elements are thought to play a key role in controlling transcriptional activation of heterochromatic regions (Lippman et al., 2004; Jiao et al., 2005). However, spectral analysis of the density profiles of genes and retrotransposons showed that localized fluctuations occur with high frequency but with low correlation, indicating few cis-interactions between genes and retrotransposons (Kendal and Suomela, 2005). Further study of higher-order chromatin structure will be important for understanding the overall control of gene expression and the causes of regional and local alterations in transcriptional activity.
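The spectral analysis used by Kendal and Suomela (2005) is beyond a short example, but the underlying question, whether local fluctuations in gene density track local fluctuations in retrotransposon density, can be approximated crudely. The sketch below bins feature positions, takes first differences (a simple high-pass filter that keeps bin-to-bin fluctuations), and correlates the two difference series. It is a simplified stand-in, not the published method; positions, chromosome length, and bin size are hypothetical.

```python
from math import sqrt


def density(positions, chrom_len, bin_size):
    """Count features per bin; positions must lie in [0, chrom_len)."""
    counts = [0] * (-(-chrom_len // bin_size))  # ceiling division
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def first_differences(values):
    """Bin-to-bin changes: removes slow trends and keeps local fluctuations."""
    return [b - a for a, b in zip(values, values[1:])]


def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either series is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fluctuation_correlation(gene_pos, te_pos, chrom_len, bin_size):
    """Correlation between local fluctuations in gene and retrotransposon density."""
    genes = first_differences(density(gene_pos, chrom_len, bin_size))
    tes = first_differences(density(te_pos, chrom_len, bin_size))
    return pearson(genes, tes)


gene_positions = [1, 11, 12, 13, 21, 31, 32, 33, 41]
te_positions = [1, 2, 3, 11, 21, 22, 23, 31, 41, 42, 43]
print(fluctuation_correlation(gene_positions, te_positions, chrom_len=50, bin_size=10))
```

A value near zero would correspond to the weak coupling reported; values near +1 or −1 would indicate that the two densities rise and fall together or in opposition.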
EXPRESSION CHANGES UNDERLYING DEVELOPMENTAL TRANSITIONS
Developmental biologists seek to understand the mechanisms behind the temporal and spatial events that occur during specification of cells, organs, and developmental states of an organism. The analysis of a developmental time series by expression profiling can, in some cases, indicate a molecular mechanism behind already morphologically or genetically characterized developmental transitions throughout a plant's life cycle. Elucidation of the timing of the maternal-to-zygotic transition in maize (Grimanelli et al., 2005), of transcriptional switches in vasculature formation (Schrader et al., 2004; Kubo et al., 2005), and of the initial steps in the transition from a xylem pole pericycle cell to a lateral root primordium in Arabidopsis (Himanen et al., 2004) have been aided by expression profiling.
The first developmental switch in an organism's life cycle is the maternal-to-zygotic transition, in which an embryo-specific developmental program begins concurrently with extensive reprogramming of gene expression. In animals, the zygotic genome becomes active only after a species-specific number of cell divisions following fertilization, and early embryogenesis depends largely on maternal transcripts deposited in the egg before fertilization. Early embryogenesis is difficult to study in most sexual plants because the egg cell nucleus and the central cell (the precursor of the endosperm) are fertilized simultaneously in double fertilization. To circumvent this issue, Grimanelli et al. (2005) performed expression profiling on fertilized and unfertilized ovules and on material undergoing early embryo or endosperm development in an apomictic maize-Tripsacum hybrid. In sexual maize, embryo and endosperm development are coupled, making it difficult to separate the two. Grimanelli et al. (2005) used the apomictic maize-Tripsacum hybrid because in this hybrid the embryo undergoes up to five divisions before fertilization and the initiation of endosperm development. At fertilization, the embryo remains developmentally arrested, so embryo and endosperm development are uncoupled. Microarray analysis in this system revealed no detectable change in the total transcript population before and after fertilization (Grimanelli et al., 2005). Therefore, in maize, as in animals, the embryo-specific developmental program is delayed by several days after fertilization.
As plants mature, other developmental processes generate discrete tissues and cell types specific to each organ. In plants that undergo secondary growth, the xylem and phloem are produced by periclinal divisions of the meristematic vascular cambium (VC). Although it is well established that cells produced toward the inside of the VC differentiate into xylem and those produced toward the outside differentiate into phloem, little is known about the genetic regulatory mechanisms that control this process. Transcriptional profiling has revealed subsets of genes that may regulate cell fate within the vasculature (Schrader et al., 2004; Zhao et al., 2005). Schrader et al. (2004) generated a transcript profile of the cambial zone of aspen with near cell-type-specific resolution. By characterizing the expression of known apical meristem regulators, such as PttCLV1, PttANT, and PttKNOX, within the cambial zone, this study suggested that the shoot apical meristem and the VC may share common regulatory mechanisms. The study also identified several sets of genes that define the phloem and phloem mother cell region, the phloem-cambium region, and the cambium-xylem and radial expansion region, providing substantial molecular data for the division of the cambial zone into distinct layers.
Xylem differentiation and maturation (xylogenesis) in particular has been the focus of many studies, including transcriptional profiling (Hertzberg et al., 2001). Xylem vessels are the main component of wood; they provide support and strength to the stem in addition to conducting water, nutrients, and various signaling molecules in plants. Two types of vessels differentiate from the procambium during early plant development: protoxylem vessels, characterized by annular and spiral thickenings of the cell wall, and metaxylem vessels, characterized by reticulate and pitted thickenings of the cell wall. Microarray analysis with an in vitro xylem vessel element–inducible system from Arabidopsis suspension cells profiled the process of xylogenesis (Kubo et al., 2005). A family of NAC domain genes, VASCULAR-RELATED NAC-DOMAIN PROTEINS (VND), was shown to be upregulated during xylem vessel formation. Overexpression of two of these genes, VND6 and VND7, resulted in transdifferentiation of various cell types into xylem vessel elements in hypocotyls and roots of Arabidopsis and in leaves of poplar. Dominant repression of VND6 and VND7 specifically inhibited metaxylem and protoxylem vessel formation, respectively. Together, these data suggest that VND6 and VND7 act as transcriptional switches and possible master regulators of metaxylem and protoxylem formation, respectively.
Xylem cells provide an as yet unknown signal to associated pericycle cells that results in lateral root primordia formation. Both extrinsic and intrinsic signals act to modulate initiation of lateral root primordia. The initial divisions of these cells result in very small primordia, which are difficult to dissect and can only be detected microscopically. In addition, along the length of the primary root, lateral root initiation is asynchronous, making isolation of a large group of developmentally similar primordia for profiling even more difficult. Recently, Himanen et al. (2004) were able to profile the early stages of lateral root initiation using a lateral root induction system in Arabidopsis. Growth on auxin transport–inhibiting media blocked initiation of lateral roots. Transfer to auxin-containing media allowed for lateral root initiation and provided the material for transcript profiling and the elucidation of four stages that precede the division of the single xylem pole pericycle cell (Himanen et al., 2004). They found that before induction, the pericycle cell exists in a G1 cell cycle arrest as evidenced by low initial expression of several cell cycle genes, including B-type cyclins and the cell cycle inhibitor KRP2. Genes associated with auxin perception and signal transduction are then activated upon transfer to auxin. The cell cycle then progresses through the G1/S transition coincident with a downregulation of genes involved in differentiation. Finally, genes associated with the G2/M transition are induced, coinciding with the acquisition of meristematic identity in the xylem pole pericycle cell as viewed by transmission electron microscopy. This elegant induction system combined with microarray analysis provides insight into the early events in signaling associated with this usually experimentally intractable stage of development.
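Ordering events in an induction time course, as in the lateral root system described above, often starts by asking when each gene first responds. A minimal sketch, assuming positive expression values for each gene at the listed time points, with time 0 taken just before transfer to auxin, and a hypothetical twofold (log2 = 1) induction cutoff; the gene names and values are invented for illustration.

```python
from math import log2


def first_induction(profile, times, min_log2fc=1.0):
    """Earliest time at which expression reaches 2**min_log2fc times the time-0 level."""
    baseline = profile[0]
    for t, value in zip(times[1:], profile[1:]):
        if log2(value / baseline) >= min_log2fc:
            return t
    return None  # never induced above the cutoff


def group_by_induction(profiles, times, min_log2fc=1.0):
    """Group genes by their first induction time (None means not induced)."""
    groups = {}
    for gene, profile in profiles.items():
        groups.setdefault(first_induction(profile, times, min_log2fc), []).append(gene)
    return groups


times_h = [0, 2, 6]
profiles = {
    "early_gene": [10, 25, 30],
    "late_gene": [10, 12, 40],
    "unchanged_gene": [10, 11, 9],
}
print(group_by_induction(profiles, times_h))
```

Genes grouped by induction time can then be compared with cell cycle or differentiation markers to assign them to successive stages.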
THE STRESSED-OUT TRANSCRIPTOME
The advent of high-throughput transcript profiling has revolutionized the study of how plants perceive and respond to stress. Microarray studies with a range of ESTs and cDNAs have allowed researchers to move beyond single-gene studies.
Disease resistance is an exquisitely coordinated process that is specific to both the developmental stage of the plant and the identity of the attacking pathogen. Genome-wide transcriptional analyses have clarified many aspects of this specificity. For example, on the basis of single-gene studies, researchers initially theorized that recognition of and defense against biotrophic and necrotrophic pathogens occurred through a salicylic acid signal transduction pathway and a jasmonic acid/ethylene pathway, respectively, and that these two pathways were antagonistic. However, microarray studies of the effects of these signaling hormones, fungal pathogens, reactive oxygen species, and UV radiation on Arabidopsis indicate extensive interaction and coordination between these pathways and pathways involving carbon metabolism, cell development, and reactive oxygen species synthesis (Schenk et al., 2000; Narusaka et al., 2003). Microarray analysis of phytoalexin-deficient4 and constitutive immunity mutants, and of chemical treatments that induce or suppress the hypersensitive response and systemic acquired resistance, has also illustrated the importance of single genes in coordinating the expression of regulons that lead to disease resistance (Maleck et al., 2000; Narusaka et al., 2003). In addition, whereas pathogen-induced systemic acquired resistance leads mainly to upregulation of defense-related genes, age-related resistance involves upregulation of both defense-related genes and genes that modify or strengthen the cell wall to prevent pathogen invasion (Hugot et al., 2004).
Microarray analysis also indicates extensive transcriptional regulation in response to abiotic stresses, such as cold, drought, salt stress, high light, wounding, and nutrient deprivation (Kreps et al., 2002; Rossel et al., 2002; Watkinson et al., 2003; Yu and Setter, 2003; Delessert et al., 2004; Murchie et al., 2005). Studies have shown that transcriptome responses to these stresses differ between organs such as roots and leaves (Kreps et al., 2002), between seed parts such as placenta and endosperm (Yu and Setter, 2003), and among developmental stages (Murchie et al., 2005). These responses are also specific to the level of stress. For example, fewer than 20% of loblolly pine genes share expression patterns between mild and extreme drought stress. In addition, many members of pine gene families, such as those encoding heat shock proteins, laccases, and late embryogenesis abundant proteins, are differentially expressed under different intensities of drought stress (Watkinson et al., 2003), suggesting subfunctionalization of these family members.
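A figure such as "fewer than 20% shared" depends on how responsive and shared genes are defined. One possible definition, sketched under stated assumptions: a gene is responsive if its absolute log2 fold change is at least 1, and it is shared if it responds in the same direction under both stress intensities, counted as a fraction of all genes responsive under either. These thresholds and the gene values are hypothetical, not those used by Watkinson et al. (2003).

```python
def responsive(log2fc, threshold=1.0):
    """Map each responsive gene to its direction (+1 up, -1 down)."""
    return {g: (1 if v > 0 else -1) for g, v in log2fc.items() if abs(v) >= threshold}


def shared_fraction(mild, severe, threshold=1.0):
    """Fraction of genes responsive in either condition that respond alike in both."""
    a, b = responsive(mild, threshold), responsive(severe, threshold)
    union = set(a) | set(b)
    if not union:
        return 0.0
    shared = {g for g in set(a) & set(b) if a[g] == b[g]}
    return len(shared) / len(union)


mild = {"g1": 1.5, "g2": -2.0, "g3": 0.2, "g4": 1.2}
severe = {"g1": 2.0, "g2": 1.5, "g3": 3.0, "g5": -1.1}
print(shared_fraction(mild, severe))  # 0.2: only g1 responds alike in both
```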
Like cold, drought, and salt stress, deprivation of different nutrients initially induces a generalized stress response followed by a later, specialized response. Wounding follows a similar pattern: a systemic wound response, which includes the expression of genes involved in signal transduction and of regulatory factors, occurs immediately after wounding (Delessert et al., 2004). By contrast, suppression of metabolic genes, such as those involved in carbon metabolism and photosynthesis, occurs at the wound site as part of a localized response more than 4 h after wounding. The suppression of metabolic genes under stress conditions such as wounding likely reflects changes in source–sink relations. While these and many other cDNA microarray studies have extended our understanding of the transcriptome stress response, future analyses with whole-genome arrays will allow more thorough cataloguing of the transcriptional pathways that mediate stress responses.
TRANSCRIPTIONAL AND METABOLIC NETWORKS
One goal of characterizing and cataloguing transcriptional signatures of organs, cell types, developmental transitions, and responses to environmental stimuli is to integrate this information with that gained from other strategies. These strategies include chromatin immunoprecipitation, phylogenetic analysis, bioinformatic sequence analysis, identification of protein–protein interactions, posttranslational modifications, and correlation with metabolic flux. The integration of these data results in a hierarchy of information that is processed through regulatory networks. Analysis of the design principles of the network can break down this vast amount of information into basic computational elements or network motifs.
Elucidation of the yeast transcriptional regulatory code has allowed binding site motifs that are bound by regulators with high confidence to be mapped onto the yeast genome (Harbison et al., 2004). In addition, this map describes transcriptional regulatory potential by including binding data from diverse growth environments. The transcriptional regulatory code is based on direct in vivo interactions, and describing these interactions captures a much higher level of complexity than describing overall changes in the transcriptome. The arrangement of binding sites along a promoter provides clues to the regulatory mechanisms of a transcription factor. For example, repetition of a binding site sequence suggests that a repetitive architecture is required for stable binding of the transcription factor or could indicate the potential for a graded transcriptional response. Alternatively, frequent co-occurrence of binding sites for a specific pair of regulators within the same promoter regions suggests that the two regulators interact physically (Figure 1) or act cooperatively. Analysis of a transcriptional regulator's binding under various conditions also provides information about its behavior. For example, a regulator that binds to a core set of target promoters under one condition, but to an expanded or entirely different set of promoters when the environment changes, offers insight into its function. Likewise, a transcription factor that binds to the same set of promoters regardless of growth environment also provides insight into its function.
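The binding site architectures described above (and drawn in Figure 1) can be sketched as simple rules over motif counts. A minimal sketch, assuming each regulator's binding preference is given as a plain regular expression and only the given strand is scanned; the regulator names TFA and TFB and their motifs are hypothetical placeholders. Real analyses would use position weight matrices, scan both strands, and test co-occurrence statistically.

```python
import re
from collections import Counter
from itertools import combinations


def count_sites(promoter, motif):
    """Count possibly overlapping matches of a regex motif on the given strand only."""
    return len(re.findall(f"(?=({motif}))", promoter.upper()))


def architecture(promoter, motifs):
    """Classify promoter architecture from motif counts.

    motifs: {regulator_name: regex}. Returns "none", "single",
    "repetitive" (one regulator, several sites), or "multiple".
    """
    counts = {tf: count_sites(promoter, m) for tf, m in motifs.items()}
    present = [tf for tf, c in counts.items() if c > 0]
    if not present:
        return "none"
    if len(present) == 1:
        return "repetitive" if counts[present[0]] > 1 else "single"
    return "multiple"


def cooccurring_pairs(promoters, motifs, min_count=2):
    """Regulator pairs whose sites appear together in at least min_count promoters."""
    pair_counts = Counter()
    for seq in promoters.values():
        present = sorted(tf for tf, m in motifs.items() if count_sites(seq, m) > 0)
        pair_counts.update(combinations(present, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


motifs = {"TFA": "GATAAG", "TFB": "CCAAT"}  # hypothetical placeholder motifs
print(architecture("GATAAGTTGATAAG", motifs))  # repetitive
promoters = {"p1": "GATAAGCCAAT", "p2": "CCAATTGATAAG", "p3": "GATAAG"}
print(cooccurring_pairs(promoters, motifs))  # {('TFA', 'TFB'): 2}
```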
Figure 1. Elucidation of the Transcriptional Regulatory Code under Drought Conditions.
Clustering analysis of microarrays identifies a series of transcription factors that are upregulated or downregulated in response to mild and severe drought.
(A) Integration of data to determine what upstream regulators act to control the expression of these transcription factors (TF). Bioinformatic analysis, analysis of phylogenetic conservation between upstream regulatory sequences, and transcription factor binding data from chromatin immunoprecipitation are integrated to identify transcription factor binding sites present in the upstream sequences of each transcription factor. The resulting binding site architecture provides clues to its regulatory logic. (A) A binding site motif for a single regulator suggests a single input. (B) A repetitive binding site motif suggests increased binding strength of the transcription factor or the potential for a graded transcriptional response. (C) Multiple regulator architecture suggests combinatorial logic. (D) Co-occurring regulator architecture suggests that these transcriptional regulators may interact together.
(B) Analysis of binding conditions provides information about the downstream targets of a regulator and the behavior of a regulator. Microarray analysis coupled to chromatin immunoprecipitation is used to identify targets of two of the transcription factors that were identified as being upregulated under drought conditions. Transcription factor A shows condition-invariant behavior. This transcription factor binds to the same set of promoters in both drought conditions. Transcription factor B shows condition-expanded behavior. The regulator binds to a core set of target promoters under mild drought and binds to an expanded set of promoters under severe drought. (Based on Harbison et al., 2004.)
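The distinction drawn in panel (B) can be expressed as a comparison of target sets. A minimal sketch, assuming ChIP-chip targets have already been called for each drought condition; exact set comparison is a simplification (real target calls are noisy and would need a tolerance), and the promoter names are hypothetical.

```python
def binding_behavior(targets_mild, targets_severe):
    """Classify a regulator by how its bound promoters change between conditions."""
    a, b = set(targets_mild), set(targets_severe)
    if not a and not b:
        return "no binding detected"
    if not a or not b:
        return "binds in only one condition"
    if a == b:
        return "condition-invariant"  # same promoters in both conditions
    if a < b:
        return "condition-expanded"  # core set retained, more promoters added
    if not (a & b):
        return "condition-altered"  # an entirely different set of promoters
    return "other"


print(binding_behavior({"p1", "p2"}, {"p1", "p2"}))        # condition-invariant
print(binding_behavior({"p1"}, {"p1", "p2", "p3"}))        # condition-expanded
```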
Characterization of plant transcriptional regulatory codes is still in its infancy. Modulation of transcription factor activity using glucocorticoid receptor fusions, followed by microarray analysis and chromatin immunoprecipitation, has further elaborated the local transcriptional circuits of LEAFY and AGAMOUS and their downstream targets in Arabidopsis during the vegetative-to-reproductive meristem transition and floral organ development (William et al., 2004; Gomez-Mena et al., 2005). In addition, hierarchical clustering of the expression profiles of 402 transcription factors over 56 different stress conditions revealed stress-specific gene clusters and enriched binding site motifs in each cluster (Chen et al., 2002). Analysis of binding site motif conservation in promoters of different Arabidopsis accessions, and of the effect of polymorphisms on expression, demonstrates that phylogenetic analysis is a valid approach for identifying conserved binding site motifs in Arabidopsis (Chen et al., 2005). Together, these data represent a starting point for understanding the transcriptional regulatory code of Arabidopsis. However, to obtain a global understanding of the code, we must integrate this information in a tissue-specific manner, incorporating information from both developmental and environmental analyses.
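Hierarchical clustering of transcription factor expression profiles groups genes whose responses across conditions are similar. A naive average-linkage sketch, assuming a distance of 1 minus the Pearson correlation (a common choice for expression data, though not necessarily the one used by Chen et al., 2002). It recomputes linkages from scratch at every step, so it is suited only to small illustrative inputs; the gene names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either profile is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering (average linkage on 1 - Pearson r).

    Merges the closest pair of clusters until n_clusters (>= 1) remain.
    """
    clusters = [[gene] for gene in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical profiles over four stress conditions.
profiles = {
    "tf1": [1, 2, 3, 4],
    "tf2": [2, 4, 6, 8.5],
    "tf3": [4, 3, 2, 1],
    "tf4": [8, 6, 4, 2.2],
}
print(average_linkage(profiles, 2))  # [['tf1', 'tf2'], ['tf3', 'tf4']]
```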
A number of visualization platforms for Arabidopsis facilitate the integration of expression profiling data with biochemical pathway maps (Lange and Ghassemian, 2005), and these maps can further enhance our knowledge of how metabolic networks are regulated. Although it is well understood that enzymes associated with a functional metabolic unit are often coexpressed, expression profiling can help determine the degree to which transcriptional coexpression controls metabolic flux. In multiple yeast pathways, coregulation of enzyme expression enhances the linearity of metabolic flux by biasing it toward a subset of the possible routes. In addition, regulation of isozyme expression appears to reduce crosstalk and unwanted interactions between separate metabolic pathways (Ihmels et al., 2004). A similar finding has been reported in Arabidopsis, where the expression of enzymes involved in flavonoid biosynthesis across 563 experimental conditions reflects the reorientation of metabolic flux from flavonol to anthocyanin biosynthesis (Gachon et al., 2005). In angiosperms, large-scale duplications and even whole-genome duplications are common (Bowers et al., 2003), resulting in gene families with multiple members and the possibility of functional specialization of duplicated gene pairs. Because closely related isozymes are often represented by redundant, nonunique probe sets on the Affymetrix Arabidopsis ATH1 chip, the action of individual isozymes is difficult to study. However, an analysis of cytochrome P450s involved in Trp biosynthesis and their paralogs involved in indole glucosinolate biosynthesis showed that these groups cluster together under stress conditions, suggesting that a single large-scale duplication event may have encompassed all genes of the pathway (Gachon et al., 2005).
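One simple way to ask whether enzymes of a metabolic unit are coexpressed is to compare their mean pairwise correlation with the correlation between enzymes of two different pathways. The sketch below assumes a table of expression values per enzyme measured under the same conditions. The enzyme labels and values are hypothetical, and a real analysis, such as the flavonoid study cited above, would draw on far more conditions.

```python
"""Sketch: within-pathway vs. between-pathway coexpression of enzymes."""
from itertools import combinations, product
from statistics import mean, pstdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (assumes neither profile is constant)."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def mean_coexpression(profiles, group_a, group_b=None) -> float:
    """Mean pairwise r within group_a, or between group_a and group_b if given."""
    pairs = combinations(group_a, 2) if group_b is None else product(group_a, group_b)
    return mean(pearson(profiles[a], profiles[b]) for a, b in pairs)


# Hypothetical expression values for two enzymes from each of two pathways,
# measured under the same four conditions.
profiles = {
    "pathwayA_enz1": [1.0, 3.0, 2.0, 0.5],
    "pathwayA_enz2": [1.2, 2.8, 2.1, 0.4],
    "pathwayB_enz1": [0.5, 0.6, 3.0, 2.5],
    "pathwayB_enz2": [0.4, 0.8, 2.9, 2.7],
}
pathway_a = ["pathwayA_enz1", "pathwayA_enz2"]
pathway_b = ["pathwayB_enz1", "pathwayB_enz2"]
print("within A:", round(mean_coexpression(profiles, pathway_a), 2))
print("within B:", round(mean_coexpression(profiles, pathway_b), 2))
print("between A and B:", round(mean_coexpression(profiles, pathway_a, pathway_b), 2))
```

With these toy values, both within-pathway means are high while the between-pathway mean is negative. Note that, as discussed above, isozymes sharing a nonunique probe set would appear perfectly correlated for purely technical reasons.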
Once the transcriptional regulatory code has been determined, we can begin to study the design of the transcriptional regulatory networks that control gene expression. A statistical analysis of recurring patterns in the transcriptional regulatory network of Escherichia coli revealed a series of network motifs (Shen-Orr et al., 2002). Network motifs are patterns of interconnections that recur in many different parts of a network at frequencies much higher than those found in randomized networks. Analysis of these motifs revealed two trends. First, representing the transcriptional regulatory network in terms of network motifs revealed a motif hierarchy: a single layer of dense overlapping regulon motifs feeds into single input module motifs and feedforward loop motifs (Shen-Orr et al., 2002). This first layer of dense overlapping regulons may therefore represent the computational core of the network. Second, each type of motif was found to have a specific function in determining gene expression, for example, generating a temporal program of expression or coordinating the stoichiometric production of the components of a protein assembly (Shen-Orr et al., 2002). If the general functions of these motifs hold across different organisms, then an analysis of network motifs may aid in inferring the function of a particular network module.
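The core of a network motif analysis is counting a subgraph in the real network and comparing that count with randomized networks that preserve each gene's numbers of regulators and targets. The sketch below does this for the feedforward loop (X regulates Y, Y regulates Z, and X also regulates Z). The toy network, the edge-swap randomization, the number of swaps, and the empirical P value are illustrative assumptions and are not taken from Shen-Orr et al. (2002).

```python
"""Toy network-motif analysis for feedforward loops (FFLs).

Assumptions: the network is a set of directed (regulator, target) edges, and
the null model is degree-preserving edge swapping.
"""
import random
from statistics import mean

Edge = tuple[str, str]


def count_ffl(edges: set[Edge]) -> int:
    """Count ordered triples (x, y, z) of distinct nodes with x->y, y->z, x->z."""
    targets: dict[str, set[str]] = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, set()):
                if len({x, y, z}) == 3 and z in x_targets:
                    count += 1
    return count


def randomize(edges: set[Edge], n_swaps: int, rng: random.Random) -> set[Edge]:
    """Swap targets of random edge pairs, keeping every node's in- and out-degree."""
    edge_list = sorted(edges)  # sorted so a seeded rng gives reproducible results
    current = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new_i, new_j = (a, d), (c, b)
        # Skip swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new_i in current or new_j in current:
            continue
        current -= {edge_list[i], edge_list[j]}
        current |= {new_i, new_j}
        edge_list[i], edge_list[j] = new_i, new_j
    return current


def ffl_enrichment(edges: set[Edge], n_random: int = 500, seed: int = 0):
    """Return observed FFL count, mean count in random networks, and empirical P."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    null = [count_ffl(randomize(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    p = (1 + sum(c >= observed for c in null)) / (1 + n_random)
    return observed, mean(null), p


# Hypothetical regulatory edges (regulator, target).
network = {
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("C", "D"), ("D", "E"),
}
observed, null_mean, p_value = ffl_enrichment(network)
print(f"observed FFLs={observed}, random mean={null_mean:.2f}, P={p_value:.3f}")
```

A network this small cannot establish enrichment; the same procedure only becomes informative on genome-scale networks with hundreds of regulatory interactions.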
MICROARRAYS: A VISION FOR THE FUTURE
Microarrays have revolutionized the characterization of biological processes. However, both the generation and the analysis of these vast amounts of information can still be improved. The Arabidopsis Affymetrix 8K (first-generation) and 22K (second-generation) chips provided a common platform that allowed users from different labs to easily analyze expression data across a wide variety of experimental conditions. Issues with genome coverage and annotation, however, call for a third generation of Arabidopsis Affymetrix chips. The Affymetrix ATH1 22K chip contains 22,500 probe sets representing ∼23,750 genes (Redman et al., 2004); nonunique probe sets were used to represent highly similar genes. In its most recent annotation (The Arabidopsis Information Resource, November 12, 2005), the Arabidopsis genome contains 31,407 genes (26,751 protein-coding genes, 3818 pseudogenes, and 838 noncoding RNA genes). Of the protein-coding genes, 3001 are not covered by the ATH1 chip. When the ATH1 array was developed, precedence was given to genes for which either expression evidence or a credible database match existed. The recent growth of genomic and bioinformatics resources has greatly increased both the evidence for expression and the number of database matches. A new Arabidopsis whole-genome chip should therefore cover all protein-coding genes as well as the noncoding RNA genes. Alternatively, a commercial version of the Arabidopsis tiling array, in which the complete genome including intergenic regions is represented, would also serve this purpose (Yamada et al., 2003).
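The gene counts quoted above are internally consistent: the three annotation classes sum to the total, and subtracting the uncovered protein-coding genes leaves a number matching the ∼23,750 genes represented on ATH1.

$$
\begin{aligned}
26{,}751 + 3{,}818 + 838 &= 31{,}407 \quad \text{(protein-coding + pseudogenes + noncoding RNA = all genes)}\\
26{,}751 - 3{,}001 &= 23{,}750 \quad \text{(protein-coding genes minus those lacking ATH1 coverage)}
\end{aligned}
$$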
In addition to more detailed microarray platforms for model species such as Arabidopsis, there is an obvious need for transcriptional profiling of crop species and other important plant species. Currently, there are microarray projects for barley, Brassica, citrus, grape, maize, Medicago truncatula, poplar, potato, rice, soybean, sugarcane, tomato, wheat, strawberry, cassava, cacao, and others (Wang et al., 1998; Aharoni et al., 2000; Jones et al., 2002; Lopez et al., 2005; Rensink and Buell, 2005). These projects use a variety of platforms, including Affymetrix, Agilent, spotted cDNAs, and spotted oligonucleotides. As we continue to perform transcriptional profiling on an ever-broadening range of plant species, it will become necessary to sequence plant genomes completely and to convert them expeditiously into standardized, commercially available, whole-genome microarray platforms. In addition to detecting changes in gene expression, these arrays could be designed to detect single nucleotide polymorphisms (Wang et al., 1998) and to accurately measure transcriptional changes in orthologous genes from related species. Such platforms could then be used for intra- and interspecies transcriptional comparisons, allowing easy data integration and, subsequently, functional characterization of genes under a variety of developmental and environmental conditions.
Before that day becomes a reality, however, researchers must optimize experimental design and make full use of statistics when identifying differentially regulated genes. Variation is introduced into microarray analysis at many different steps, and good experimental design can account for these sources of variation and help distinguish true differential responses from noise. Sources of variation include biological variation (genetic or environmental effects on the organism), technical variation (extraction, labeling, and hybridization of samples), and measurement error (signal detection). A number of experimental designs optimize the number of experimental replicates and the number of comparisons between chips; examples include reference, split-plot, incomplete block, and loop designs (Kerr and Churchill, 2001; Churchill, 2002; Woo et al., 2005). These designs make microarray data amenable to statistical analysis. Small changes in gene expression are often biologically relevant, yet they are ignored when the threshold for differential expression is set at 2- or 2.5-fold. Rigorous statistical tests can help distinguish such small changes from the noise associated with a microarray experiment. Statistical tests for differential gene expression include the significance analysis of microarrays (SAM) t test (two conditions with replicates) and the mixed-model analysis of variance (more than two conditions) (Cui and Churchill, 2003; Churchill, 2004; Cui et al., 2005). These and other statistical tests should be used more frequently in microarray analysis to allow confident selection of subtle changes across the genome under a variety of conditions.
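To illustrate why a fold-change cutoff can miss subtle but reproducible changes, the sketch below computes the relative difference statistic on which SAM is based, d = (mean difference) / (s + s0). Here s is the pooled standard error of the difference in means, and s0 is a small positive constant that keeps genes with very low variance from dominating. In SAM, s0 is estimated from the data and significance is assessed by permutation. In this sketch, s0 is fixed at an arbitrary 0.1 and no permutation or false discovery rate estimate is performed, so it shows only the statistic itself. The log2 values are hypothetical.

```python
"""Sketch of a SAM-style relative difference statistic for a single gene."""
from math import sqrt
from statistics import mean


def relative_difference(control: list[float], treated: list[float], s0: float = 0.1) -> float:
    """d = (mean_treated - mean_control) / (s + s0).

    s is the pooled standard error of the difference in means; s0 = 0.1 is an
    arbitrary assumption here (SAM estimates it from the data).
    """
    n1, n2 = len(control), len(treated)
    m1, m2 = mean(control), mean(treated)
    ss = sum((x - m1) ** 2 for x in control) + sum((x - m2) ** 2 for x in treated)
    s = sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m2 - m1) / (s + s0)


def fold_change(control: list[float], treated: list[float]) -> float:
    """Fold change computed from log2-scale values."""
    return 2 ** (mean(treated) - mean(control))


# Hypothetical log2 expression values, three replicates per condition.
subtle = ([8.00, 8.05, 7.95], [8.60, 8.55, 8.65])  # small but reproducible shift
noisy = ([8.0, 6.5, 9.5], [10.5, 8.0, 11.5])       # large but variable shift

for name, (ctrl, trt) in {"subtle": subtle, "noisy": noisy}.items():
    print(f"{name}: fold change = {fold_change(ctrl, trt):.2f}, "
          f"d = {relative_difference(ctrl, trt):.2f}")
```

With these values, the "subtle" gene falls below a 2-fold cutoff yet has the larger d, whereas the "noisy" gene passes the cutoff but has a much smaller d because its replicates disagree.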
CONCLUSIONS
The genomics era is providing us with tools and capabilities to study biology on a scale never before seen. Could Watson, Crick, Wilkins, and Franklin have guessed that within 50 years we would have the tools to determine when and where every gene in an organism is expressed, during any developmental stage, under any environmental condition, and in specific organs, tissues, and cell types? As we approach the point of knowing when and where every gene in an organism is expressed under a vast number of conditions, our next question is, why? Why are genes expressed in a specific location at a specific time under specific conditions? What regulates a gene's expression, and how do these regulators behave? These questions are being answered by cataloguing the relationships between cis-regulatory elements and their transcription factors and by determining how different transcriptional regulons interact under specific developmental and environmental conditions. In addition, as we adopt a systems approach that integrates genomics, proteomics, and metabolomics with statistics, physics, and mathematics, we continue to refine our view of how the transcriptome gives rise to biological form and function.
Acknowledgments
We thank Kim Gallagher and Jee Jung for critical reading of the manuscript. S.M.B. and T.A.L. contributed equally to this work.