- American Society of Plant Biologists
Since DNA microarrays came on the scene nearly 10 years ago, the scientific community has struggled to come to grips with this powerful technology. We have seen an expanding range of chip designs and improvements in signal detection and data analysis, the creation of public databases and the Microarray Gene Expression Data Society (http://www.mged.org/), and the promotion of community standards for conducting microarray experiments (Brazma et al., 2002). One question that continues to challenge researchers daring to enter the microarray fray is, how do we get meaningful biological information from a minimum of hybridizations, given the high cost of performing these experiments?
Small-scale microarray experiments with little or no biological replication or sophisticated statistical analysis have been used successfully to generate hypotheses about gene function that are then tested and verified by independent means (a good example of this is provided in Mele et al., 2003). It is nonetheless increasingly recognized that rigorous experimental design and statistical analysis, together with adequate replication, are critical for studies that seek to draw broad conclusions about the biology of a system (such as plant development and environmental responses) based principally on microarray gene expression data (Meyers et al., 2004). However, the development of statistical tools for microarray analysis has lagged behind the development and application of microarray technology itself, which has made it even more difficult for researchers to determine what constitutes adequate experimental design and analysis.
Results from microarray experiments have the typical characteristics of data that calls for statistically based experimental design and analysis; namely, huge amounts of data associated with potentially large amounts of variation from multiple and diverse sources (i.e., there are numerous sources of both technical and biological variation). However, microarray data differs from data associated with most other types of conventional statistical experiments in a critical respect that has challenged statisticians to develop new analytical tools (or to impose new twists on old analytical tools). That is, conventional statistics typically deals with many observations (e.g., thousands) made on relatively few parameters (e.g., several), whereas microarray data deals with relatively few observations (in many cases, very few) made on many thousands of parameters (e.g., thousands of genes per chip). Happily, many in the growing fields of biostatistics and bioinformatics have stepped forward to meet this challenge, and there are now numerous analytical methods for the treatment of microarray data (Wolfinger et al., 2001; Draghici, 2002; Craig et al., 2003; Cui and Churchill, 2003; Draghici et al., 2003; Storey and Tibshirani, 2003). There is no longer an excuse for the failure to apply rigorous statistical tools in the design and analysis of microarray and other large-scale “-omic” (e.g., genomic, proteomic, and metabolomic) data sets.
Still, it is important to realize that there is no single best way of designing and analyzing genome-wide expression experiments. What is important is that researchers provide clear descriptions of their experimental design, methodology, and analysis. A clear distinction should be made between technical and biological replication, and the best analyses strive to account for the major sources of both. In this issue of The Plant Cell, Caldo et al. (pages 2514–2528) provide an excellent example of a well-designed and analyzed set of transcript profiling experiments that give us valuable new resources as well as novel insights into the expression of plant defense responses. The authors made use of the Affymetrix Barley1 GeneChip containing probe sets corresponding to ∼22,000 barley genes (Close et al., 2004) and designed a set of experiments first to identify genes whose expression patterns might be used to distinguish incompatibility from compatibility in barley powdery mildew interactions and second to identify genes whose expression patterns might distinguish Rar1-dependent versus Rar1-independent incompatible responses specified by different barley Mla alleles.
The biotrophic fungus Blumeria graminis f. sp. hordei (Bgh) is the causal agent of powdery mildew disease in barley (Hordeum vulgare). Compatible (the successful establishment of the fungus and development of disease) versus incompatible (ultimately leading to the termination of fungal growth and disease resistance) interactions are specified by a classic gene-for-gene mechanism involving genes designated Ml (Collins et al., 2002). Approximately 30 distinct disease resistance specificities have been identified at the Mla locus in barley (Jørgensen, 1994), and several alleles have been cloned that encode coiled-coil, nucleotide binding site, leucine-rich repeat resistance (R) proteins (reviewed in Jones, 2001; Shen et al., 2003; Halterman and Wise, 2004). An interesting feature of Mla-mediated disease resistance is that R proteins encoded by different Mla alleles, although more than 90% similar, may or may not require RAR1 and SGT1, which form part of an SCF ubiquitin ligase complex, to activate downstream components (Azevedo et al., 2002; Shen et al., 2003; Halterman and Wise, 2004).
Caldo et al. used a 3 × 2 matrix design consisting of three near-isogenic barley lines, each containing an introgressed Mla1, Mla6, or Mla13 allele, challenged separately with two Bgh isolates: Bgh 5874 (containing AvrMla1 and AvrMla6) and Bgh K1 (containing AvrMla1 and AvrMla13). This provided for six different host–pathogen combinations that encompassed two different Rar1-dependent incompatible versus compatible interactions (determined by the presence or absence of Mla6 or Mla13) and two Rar1-independent incompatible interactions (specified by Mla1). For each of the six host–pathogen combinations, 10 to 15 barley seedlings were harvested and pooled for RNA isolation at six time points (0, 8, 16, 20, 24, and 32 h after inoculation). The entire experiment was then repeated three times in a standard split-split-plot design (Kuehl, 2000), with replications (3) as blocks, Bgh isolate (2) as the whole plot factor, plant genotype (3) as the split-plot factor, and time point (6) as the split-split-plot factor, for a total of 108 GeneChip hybridizations.
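The layout described above can be enumerated in a few lines to check the hybridization count and the mix of interaction types. The following is only an illustrative sketch, not the authors' analysis code: the names `ISOLATE_AVR`, `interaction`, and `build_design` are hypothetical, and the gene-for-gene rule is simplified to "incompatible when the isolate carries the Avr gene matching the host Mla allele."

```python
from itertools import product

# Avirulence genes carried by each Bgh isolate, as stated in the text.
ISOLATE_AVR = {
    "5874": {"AvrMla1", "AvrMla6"},
    "K1": {"AvrMla1", "AvrMla13"},
}
GENOTYPES = ["Mla1", "Mla6", "Mla13"]    # near-isogenic lines (split-plot factor)
RAR1_DEPENDENT = {"Mla6", "Mla13"}       # Mla1 acts independently of Rar1
TIME_POINTS_H = [0, 8, 16, 20, 24, 32]   # hours after inoculation (split-split-plot factor)
REPLICATIONS = [1, 2, 3]                 # blocks


def interaction(genotype, isolate):
    """Simplified gene-for-gene rule (an assumption made for illustration)."""
    matching_avr = "Avr" + genotype
    return "incompatible" if matching_avr in ISOLATE_AVR[isolate] else "compatible"


def build_design():
    """Return one record per GeneChip hybridization in the full factorial layout."""
    rows = []
    for rep, isolate, genotype, hours in product(
        REPLICATIONS, ISOLATE_AVR, GENOTYPES, TIME_POINTS_H
    ):
        rows.append(
            {
                "replication": rep,
                "isolate": isolate,
                "genotype": genotype,
                "hours": hours,
                "interaction": interaction(genotype, isolate),
                "rar1_dependent": genotype in RAR1_DEPENDENT,
            }
        )
    return rows


design = build_design()
combos = {(g, i): interaction(g, i) for g in GENOTYPES for i in ISOLATE_AVR}

print("hybridizations:", len(design))
for (genotype, isolate), kind in sorted(combos.items()):
    print(f"{genotype} x Bgh {isolate}: {kind}")
```

Under these assumptions the enumeration yields 3 × 2 × 3 × 6 = 108 records and classifies four of the six host–pathogen combinations as incompatible and two as compatible, in agreement with the description above.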
The strength of this work comes not solely from the amount of replication and total number of hybridizations performed, but even more importantly, from the careful experimental design that accounted for changes in gene expression over time and differences in host–pathogen interactions depending on plant and fungal genotype. The authors offer clear descriptions of the design, the analysis, and the rationale behind their approach. They point out that this particular analysis may have missed some biologically relevant genes (i.e., those that are differentially expressed only at one or two time points or only in association with specific host–fungal isolate combinations) but hypothesize that genes showing differential expression patterns across time (i.e., within the 32 h after inoculation) and across different host–isolate combinations offer the most potential for differentiating between compatible and incompatible interactions and determining key components of pathways underlying the molecular basis of these interactions.
In one of several different analyses reported in the article, the authors compared gene expression in the Rar1-dependent incompatible and compatible interactions and identified a set of 22 genes whose average 0 to 32 h time-course expression patterns differed significantly. Examination of the individual gene expression profiles showed that expression of these genes was induced in a similar or identical manner in both compatible and incompatible interactions up to 16 h after inoculation and diverged significantly in compatible relative to incompatible interactions after 16 h (see figure). This time point closely coincides with the time that Bgh haustoria make contact with the host cell plasma membrane after initial infection and thus is consistent with the kinetics of Bgh infection. Expression of this set of genes then decreased steadily from 16 to 32 h after inoculation in the compatible interactions, whereas it generally increased or remained steady in the incompatible interactions. Of the 22 genes in this set, seven were predicted to encode components of the shikimate pathway leading to biosynthesis of phenylpropanoid phytoalexins and lignin, 10 had predicted functions in cellular metabolism, oxidative stress, and ethylene biosynthesis, and five corresponded to proteins of unknown function.
Parallel Transcript Profiling Reveals a Link between Basal and Gene-Specific Resistance.
Top left: general reciprocal pattern of expression in Mla-specified incompatible (solid line) and compatible (dashed line) interactions. Top right: differentially expressed genes modeled onto the last steps of the shikimate pathway leading to the synthesis of defense related compounds. Bottom left: modified quadratic check of barley infected with the powdery mildew fungus Blumeria graminis. Background: raw GeneChip probe array data.
These observations are consistent with the notion that this set of genes encodes proteins that are involved in basal nonspecific defense pathways and that recognition of avirulence factors in the incompatible interaction leads to the maintenance of increased expression, whereas the compatible interaction leads to active suppression of these genes and the associated nonspecific defense responses. Based on the coordinated suppression of defense-related transcripts, the authors hypothesize that the targets of virulence factors are possible regulators of basal resistance. Taken together, these principles also are consistent with the guard hypothesis of R gene function, which postulates that R proteins guard other host proteins that are the targets of avirulence factors (which, in the absence of specific recognition by the host, play a role in virulence of the pathogen) (Van der Biezen and Jones, 1998; Dangl and Jones, 2001). Although many studies have been conducted on plant perception of pathogen-derived molecules, the connection between the recognition of general and specific elicitors in the expression of compatibility and incompatibility responses is just beginning to be elucidated (Navarro et al., 2004). From these results, the authors propose a model that links the recognition of general elicitors and specific avirulence proteins in the expression of plant defense responses, supporting the hypothesis that host-specific resistance evolved from the recognition and prevention of the pathogen's suppression of plant basal defense.
In addition to the specific conclusions derived from this analysis, the complete data set from these experiments is publicly available in the BarleyBase repository for cereal GeneChip data (http://barleybase.org/), as well as in ArrayExpress (http://www.ebi.ac.uk/arrayexpress/), providing an invaluable resource for the scientific community.