- © 2015 American Society of Plant Biologists. All rights reserved.

## Abstract

Plant biology is rapidly entering an era where we have the ability to conduct intricate studies that investigate how a plant interacts with the entirety of its environment. This requires complex, large studies to measure how plant genotypes simultaneously interact with a diverse array of environmental stimuli. Successful interpretation of the results from these studies requires us to transition away from the traditional standard of conducting an array of pairwise *t* tests toward more general linear modeling structures, such as those provided by the extendable ANOVA framework. In this Perspective, we present arguments for making this transition and illustrate how it will help to avoid incorrect conclusions in factorial interaction studies (genotype × genotype, genotype × treatment, and treatment × treatment, or higher levels of interaction) that are becoming more prevalent in this new era of plant biology.

## IDENTIFYING BIOLOGICAL INTERACTIONS BETWEEN AND AMONG TREATMENTS, GENOTYPES, AND ENVIRONMENTS IS CRITICAL IN PLANT SCIENCE

Testing interactions between and among treatments, genotypes, and environments (Table 1) is central to nearly every field of plant biology, from genetics tests for epistasis, to physiology tests for interactions of multiple treatments. Understanding how results translate from one condition to another requires us to determine how these variables interact with each other in the context of an experiment. These interactions between variables form the basis of integrative studies that aim to assess how genetic variation influences the response to a specific treatment or environment. As a result, numerous plant biology studies require robust statistical methods to test hypotheses about how two variables interact.

## OVERUSE OF STUDENT’S *t* TESTS

Given the ubiquity of testing interactions, plant biologists are naturally well versed in the importance of assessing their data for statistical significance. Unfortunately, the analysis methods used are not always appropriate. A survey of three recent issues of *The Plant Journal* (Vol. 81, issues 1 to 3), *The Plant Cell* (Vol. 26, issues 10 to 12), and *Plant Physiology* (Vol. 166, issue 4, and Vol. 167, issues 1 and 2) showed that 83 of 185 articles (45%) relied solely upon pairwise *t* tests using a single trial of an experiment to analyze quantitative data, with the vast majority of studies involving multiple variables, experiments, or interactions. Of the remaining articles, 42 (23%) presented no quantitative data relevant to the statistics discussed in this article (i.e., modeling results or developmental pictures) and 41 (22%) reported quantitative data for which statistical analysis was either not conducted or not described. Finally, only 19 (10%) combined the data from multiple trials within an ANOVA to directly test for an interaction between two variables. Although we do not suggest that ANOVA would have been the best option in all of the above instances, we feel that these numbers indicate the extent to which ANOVA is underutilized in our community. The *t* test (described below) is familiar to most molecular biologists, is easy to perform and interpret, and is properly applied when there is no interaction among variables or variation across trials. However, given the increasing prevalence of experiments investigating the interaction among variables within a single study, the *t* test does not fully use the power of the experimental data or even directly test the hypothesis that two variables/components interact. Under these conditions, the trial-specific *t* test can deliver false conclusions that could be avoided by combining all the data in an ANOVA.

## LINEAR STATISTICAL MODELS SUCH AS ANOVA PROVIDE MORE POWER TO TEST THE HYPOTHESIS OF INTERACTION

Employing a linear statistical model like ANOVA can allow a researcher to include data from all experiments under analysis and make better use of existing experimental designs that seek to test interactions. Here, we use two simulated data sets to compare statistical analyses of multiple variables across multiple trials using *t* tests and ANOVA. We first show how ANOVA can prevent a false conclusion that would have been drawn if relying on the *t* test and then show how ANOVA can reveal a true conclusion that would have been dismissed by the *t* test. Finally, we illustrate the critical nature of the replication underlying the experiment and the importance of post-hoc graphical analysis of the data. Because all research programs rely on previous results to shape future experiments, preventing errors that can affect further studies will require increased efforts to more accurately model the data. Transitioning to ANOVA and crucially using all available data can help to increase the productivity of a research group by diminishing inferential errors. We aim to demonstrate how changing our standard approach to statistical analysis of interactions could significantly benefit plant biologists beyond these specific instances.

## *t* TEST CALCULATIONS

A *t* test assesses whether the means of two groups differ; we can choose one of several variant *t* tests depending upon assumptions about the variances and directionality. In the articles we surveyed, most used a *t* test that assumed equal variances, using the equation $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{2/n}}$. In this equation, $\bar{x}_1 - \bar{x}_2$ is the difference in the means between the two groups being compared (genotypes A and B or treatments 1 and 2), while $s_p$ is the pooled within-group sd and *n* is the number of samples for each group (if the groups have equal sample sizes). The P value is then estimated by comparing the *t* value to the expected distribution of *t* across all experiments of a similar sample size (degrees of freedom), the *t*-distribution. As can be seen from the equation for a *t* test, it cannot be extended to more groups or factors, is limited to a single pairwise comparison at a time, and thus is not designed to test interactions.
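As a minimal sketch of the equal-variance calculation described above, the following Python snippet computes the *t* value and its degrees of freedom using only the standard library. The function name `pooled_t` and the root-length values are hypothetical and chosen for illustration; converting *t* to a P value requires the *t*-distribution from a statistics package and is not shown.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pooled within-group variance: the two sample variances weighted by
    # their degrees of freedom.
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    # With equal group sizes, 1/n_a + 1/n_b reduces to 2/n.
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error, n_a + n_b - 2


# Hypothetical root lengths (mm), three plants per genotype.
wild_type = [100.0, 104.0, 96.0]
mutant = [85.0, 89.0, 81.0]
t_value, df = pooled_t(wild_type, mutant)
print(f"t = {t_value:.3f} on {df} degrees of freedom")
```

The function accepts exactly two groups, which mirrors the limitation noted in the text: nothing in this calculation can represent a third group or an interaction term.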

### Resources

*t* tests can be run in nearly any worksheet package (such as Excel), statistical package (such as R, SAS, or SPSS), and even a number of calculator applications.

## ANOVA CALCULATIONS

ANOVA is a class of linear statistical models written in a linear algebraic form that is extensible and allows us to specifically test multiple variables and their interactions. For example, to compare genotypes across two treatments with a term for trial, the model to explain the phenotype ($y_{gte}$) for each specific genotype (*g*), treatment (*t*), and trial or environment (*e*), would be written as $y_{gte} = \mu + G_g + T_t + E_e + G_g \times T_t + \varepsilon_{gte}$, where *μ* represents a constant, *G* represents the contribution of the genotype (*g*), *T* represents the contribution of the treatment (*t*), and *E* represents the contribution of the environment (*e*), incorporating the variation in all of the trials. Finally, *ε* represents residual error. Thus, $y_{gte}$ can represent all the phenotypes for every genotype in every treatment in every trial or, more simply, all the data. Another way to think of this is that the model hypothesizes that the phenotype ($y_{gte}$) may be altered by variation in the genotypes ($G_g$) and/or variation in the treatments ($T_t$) and/or variation in the trials ($E_e$) and an interaction of the genotypes with the treatment ($G_g \times T_t$) plus some error that can’t be controlled ($\varepsilon_{gte}$). A researcher can add additional levels for each factor (e.g., multiple levels of a treatment or different mutant alleles of a gene) as well as other factors, and it is thus extendable to new experimental designs.

Any variation in phenotype is then partitioned into the contributions made by the various factors and significance is determined by an *F*-test where $F = \frac{MS_G}{MS_{\varepsilon}}$ (the mean square for a factor, here genotype, divided by the residual mean square). This essentially asks how the variation in the average phenotype between the genotypes in the experiment compares to the random variation in the experiment. We can then determine the P value for this factor’s *F* value by comparison to the *F*-distribution, the predicted distribution of all possible *F* values across all similarly sized experiments. The significant benefit of ANOVA, or other linear models, is that including all the experimental factors that may affect the phenotype allows for a better estimation of the random variation and thus gives more precision to the *F* value than to the corresponding *t* value.
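The partitioning described above can be sketched in plain Python for the model with genotype, treatment, trial, and genotype × treatment terms. This is an illustrative sketch, not a replacement for a statistics package: it assumes a balanced design (the same number of observations in every genotype × treatment × trial cell), the function name `anova_g_t_e` and the data are hypothetical, and P values (which require the *F*-distribution) are not computed.

```python
from collections import defaultdict
from statistics import mean


def anova_g_t_e(records):
    """Partition variation for y = mu + G + T + E + GxT + error.

    records: (genotype, treatment, trial, value) tuples from a balanced
    design. Returns {term: (sum_of_squares, df, F)}.
    """
    records = list(records)
    values = [rec[3] for rec in records]
    grand = mean(values)

    def group_means(key):
        groups = defaultdict(list)
        for rec in records:
            groups[key(rec)].append(rec[3])
        return {k: (mean(v), len(v)) for k, v in groups.items()}

    def main_effect(key):
        gm = group_means(key)
        ss = sum(n * (m - grand) ** 2 for m, n in gm.values())
        return ss, len(gm) - 1, gm

    ss_g, df_g, g_means = main_effect(lambda r: r[0])
    ss_t, df_t, t_means = main_effect(lambda r: r[1])
    ss_e, df_e, _ = main_effect(lambda r: r[2])

    # Interaction: what the genotype x treatment cell means show beyond
    # the sum of the two main effects.
    cells = group_means(lambda r: (r[0], r[1]))
    ss_gt = sum(n * (m - g_means[g][0] - t_means[t][0] + grand) ** 2
                for (g, t), (m, n) in cells.items())
    df_gt = df_g * df_t

    # Everything not assigned to a model term is residual variation.
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_res = ss_total - ss_g - ss_t - ss_e - ss_gt
    df_res = len(values) - 1 - df_g - df_t - df_e - df_gt
    ms_res = ss_res / df_res

    table = {name: (ss, df, (ss / df) / ms_res)
             for name, ss, df in [("G", ss_g, df_g), ("T", ss_t, df_t),
                                  ("E", ss_e, df_e), ("GxT", ss_gt, df_gt)]}
    table["residual"] = (ss_res, df_res, None)
    return table


# Hypothetical data: two genotypes x two treatments x two trials.
data = [
    ("WT", "control", 1, 10.0), ("WT", "control", 2, 12.0),
    ("WT", "JA", 1, 20.0), ("WT", "JA", 2, 22.0),
    ("mut", "control", 1, 10.0), ("mut", "control", 2, 12.0),
    ("mut", "JA", 1, 14.0), ("mut", "JA", 2, 18.0),
]
table = anova_g_t_e(data)
for term, (ss, df, f_value) in table.items():
    print(term, round(ss, 3), df, f_value)
```

Because every observation enters one model, the residual mean square is estimated from all the data at once, which is the source of the added precision discussed in the text.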

### Extensions

The extensibility of the ANOVA linear model is one of its particular strengths. This includes the ability to add variables and generate different models to directly test a hypothesis. One-way ANOVA tests variation across a single variable, whereas two-way ANOVA assesses two variables and can include their potential interaction. This can be extended further. In addition to multiple variables, the variables can be classed as fixed or random effects. Fixed effects can be considered as variables that can be qualitatively grouped, such as genotypes. In contrast, random effects can be considered as variables that show quantitative variation or possibly a large range of levels that cannot be readily grouped into discrete classes. Both fixed and random effects can be included in the same model, leading to the generation of mixed models.
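To make the progression concrete, the models named above can be written side by side. The replicate index *i* and the choice to treat trial as the random term in the mixed model are assumptions made for illustration; they are not specified in the text.

$$
\begin{aligned}
\text{one-way:}\quad & y_{gi} = \mu + G_g + \varepsilon_{gi} \\
\text{two-way with interaction:}\quad & y_{gti} = \mu + G_g + T_t + G_g \times T_t + \varepsilon_{gti} \\
\text{mixed (trial as random):}\quad & y_{gtei} = \mu + G_g + T_t + G_g \times T_t + E_e + \varepsilon_{gtei}, \qquad E_e \sim N(0, \sigma_E^2)
\end{aligned}
$$

Each step adds terms to the same linear structure, so the hypothesis of interest (for example, whether $G_g \times T_t$ is needed) is tested directly by the corresponding term.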

### Resources

Simple one-way ANOVAs can be run in most worksheet packages such as Excel. However, the worksheet packages contain prepackaged structures that may not match the linear model to be tested. The application of two-way or more complex models requires the use of a publicly available statistical package such as R or a commercially available package such as SAS or SPSS. These statistical packages provide more flexibility and are thus more likely to be suitable for the types of analyses described here. Within R, the basic package allows for simple ANOVA using the aov command (Crawley, 2014; R Development Core Team, 2014). A number of websites have information about basic statistics in R (http://www.statmethods.net/ and http://cran.r-project.org/). These include guides to beginning analysis (http://cran.r-project.org/web/packages/HSAUR/vignettes/Ch_analysis_of_variance.pdf). The basic R package also allows for the use of multivariate approaches like MANOVA. Advanced linear models will require the use of the car or lme4 packages, which allow the user to individualize the analysis to properly test the hypothesis using the experimental design command (Fox and Weisberg, 2011; Bates et al., 2014; Crawley, 2014).

## ANOVA PROPERLY TESTS GENETIC INTERACTIONS

Genetic analysis commonly tests if two genes show an epistatic interaction. This can examine, for example, whether two genes in a gene family have overlapping functions or if different genes interact epistatically to control a phenotype by functioning in the same pathway (for examples, see Lynch and Force, 2000; Carlborg et al., 2006; Freeling, 2009; Joseph et al., 2013; Mackay, 2014). For genes with redundant functions, both genes may need to be mutated to produce an altered phenotype. Unfortunately, testing genes for redundancy or epistasis by using pairwise *t* tests to compare mutant to wild type is prone to errors. Performing an ANOVA, which uses the data from all individuals and all trials of the experiment together, can describe the interaction more accurately. To illustrate this, we simulated two genes with a purely additive interaction, with each single mutant leading to a reduction of 15 mm in root length in comparison to the wild type and the double mutant having a reduction of 30 mm in root length (see Supplemental Methods for the fabricated data sets). Then we built a model that randomly drew samples from a normal distribution with homogeneous variance for each class.
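A simulation of this kind can be sketched as follows. This is not the authors' code (see their Supplemental Methods for that); the wild-type mean of 100 mm, the sd of 10 mm, and the names `simulate_trial` and `interaction_contrast` are assumptions chosen for illustration. Only the 15 mm and 30 mm reductions, the three replicates, and the three trials come from the text.

```python
import random
from statistics import mean

# Root-length shifts (mm) relative to wild type, as described in the text:
# each single mutant -15, the double mutant -30 (purely additive).
EFFECTS = {"WT": 0.0, "gene1": -15.0, "gene2": -15.0, "double": -30.0}


def simulate_trial(rng, wt_mean=100.0, sd=10.0, n_reps=3):
    """Draw n_reps root lengths per genotype from normal distributions
    that all share one sd (homogeneous variance)."""
    return {genotype: [rng.gauss(wt_mean + shift, sd) for _ in range(n_reps)]
            for genotype, shift in EFFECTS.items()}


def interaction_contrast(trial):
    """Double-mutant effect minus the sum of the two single-mutant effects.

    Zero in expectation when the two genes act additively."""
    m = {genotype: mean(values) for genotype, values in trial.items()}
    return ((m["double"] - m["WT"])
            - (m["gene1"] - m["WT"])
            - (m["gene2"] - m["WT"]))


rng = random.Random(1)  # fixed seed so the draws are repeatable
trials = [simulate_trial(rng) for _ in range(3)]
contrasts = [interaction_contrast(trial) for trial in trials]

# With the noise switched off, the additive model gives a contrast of zero.
noise_free = interaction_contrast(simulate_trial(rng, sd=0.0))
print(contrasts, noise_free)
```

In any single noisy trial the contrast will scatter around zero, which is why a test of the interaction needs the residual variation estimated from all trials together.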

Using an ANOVA approach to analyze randomly simulated data from all three independent trials, each with three biological replicates, identified the true underlying genetic model (Figure 1). It showed that each gene has an additive effect but finds no significant support (significance threshold is P < 0.05) for an interaction between the effects of gene 1 and gene 2, indicating no support for epistasis or redundancy of these two genes (Figure 1) (Li et al., 2008). To illustrate how combining all the data into a single analysis increased the power of the test, we also ran ANOVA within each separate trial (Figure 1; Supplemental Table 1). This trial-specific ANOVA made it difficult to draw any conclusion from the study, as the results were inconsistent between trials (Figure 1). The effects of the individual mutants were only significant in some of the tests using independent trials, which could have misled the researcher to conclude that each gene does not affect root length (Figure 1). Yet, the combined ANOVA explicitly showed that each genotype does affect root length.

Assessing reproducibility in the face of experimental variation requires the ability to assess the effect of variation among the trials. In the combined ANOVA, examining the “trial” term (E) revealed no statistical support for the three trials of the experiment being different (Figure 1). Thus, the researcher can conclude that the differences between the trials are not significant. If the trial variation is large, it is possible to use the modeling approach to directly address whether variation across trials interacts with the other terms to assess if the results are not reproducible across the trials. The ability of ANOVA to identify the correct underlying genetic model is further boosted by the inclusion of more data points per genotype in the combined analysis while simultaneously showing that there is no difference between the trials.

The use of *t* tests to compare each mutant to the wild type within each trial led to a different conclusion. The trial-specific *t* tests suggested that only the double mutant differs significantly from the wild type (Figure 1). Thus, using *t* tests would lead the researcher to conclude, incorrectly, that gene 1 and gene 2 are redundant and that the results were similar in three independent trials. The reason the *t* test fails in this instance is that it treats the double mutant separately from the single mutants (i.e., each test in isolation) and thus does not test the effects of the two genotypes on each other, but instead tests the effect of the combined genotypes (the wild type versus the double knockout). As such, in this instance, the lack of a detected effect is not equivalent to the lack of an effect but is instead caused by not directly testing the interaction. We note in this instance that the observation that the three trials were not significantly different would have allowed the researcher to pool the data and conduct *t* tests of the individual genotypes. However, in this scenario, the researcher would not be able to test the interaction between genes. The ANOVA allows the effect of alleles to be tested across all the genotypes and trials and thereby enables the researcher to describe the underlying genetic model.
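The difference between the two questions can be written out with the simulated effects. Here $\mu_{WT}$, $\mu_1$, $\mu_2$, and $\mu_{12}$ denote the expected root lengths of the wild type, the two single mutants, and the double mutant; this notation is introduced for illustration only.

$$
\begin{aligned}
\text{wild type versus double mutant (pairwise } t \text{ test):}\quad & \mu_{12} - \mu_{WT} = -30\ \text{mm} \\
\text{gene 1} \times \text{gene 2 interaction (ANOVA term):}\quad & (\mu_{12} - \mu_{2}) - (\mu_{1} - \mu_{WT}) = (-15) - (-15) = 0\ \text{mm}
\end{aligned}
$$

The first quantity is large whether the genes are additive or redundant, so a significant wild type versus double mutant comparison says nothing about how the genes interact. Only the second quantity, which requires all four genotypes in one model, distinguishes the two scenarios.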

## ANOVA PROPERLY TESTS A GENOTYPE × TREATMENT INTERACTION

In the previous case study, ANOVA identified the correct underlying mechanistic model while the isolated trial *t* test misled the researcher by statistically suggesting an interaction that was not present biologically. An isolated trial *t* test approach can also lead to incorrect rejection of an interaction hypothesis. If, for example, the treatment response was moderate and the mutant phenotype was not fully penetrant, typical results could resemble those shown in Figure 2 (Supplemental Table 2 and Supplemental Methods) (Kliebenstein et al., 1999, 2002; Li et al., 2014; Taylor-Teeples et al., 2015). To illustrate this example, we modeled how jasmonate application affects glucosinolate accumulation in plants mutated in a single transcription factor that quantitatively controls jasmonate perception (Sønderby et al., 2007; Sønderby et al., 2010; Li et al., 2014). In this case, using *t* tests may lead researchers to assume from the second and third trials that the mutant responded to jasmonate like the wild type, while ignoring the first trial as an outlier. Alternatively, researchers may run additional trials or, worse, conclude that no real effect was detected and drop the entire line of investigation. Combining the data in an ANOVA provides statistical support for the interaction and argues that the transcription factor alters jasmonate responses (Figure 2; Supplemental Table 2). Taken together, Figures 1 and 2 illustrate that it is possible to be both positively and negatively misled by the isolated trial *t* test approach.

## CROSS-CHECK YOUR DATA ANALYSIS WITH VISUALIZATION

In both case studies discussed above, the ANOVA allowed the researcher to state whether or not they have evidence that their treatments, genotypes, and/or interactions influence their phenotype. However, the ANOVA does not provide visualization of the data that can enable a researcher to develop a model or hypothesis (for example, how a mutation affects root length). Thus, at this stage of analysis, it is very useful for a researcher to visualize the results by plotting the group means along with all of the individual data points to assess if the ANOVA and visual interpretation coalesce into a single conclusion. This visualization is easily conducted with the aid of bean or violin plots within R (Hintze and Nelson, 1998; Kampstra, 2008). Further analyses can include a comparison of the group means using post-hoc tests such as Tukey’s or Welch’s mean comparisons. Occasionally, the visual and statistical analyses do not agree. For instance, in trial 3 in Figure 2, the individual trial ANOVA provides no statistical support for an interaction but the means suggest an interaction. In such situations, exploring the data further can identify potential outliers due to technical errors such as genotype misclassification or misplanting. Additionally, further replicate trials should be conducted and/or the underlying hypothesis adjusted until the visualization and statistical analyses do agree. While it would ideally be possible to perform unlimited replicates to test all possible alternative hypotheses, this is not always the case given the numerous constraints on research. Thus, at a minimum, authors should fully describe any potential discrepancies or issues arising from comparisons of the visualization and statistical analyses. Finally, we note that all statistical approaches are susceptible to errors and as such the only way to cross-validate any result is to conduct an independent line of inquiry.

## INDEPENDENT REPLICATION IS THE FOUNDATION OF ANY SUCCESSFUL HYPOTHESIS TEST

Appropriate and adequate independent replication within a biological experiment is essential to the successful use of any of the above approaches (Hurlbert, 1984; Vaux et al., 2012; Buttigieg and Ramette, 2014). A general reading of plant molecular literature shows that the most common strategy is to conduct three repeats in three different trials, although the nature of these biological repeats can vary widely. For example, a biological replicate may mean phenotyping three individuals per genotype per trial. The results from these three separate trials are typically analyzed independently, and one trial is shown in a figure with the researcher stating that similar results were found in other trials. This classical experimental design arose largely from molecular biological studies. A simple improvement is to double the replication from three to six or more individuals per trial if at all possible because increased levels of replication provide more power to any statistical approach.
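As a rough worked example of the gain from doubling replication, consider the equal-variance *t* test with *n* individuals per group and pooled within-group sd $s_p$. The standard error of the difference between two means is

$$
SE = s_p \sqrt{\frac{2}{n}}, \qquad
SE_{n=3} = s_p \sqrt{\tfrac{2}{3}} \approx 0.82\, s_p, \qquad
SE_{n=6} = s_p \sqrt{\tfrac{2}{6}} \approx 0.58\, s_p .
$$

For the same true difference and the same $s_p$, the expected *t* value therefore increases by a factor of $\sqrt{2} \approx 1.41$, and the degrees of freedom ($2n - 2$) rise from 4 to 10, which also lowers the critical value that *t* must exceed. This calculation assumes that the additional individuals are independent replicates.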

In addition, the replicates performed need to be independent, meaning that there should be as little connection as possible between the samples. For example, leaves from different plants of a single genotype are more biologically independent from each other than leaves collected from the same plant (Schmid et al., 2003; Schmid et al., 2005). Measuring multiple leaves for a given plant increases accuracy in measuring the phenotype of that specific individual, but could be highly misleading if that individual plant is not representative of a genotype because of location, pathogen infection, or another factor. Thus, instead of repeatedly sampling the same individual, it is better to sample multiple individuals to minimize the potential for bias in the analysis. Even more independent are leaves from different plants where the seeds were obtained from different mothers. The researchers need to carefully determine what may influence a measurement and work to randomize sampling, or control the influences that they are not testing, to ensure that all replicates are as independent as possible (Elwell et al., 2011).
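One simple way to keep repeated measurements of one plant from being counted as independent replicates is to collapse them to a single value per plant before analysis. The sketch below shows this under the assumption that each record carries a plant identifier; the function name `per_plant_means` and the values are hypothetical. Modeling the plant as a random effect in a mixed model is an alternative that this sketch does not cover.

```python
from collections import defaultdict
from statistics import mean


def per_plant_means(measurements):
    """Collapse repeated leaf measurements to one value per plant.

    measurements: (genotype, plant_id, value) tuples.
    Returns {genotype: [one mean per plant]}, so that the plant, not the
    leaf, is the unit of replication in any downstream test.
    """
    by_plant = defaultdict(list)
    for genotype, plant_id, value in measurements:
        by_plant[(genotype, plant_id)].append(value)

    by_genotype = defaultdict(list)
    for (genotype, _plant_id), values in sorted(by_plant.items()):
        by_genotype[genotype].append(mean(values))
    return dict(by_genotype)


# Hypothetical data: two leaves measured on each of three plants.
leaves = [
    ("WT", "plant1", 4.0), ("WT", "plant1", 6.0),
    ("WT", "plant2", 7.0), ("WT", "plant2", 9.0),
    ("mut", "plant3", 2.0), ("mut", "plant3", 4.0),
]
replicates = per_plant_means(leaves)
print(replicates)
```

Here the wild type contributes two replicates rather than four, which reflects the number of independent individuals actually sampled.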

## SUMMARY

The case studies presented above illustrate how *t* tests or trial-specific analyses can mislead a researcher. These examples only provide a small sample of all the possible instances when trial-specific pairwise *t* tests might lead to incorrect descriptions of the underlying mechanistic model. As a simple, but by no means all-inclusive, solution, we urge that researchers use ANOVA or other linear models (e.g., mixed models or random effect models) to directly test interactions and to include all the data in a single model. While there are many books or reviews that can be useful to learn and implement these statistical approaches, a readily available resource on this topic may be our colleagues in ecology, evolution, quantitative genetics, statistics, or mathematics, who can rapidly provide the necessary guidance to conduct the suggested analyses. The resources suggested here may also be a starting point for those new to this type of statistics. No matter how we get there, we anticipate that a shift in the field to employing linear models that include all the experimental data will help to improve the community’s interpretation of experiments and generation of hypotheses.

### Supplemental Data

- **Supplemental Table 1.** Model of when *t* tests improperly inform about how two genes may or may not interact.
- **Supplemental Table 2.** Model of when combined data resolves *t* test confusion.
- **Supplemental Table 3.** Data from Figure 1.
- **Supplemental Table 4.** Data from Figure 2.
- **Supplemental Methods.** Journal analysis and computational models.

## Acknowledgments

We thank the following people for comments and suggestions that greatly improved this article: Fumiaki Katagiri, Stefan Schmollinger, Detlef Weigel, Cynthia Weinig, and additional anonymous reviewers. We apologize for the suggested topics that we were unable to include in this review because of editorial focus.

## AUTHOR CONTRIBUTIONS

All authors contributed to the writing. D.J.K. conducted statistical modeling.

## Footnotes

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Daniel J. Kliebenstein (kliebenstein{at}ucdavis.edu).

- Received March 17, 2015.
- Revised May 20, 2015.
- Accepted July 17, 2015.
- Published July 28, 2015.