- © 2016 American Society of Plant Biologists. All rights reserved.
Colleagues from the medical field have estimated that up to one-third of cell lines are contaminated with other cell lines or are misidentified; in addition, repeated passaging substantially changes cell line properties (reviewed in Hughes et al., 2007). The medical community has therefore begun to establish standards for verification of cell lines and genetic stocks, and NIH has announced efforts to require validation and to aid researchers in validating their biological material (Lorsch et al., 2014). Plant biologists should do the same. Even though the propagation of seed stocks cannot be directly compared with animal cell culture, contamination is a real possibility, and it is not uncommon that the same genetic stock produces different phenotypes in different laboratories. Confirming the genetic identity of research material is necessary to know whether such phenotypic differences reflect gene-by-environment (GxE) interactions or whether they are simply due to apples being compared to oranges.
There are two principal areas of concern in this regard. The first one is that transgenic lines, mutants, or accessions are not what they are supposed to be; that is, they carry a different transgene, allele, or mutation, or they are a different accession than assumed. Likely sources are inadvertent seeding of soil, mislabeling of plants, and mix-ups during seed collection. Those who work with self-fertilizing species such as Arabidopsis thaliana must be aware of the risk of outcrossing. In the field, the rate of outcrossing is often a few percent (Bergelson et al., 1998; Bomblies et al., 2010; Platt et al., 2010). In addition, specific genetic backgrounds with altered floral morphologies or reduced male fertility can greatly increase outcrossing rates (Peng et al., 2006; Luo and Widmer, 2013).
Systematic analyses have indicated the scope of these issues. For example, genotyping with single nucleotide polymorphism markers revealed that up to 5% of Arabidopsis accessions in the stock centers were mislabeled (Anastasio et al., 2011; Simon et al., 2012). Another example comes from grapevine (Vitis vinifera), where an inbred Pinot Noir derivative was targeted for genome sequencing, but it was not noticed until later that there was either an uncontrolled outcrossing event, or a complete mix-up, so that in the end a Pinot Noir × Helfensteiner hybrid, or possibly a selfed Helfensteiner derivative, was sequenced (Jaillon et al., 2007). Similarly, wild-type strains of Chlamydomonas reinhardtii have complex histories, with spontaneous mutations and several examples of misidentification (Gallaher et al., 2015). In at least one case, a mutant line was more closely related to other wild-type lines than its supposed isogenic parent (Blaby et al., 2013). There are also plenty of anecdotes of mix-ups of T-DNA insertion lines in Arabidopsis; one published example is a line for which a specific, published insertion subsequently could not be detected again (Richter et al., 2010).
A range of issues across the entire research and breeding community affects the consistent preservation of germplasm identity. The visually obvious hybrid vigor of outcrossed progeny in maize (Zea mays) or distinct morphological defects in many mutants help reduce the problem, but genetic contamination or mix-ups may occur easily, especially if one is not familiar with subtle phenotypic variation. The standard approaches for maintaining germplasm identity in most model species are good, but even if errors are reduced to 1% per generation, roughly 10% of stocks will be mislabeled after 10 generations in the absence of genetic validation. Such validation should therefore be common practice.
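The 10% figure follows from compounding a small per-generation error. A minimal sketch of the arithmetic, assuming that errors occur independently in each generation and are never corrected (the function name is ours, chosen for illustration):

```python
def fraction_mislabeled(error_rate: float, generations: int) -> float:
    """Probability that a stock was mislabeled at least once.

    Assumes an independent chance `error_rate` of a mix-up in every
    generation and no validation step that would catch earlier errors.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # A stock stays correct only if it escapes error in every generation.
    return 1.0 - (1.0 - error_rate) ** generations


# With a 1% error rate, about 9.6% of stocks are affected after 10 generations.
print(f"{fraction_mislabeled(0.01, 10):.1%}")
```

Under the same assumptions, the affected fraction keeps growing with every further round of propagation, which is why periodic validation matters even when the per-generation error rate is low.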
A second area of concern is that phenotypic effects can be due to additional, unrecognized genetic variants segregating in a stock. Initially pure genetic lines can diverge through secondary, spontaneous mutations, while progeny of lines that were initially not pure can diverge through fixation of segregating variation. One such example is the Arabidopsis accession Landsberg erecta, where one set of stocks being used in the community has a hua2 mutation, while another set of lines does not (Doyle et al., 2005). The origin of this mutation is unclear. There are also several examples of spontaneous mutations in Arabidopsis accessions identified by their phenotypic effects (Loudet et al., 2008; Laitinen et al., 2010), including one that arose in the Col-0 reference strain (Coustham et al., 2014). Hundreds of mutations have been documented in another Col-0 lab stock by whole-genome sequencing (Cubillos et al., 2014), and thousands of single base pair mutations along with several new transposon insertions have apparently occurred since a standard Chlamydomonas line was established in the laboratory (Gallaher et al., 2015). In Arabidopsis, genome-wide mutation rates are ∼1 per generation for single nucleotide polymorphisms and 0.5 per generation for indels (Ossowski et al., 2010; Jiang et al., 2014). Guidelines for how many differences are deemed acceptable before a stock is given a distinct name or identifier are needed and should be developed by each community.
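To give a sense of scale for such guidelines, the quoted Arabidopsis rates can be turned into a rough expectation of how many new variants separate two sublines of the same stock. This is only a sketch: it assumes that the rates are constant, that they describe variants accumulating along each lineage per generation, and that selection is negligible. The function and constant names are illustrative.

```python
# Per-generation rates quoted in the text for Arabidopsis.
SNP_RATE = 1.0
INDEL_RATE = 0.5


def expected_differences(generations_a: int, generations_b: int = 0) -> dict:
    """Expected new variants separating two sublines.

    `generations_a` and `generations_b` are the numbers of generations
    each subline has been propagated since they shared a common ancestor.
    Differences accumulate along both lineages, so the counts add up.
    """
    if generations_a < 0 or generations_b < 0:
        raise ValueError("generation counts must be non-negative")
    total = generations_a + generations_b
    return {"snps": SNP_RATE * total, "indels": INDEL_RATE * total}


# Two laboratories that each propagated the same stock for 20 generations.
print(expected_differences(20, 20))
```

Under these assumptions, two sublines kept apart for 20 generations each would be expected to differ by about 40 single nucleotide polymorphisms and 20 indels.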
An example of fixation of segregating variants is the soybean reference cultivar Williams 82. Several different lines are in use, apparently because Williams 82 was distributed before it was fully inbred and thus completely homozygous throughout the genome (Haun et al., 2011). Residual heterozygosity giving rise to different sublines among inbreds with the same name has also been described in maize (Romay et al., 2013). This even extends to the maize reference genome B73, where 1.5 Mb (0.07%) of the genome differs between sublines that are nearly 50 years old now (Gore et al., 2009). Similarly, it has only recently been recognized, based on whole-genome resequencing, that many common Chlamydomonas laboratory strains are segregants apparently derived from a single cross (Gallaher et al., 2015).
A related problem is that phenotypes are attributed to a known mutation in a particular stock, but these are instead caused by second-site mutations; several such examples from Arabidopsis are listed in Table 1. Similarly, insufficient characterization of the exact molecular defect at a mutant locus can lead to confusion, for example, when a mutation is more complex than initially realized and affects more than one gene. This was the case for tomato (Solanum lycopersicum) fas, Arabidopsis abp1-1, and Chlamydomonas sta6 mutants, where at least some aspects of the reported phenotypes were caused by genes adjacent to the ones initially thought to be responsible (Blaby et al., 2013; Dai et al., 2015; Xu et al., 2015). These problems are not restricted to mutant lines; a trisomic Arabidopsis stock that was otherwise considered wild type turned out to have a complex history involving introgression of a mutation into a different background, with that mutation explaining an important aspect of the phenotype (Salomé and Weigel, 2015).
Table 1. Second-Site Mutations Responsible for Major Phenotypes in Arabidopsis
Because of these issues, we suggest a series of best practices for verifying both the identity of genetic stocks and the causal relationship between a genetic variant and a phenotype.
1. State in an article as clearly and exactly as possible the origin of mutants, transgenics, and accessions, including genetic stocks that were used as background for random mutagenesis, genome editing, or generation of transgene insertions. This should minimally include the stock center or the scientist and laboratory who donated the stock and preferably also the approximate date when the stock was acquired and, if available, seed lot information.
2. State if and how the identity of genetic stocks and the effects of specific mutations were verified. For example, if no further molecular or genetic validation was undertaken, please state this, potentially with an appropriate reason (example: “root phenotype is diagnostic for this mutant”). Examples of genetic validation that can link a phenotype to a specific genetic variant include backcrossing and cosegregation analysis, transgenic complementation, examination of a second allele, and recreation of a mutant using gene editing. In the case of mutants, best practice is often to phenotype a segregating population, using homozygous wild-type siblings as control, to account for genetic background differences that may not be represented in the original parent line. Examples of molecular validation include PCR amplification of transgene sequences (O’Malley et al., 2015), targeted analysis of selected polymorphic markers, array-based genotyping or genotyping-by-sequencing (Salathia et al., 2007; Anastasio et al., 2011; Elshire et al., 2011; Simon et al., 2012), shallow resequencing, or even complete resequencing.
3. Ensure that seeds of key stocks for a published article are saved, and consider resubmitting such stocks to stock centers.
While large-scale genotyping or resequencing is not yet practical as a standard, such efforts are already going on for the accessions from the Arabidopsis 1001 Genomes project, in conjunction with curation of the 1001 Genomes collection of accessions by the ABRC, similar to large-scale genotyping efforts at the U.S. national maize inbred seed bank (Romay et al., 2013). Each community focused on a specific organism should also consider how to empower all laboratories, regardless of their size, resources, and bioinformatics sophistication, to harness the power of ever cheaper genome sequencing and genotyping for stock validation, along the lines suggested by NIH for vertebrate cell lines (Lorsch et al., 2014). A key role should be played by stock centers, which should implement routines for validating newly submitted as well as propagated stocks and which may consider offering services that go beyond their core species and provide access to colleagues working with less popular, non-model organisms. Useful in this regard are Web resources that allow facile matching of patterns of genome-wide polymorphisms found in a specific stock with a database of sequenced genomes, as they have been implemented for Chlamydomonas (https://bitbucket.org/gallaher/custom-chlamy-generator) and Arabidopsis (http://tools.1001genomes.org/strain_id/). Other communities are encouraged to develop similar resources.
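The idea behind such matching resources can be illustrated with a small sketch. It does not reproduce how either of the tools named above works; it simply scores a stock against each reference genotype by the fraction of shared markers with identical calls. The marker identifiers, accession names, and allele calls are hypothetical, and `N` is assumed to denote a missing call.

```python
from typing import Dict, List, Tuple

Genotype = Dict[str, str]  # marker identifier -> allele call


def match_fraction(sample: Genotype, reference: Genotype) -> Tuple[float, int]:
    """Return (fraction of identical calls, number of markers compared)."""
    shared = [
        marker for marker in sample
        if marker in reference
        and sample[marker] != "N"
        and reference[marker] != "N"
    ]
    if not shared:
        return 0.0, 0
    matches = sum(sample[m] == reference[m] for m in shared)
    return matches / len(shared), len(shared)


def rank_references(sample: Genotype,
                    panel: Dict[str, Genotype]) -> List[Tuple[str, float, int]]:
    """Rank reference genotypes from best to worst match."""
    scored = [(name, *match_fraction(sample, geno)) for name, geno in panel.items()]
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Hypothetical toy data, for illustration only.
panel = {
    "acc_A": {"m1": "A", "m2": "C", "m3": "G", "m4": "T", "m5": "A"},
    "acc_B": {"m1": "A", "m2": "T", "m3": "G", "m4": "C", "m5": "A"},
}
stock = {"m1": "A", "m2": "C", "m3": "G", "m4": "T", "m5": "N"}

for name, fraction, compared in rank_references(stock, panel):
    print(f"{name}: {fraction:.0%} identical over {compared} markers")
```

In practice, the number of markers compared matters as much as the match fraction: a perfect match over a handful of markers is weak evidence, and a best match well below 100% may point to a hybrid or to an accession that is absent from the database.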
Of course, efforts must be commensurate with the number of lines analyzed or the goal of a specific experiment. That is, greater care needs to be taken in a study that primarily focuses on a single genetic stock than in, say, a GWAS with hundreds or thousands of lines. In the latter case, an estimate of the number of misidentified lines should suffice, e.g., by genotyping a subsample. As another example, a mutant that has an easily recognized, distinctive phenotype and that is being used merely for a complementation cross needs to be less rigorously validated than a mutant that is used for detailed phenotypic comparison with an isogenic wild-type control.
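One possible way to turn a genotyped subsample into such an estimate is sketched below. It assumes that the subsample was drawn at random from the panel and uses the Wilson score interval for a proportion, which is our choice for illustration and not a method prescribed here. The counts in the example are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(misidentified: int, sampled: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval (for z = 1.96) for the misidentified fraction."""
    if sampled <= 0 or not 0 <= misidentified <= sampled:
        raise ValueError("need 0 <= misidentified <= sampled and sampled > 0")
    p = misidentified / sampled
    denom = 1.0 + z ** 2 / sampled
    centre = (p + z ** 2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z ** 2 / (4 * sampled ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_misidentified(misidentified: int, sampled: int,
                           total_lines: int) -> Tuple[float, float, float]:
    """Scale the subsample result to the whole panel: (point, low, high)."""
    low, high = wilson_interval(misidentified, sampled)
    point = misidentified / sampled
    return point * total_lines, low * total_lines, high * total_lines


# Hypothetical example: 2 of 50 genotyped lines were wrong in a panel of 1000.
point, low, high = estimate_misidentified(2, 50, 1000)
print(f"estimated misidentified lines: {point:.0f} (range {low:.0f} to {high:.0f})")
```

The width of the interval shows how much a small subsample can say: with only 50 lines genotyped, the plausible number of misidentified lines in the full panel still spans roughly an order of magnitude.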
We would like to emphasize that spending a few hundred dollars and a week or so on validating genetic stocks is appropriate, since experimental articles typically report on research that has been performed over years and has cost tens to hundreds of thousands of dollars (including salaries). Of course, all of us should expend more effort to avoid mix-ups in the first place, for example, by using better practices, such as strict separation of areas for seed production and harvest from areas for sowing seeds.
Acknowledgments
We thank the following colleagues for pointing us to specific cases of concern and relevant articles: Claude Becker and Ignacio Rubio-Somoza, Max Planck Institute for Developmental Biology; Brian Dilkes, Purdue University; Yuval Eshed, Weizmann Institute; Elizabeth Haswell, Washington University; David Jackson, Cold Spring Harbor Laboratory; Dan Kliebenstein, UC Davis; Aaron Liston, Oregon State University; Olivier Loudet, INRA; Sabeeha Merchant, UCLA; Jason Reed, UNC Chapel Hill; Paul Schulze-Lefert, Max Planck Institute for Plant Breeding Research; Karin Schumacher, University of Heidelberg; Nathan Springer, University of Minnesota; Norman Warthmann, Australian National University; and Yunde Zhao, UC San Diego. We thank Nancy Eckardt along with The Plant Cell editorial leadership and five anonymous reviewers for sharpening the focus of this letter.
AUTHOR CONTRIBUTIONS
J.B., E.S.B., J.R.E., M.N., and D.W. conceived and wrote the article.
Footnotes
[OPEN] Articles can be viewed online without a subscription.
- Received June 5, 2015.
- Revised February 25, 2016.
- Accepted March 8, 2016.
- Published March 8, 2016.