© 2016 American Society of Plant Biologists. All rights reserved.
Abstract
Transposable elements (TEs) are mobile units of DNA that comprise large portions of plant genomes. Besides creating mutations via transposition and contributing to genome size, TEs play key roles in chromosome architecture and gene regulation. TE activity is repressed by overlapping mechanisms of chromatin condensation, epigenetic transcriptional silencing, and targeting by small interfering RNAs. The specific regulation of different TEs, as well as their different roles in chromosome architecture and gene regulation, is specified by where on the chromosome the TE is located: near a gene, within a gene, in a pericentromere/TE island, or at the centromere core. In this Review, we investigate the silencing mechanisms responsible for inhibiting TE activity for each of these chromosomal contexts, emphasizing that chromosomal location is the first rule dictating the specific regulation of each TE.
INTRODUCTION
Plant transposable elements (TEs) can exist in multiple epigenetic states, switching from a transcriptionally active state, which can lead to transposition, to a silenced state that lacks transcriptional initiation of mRNA transcripts. This silenced state is epigenetically heritable across both mitotic cell divisions and trans-generationally through meiosis, fertilization, and embryogenesis. Individual TE families have their own modes of regulation, and examples of TE-specific silencing mechanisms are abundant in the literature. At the same time, generalizations of TE structure, regulation, and function can be inferred solely based on the location of the TE in the genome. Analogous to an ecological study, where some species are endemic to only wetland, desert, or forest biomes, TEs target and reside in distinct niches in the genome. The mode of TE silencing in these niches often supersedes the distinct regulation of each individual TE family. Thus, when investigating the regulation of any particular TE, the first question asked is always about the genomic context of that TE. This Review dissects regulation based on the chromosomal context of TEs, which we separate into four broad categories: near a gene, within a gene, in a pericentromere/TE island, or at the centromere core (Figure 1).
Figure 1. The Four TE Chromosomal Contexts Discussed in This Review.
TEs are shown as red, pink, and orange triangles, gene exons as green boxes, and tandem repeats as yellow triangles.
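The location-first approach described above can be illustrated with a minimal sketch that assigns a TE to one of the four contexts from its coordinates. Everything in it is an assumption made for illustration, not part of this Review: the `Interval` type, the `classify_te` function, the 2-kb "near a gene" window, the use of simple overlap to call a TE "within a gene," and the order in which the contexts are tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two intervals share at least one base pair."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def distance(a, b):
    """Gap in bp between two intervals (0 if overlapping, None if on different chromosomes)."""
    if a.chrom != b.chrom:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def classify_te(te, genes, te_islands, centromere_cores, near_gene_bp=2000):
    """Assign a TE to a chromosomal context.

    The precedence (centromere core > TE island > within gene > near gene)
    and the near_gene_bp window are hypothetical choices for this sketch.
    """
    if any(overlaps(te, c) for c in centromere_cores):
        return "centromere core"
    if any(overlaps(te, i) for i in te_islands):
        return "pericentromere/TE island"
    if any(overlaps(te, g) for g in genes):
        return "within a gene"
    for g in genes:
        d = distance(te, g)
        if d is not None and d <= near_gene_bp:
            return "near a gene"
    return "unclassified"


# Hypothetical annotation for one chromosome
genes = [Interval("chr1", 1000, 3000)]
islands = [Interval("chr1", 100000, 200000)]
cores = [Interval("chr1", 150000, 160000)]

print(classify_te(Interval("chr1", 3500, 3700), genes, islands, cores))
```

In a real annotation the island and centromere core boundaries would themselves have to be defined from TE density or CENH3 occupancy data, so the intervals passed in here are placeholders.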
TEs NEAR GENES
The best studied TEs in plant genomes reside near genes, as these TEs often influence gene regulation. TEs near genes tend to be small nonautonomous DNA transposons, particularly Mutator, hAT, and Helitron family TEs and miniature TE derivatives called MITEs or SMARTs (Jiang and Wessler, 2001; Gao et al., 2012), although some full-length LTR retrotransposon families such as Evadé and ONSEN also target genic regions (Mirouze et al., 2009; Ito et al., 2011). Although a nonautonomous element may itself be transcriptionally silenced, this TE can still be transposed if there is an active autonomous (TE protein-producing and self-transposing) family member elsewhere in the genome. Upon transposition, many TEs target regions near genes (Dietrich et al., 2002). This targeting may be due to an insertion site preference into an open or relaxed chromatin state (euchromatin) associated with actively transcribed genes (Liu et al., 2009). In Arabidopsis thaliana, TEs near genes are the standard targets consistently investigated to determine if a mutation or treatment affects silencing: SimpleHat2, AtRep2, AtSN1, MeaISR, and the rearranged soloLTR IG/LINE locus (Zilberman et al., 2003; Huettel et al., 2006; Zheng et al., 2007; Havecker et al., 2010). The near-gene locations provide the ability to assay these individual TEs in a locus-specific, single-copy manner, as deep sequencing reads or flanking PCR primers are unique to a single position in the genome. This ability has led to a deep functional and mechanistic understanding of the epigenetic regulation of TEs near genes in Arabidopsis. Therefore, overall mechanisms and models of TE silencing are biased toward an Arabidopsis-centric view of TEs near genes. Although these models create a framework for investigation and are generally supported in different species, some data suggest that the regulation of TEs is not identical in all plant species (Gent et al., 2014; Willing et al., 2015).
TEs located near genes are targeted for RNA-directed DNA methylation (RdDM) through their existing association with DNA methylation and the histone 3 lysine 9 dimethylation (H3K9me2) modification, which represses the initiation of mRNA transcription. H3K9me2 recruits the plant-specific RNA Polymerase IV (Pol IV), a Pol II derivative, to transcribe these TEs into non-protein-coding transcripts that are immediately converted into double-stranded RNA via the RNA-dependent RNA Polymerase 2 (RDR2) protein (Figure 2A) (Law et al., 2013). The short double-stranded RNAs from the Pol IV/RDR2 complex are cleaved into 24-nucleotide small interfering RNAs (siRNAs) by DICER-LIKE3 (DCL3), and these siRNAs are incorporated into either ARGONAUTE4 (AGO4) or AGO6 to drive the RdDM of matching regions of the genome (for additional details on RdDM, see Matzke and Mosher, 2014). Although it is well established that 24-nucleotide siRNAs can target RdDM, recently other types of siRNAs have also been implicated in directing RdDM. Diced 21- to 22-nucleotide TE siRNAs incorporate into AGO6 to target RdDM to active TEs (McCue et al., 2015), and when the Dicer proteins are absent, Dicer-independent longer RNAs can incorporate into AGO4 and then be subsequently trimmed to the 21- to 24-nucleotide size (Ye et al., 2016). Targeting of RdDM acts through an RNA scaffold at the target locus produced by the plant-specific Pol V, which transcribes methylated regions (Figure 2A; Johnson et al., 2014). RdDM occurs through the methyltransferase proteins DRM1 and DRM2 at CG, CHG, and CHH cytosine contexts (where H = A, T, or C) (Cao and Jacobsen, 2002).
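The three cytosine contexts named above (CG, CHG, and CHH, where H = A, T, or C) can be made concrete with a short sketch that labels each cytosine in a sequence. The function names `cytosine_context` and `count_contexts` are hypothetical, and the sketch reads only the strand it is given; the opposite strand would need to be scanned as the reverse complement.

```python
from collections import Counter


def cytosine_context(seq, i):
    """Return 'CG', 'CHG', or 'CHH' for the cytosine at index i, else None.

    None is returned if seq[i] is not C, if too few downstream bases exist
    to decide, or if a downstream base is ambiguous (for example N).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    first, second = downstream[0], downstream[1]
    if first in "ATC" and second == "G":
        return "CHG"
    if first in "ATC" and second in "ATC":
        return "CHH"
    return None


def count_contexts(seq):
    """Count cytosines in each context on the given strand only."""
    counts = Counter()
    for i in range(len(seq)):
        context = cytosine_context(seq, i)
        if context is not None:
            counts[context] += 1
    return counts


print(count_contexts("CGCAGCTTCAA"))
```

Keeping the contexts separate matters because, as described in the following paragraphs, each context is maintained by a different methyltransferase.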
Figure 2. RdDM and Maintenance TE Silencing Mechanisms.
(A) TEs near and within genes are targeted by RdDM. RNA Pol IV is recruited to TE templates via H3K9me2 (Law et al., 2013). Pol V is recruited to TEs via DNA methylation (Johnson et al., 2014; Liu et al., 2014). The Pol IV/RDR2 complex generates double-stranded RNAs that are cleaved by DCL3 into 24-nucleotide siRNAs. These 24-nucleotide siRNAs are incorporated into and guide an AGO family protein to target the Pol V scaffold transcript, resulting in DRM2-mediated DNA methylation of cytosines in all sequence contexts. See text for details.
(B) TEs are maintained in a silenced state by maintenance DNA methyltransferases (MET1, CMT3, and CMT2). MET1 acts after DNA synthesis to propagate CG methylation to the new DNA strand. CMT3 and CMT2 operate in a positive feedback loop of H3K9me2 and CHG/CHH context DNA methylation with the histone methyltransferase proteins KYP/SUVH4, SUVH5, and SUVH6. Protein sizes not to scale.
The initial methylation established by RdDM is retained and enhanced through cell division by maintenance methylation mechanisms and the formation of heterochromatin. TE heterochromatin is molecularly defined in plants as containing H3K9me2 and DNA methylation, as well as the association of several other posttranslational histone modifications (Roudier et al., 2011). These chromatin marks result in the tight compaction of nucleosomes and silencing of mRNA transcript initiation (though transcriptional read-through can still occur). Maintenance methylation of DNA is performed by a different methyltransferase for each cytosine context (reviewed in Du et al., 2015). The MET1 protein (a homolog of DNMT1 in mammals) acts after DNA synthesis to methylate CG sites guided by the partially or hemimethylated CG status of the DNA (Figure 2B). The CMT3 and CMT2 proteins are recruited to TEs via the H3K9me2 mark and methylate CHG or CHH sites, respectively (Figure 2B). Methylation of TEs also results in further deposition of H3K9me2 by the KYP (also known as SUVH4), SUVH5, and SUVH6 histone methyltransferases. KYP is recruited to TEs via CHG DNA methylation (Johnson et al., 2007). Thus, once established at a TE locus, DNA methylation and H3K9me2 are stably propagated via CHG methylation through CMT3 and the subsequent recruitment of KYP (Figure 2B). This generates a feed-forward loop of chromatin modification via maintenance methyltransferases, histone modification, and RdDM, which function together to form heterochromatin and silence the transcriptional initiation of TE mRNAs.
One long-standing question is why RdDM is constantly required at near-gene TEs for their silencing. After establishment of methylation by RdDM, a cycle of CMT3 maintenance methylation and H3K9me2 alone should propagate silencing. Recently, data from several groups have demonstrated that RdDM is targeted most strongly to the flanks of TEs, suggesting that RdDM’s role is to reinforce the boundary between TE and non-TE, or more broadly between genic euchromatin and TE heterochromatin (Zemach et al., 2013; Stroud et al., 2014; Li et al., 2015). In this sense, RdDM may serve as a boundary force, constantly buffering and redefining the line between chromatin states so the active chromatin state of a nearby euchromatic gene does not influence or spread into a neighboring TE, resulting in TE expression. At the same time, RdDM and H3K9me2 can spread from a TE into neighboring genes or enhancer elements resulting in genic influence, and an opposing force restricting the spread of RdDM out of a TE must insulate genes from the neighboring islands of heterochromatin that near-gene TEs create (Figure 3A).
Figure 3. Silencing at TE Edges.
(A) The KYP/CMT cycle and/or RdDM function to spread H3K9me2 and cytosine DNA methylation outward from TEs toward neighboring genes. IBM1 removes H3K9me2 and ROS1 removes DNA methylation, stopping the spread of these marks into surrounding genes. JMJ14, LDL1, and LDL2 remove the genic mark of H3K4 methylation from the silenced TEs. These mechanisms produce a sharp euchromatin-to-heterochromatin boundary or edge compared with the interior of the TE or flanking gene, represented here by the CHH methylation profile (green) of the TE and surrounding region.
(B) The methylstat model of ROS1 regulation. The RdDM of an upstream TE spreads into the ROS1 promoter region. This spread of methylation activates ROS1 expression. The ROS1 protein prunes DNA methylation from its own promoter and limits the spread of methylation out of the TE. In this manner, ROS1 regulates its own expression and can potentially gauge and respond to the genome-wide activity level of RdDM.
Plants require safeguard mechanisms to prevent unrestrained methylation spread out of TEs and a genome-wide trend toward silencing. For a TE neighboring a gene, the edges of chromatin modification must be restricted and/or restructured to tight boundaries that do not extend beyond the TE (Figure 3A). To counterbalance the spread of silencing, DNA methylation removal mechanisms patrol genes and promoters to remove DNA methylation. ROS1, DME, DML2, and DML3 comprise a family of DNA glycosylase proteins that prune DNA methylation preferentially from active gene regions (Penterman et al., 2007b; Zhu et al., 2007). Levels of these proteins are critical, as ros1 mutant plants have increased levels of genic CHG and CHH context DNA methylation that can cause gene silencing (Zhu et al., 2007). In the reference strain of Arabidopsis, ROS1 transcription levels are modulated via a methylation feedback mechanism acting through a Helitron TE located just upstream of the ROS1 promoter itself. This near-gene TE is targeted both by RdDM to add DNA methylation and the ROS1 protein to remove DNA methylation. The methylation at this promoter-proximal TE stimulates expression of ROS1, and the ROS1 enzyme then acts to demethylate this TE (as well as many others), thereby affecting its own expression and acting as an “epigenetic rheostat” or “methylstat” controlling the rate of demethylation from genic targets (Figure 3B) (Lei et al., 2015; Williams et al., 2015). This mechanism thus explains why ROS1 protein accumulation is reduced in several RdDM mutants (Penterman et al., 2007a). Although the exact mechanism of the ROS1 methylstat may not be conserved, other mechanisms likely contribute to generating a feedback loop between spread and removal of DNA methylation and other heterochromatic marks, as this feedback will be required to establish a genome-specific dynamic equilibrium between TE RdDM and activity of nearby genes.
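The logic of the methylstat can be illustrated with a toy numerical model. This is only a sketch under loudly stated assumptions and is not taken from the cited studies: methylation at the promoter-proximal TE is a single value `m` between 0 and 1, RdDM adds methylation at a rate proportional to the unmethylated fraction, ROS1 expression is taken to be directly proportional to `m`, and ROS1 removes methylation at a rate proportional to both its own level and `m`. The function name `simulate_methylstat` and all rate values are hypothetical.

```python
def simulate_methylstat(rddm_rate, demeth_rate=1.0, m0=0.5, dt=0.01, steps=20000):
    """Euler integration of a toy methylstat model.

    dm/dt = rddm_rate * (1 - m) - demeth_rate * ros1 * m, with ros1 = m.
    Returns (methylation, ros1_expression) after the final step.
    All parameters are illustrative, not measured values.
    """
    m = m0
    for _ in range(steps):
        ros1 = m  # assumption: ROS1 expression tracks promoter-proximal methylation
        dm = rddm_rate * (1.0 - m) - demeth_rate * ros1 * m
        m += dt * dm
    return m, m


# Compare a wild-type-like RdDM input with a strongly reduced RdDM input
m_wt, ros1_wt = simulate_methylstat(rddm_rate=1.0)
m_mut, ros1_mut = simulate_methylstat(rddm_rate=0.02)
print(f"high RdDM: methylation={m_wt:.3f}, ROS1={ros1_wt:.3f}")
print(f"low RdDM:  methylation={m_mut:.3f}, ROS1={ros1_mut:.3f}")
```

In this toy model, lowering the RdDM input lowers the steady-state ROS1 level, which is qualitatively the behavior expected from the reduced ROS1 accumulation reported in RdDM mutants. The model says nothing about the real kinetics, which are not specified in the text.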
In addition to DNA methylation containment mechanisms, multiple chromatin-modifying enzymes function together to segregate genic histone modification patterns from the TE-silencing mark of H3K9me2. The histone demethylase enzyme IBM1 functions to restrict H3K9me2 to TE regions by removing H3K9me2 from genes (Saze et al., 2008). In addition, the histone H3K4 demethylase proteins JMJ14, LDL1, and LDL2 prevent the active genic chromatin mark of H3K4 methylation from spreading into silenced TE regions (Greenberg et al., 2013). Thus, IBM1, JMJ14, LDL1, and LDL2 work to reinforce the heterochromatin-to-euchromatin boundary required to silence TEs near genes (Figure 3A).
There are multiple mechanisms by which a neighboring TE can influence a gene (Wang et al., 2013; Makarevitch et al., 2015; reviewed in Slotkin et al., 2012), often generating outstanding phenotypic examples of the meta-stable nature of the silencing boundary between TE and gene (Martienssen and Baron, 1994; Iida et al., 2004). Most commonly, either RNA expression or chromatin modification such as histone and/or DNA methylation spreads from the TE to a gene-regulating region. The influence of TE regulation on a gene may be useful for the organism and selected for, producing a stable epiallele, whereby a gene is controlled by the epigenetic regulation normally reserved for TEs (Fujimoto et al., 2008). The TE influence on gene regulation is magnified in mutants that lose control of TE silencing (Saze and Kakutani, 2007) or in cells that undergo developmental relaxation of TE silencing (DRTS) (Martínez and Slotkin, 2012). In these cases, the chromatin boundary between the silenced TE and the neighboring active gene is defective, and chromatin modification or transcription of the TE forges into the neighboring gene. One of the most striking examples of coordinated epialleles is illustrated by maternal imprinting in seed development. Many genes that are maternally imprinted (in the developing seed only the maternal copy of the gene is expressed) are flanked by TEs (Gehring et al., 2009). Through somatic plant development the TEs confer a silenced state upon the neighboring imprinted genes, and only upon the DRTS event in the pollen vegetative cell or embryo sac central cell is the silencing removed, activating the expression of these genes. Thus, in the properly developing endosperm, only the maternal copy of a TE-regulated gene will be expressed. 
In addition, genes involved in pathogen and pest resistance are also often tightly associated with neighboring TEs, suggesting that chromatin-level regulation and epialleles may play a role in general plant defense (Stokes et al., 2002; Yi and Richards, 2007).
TEs WITHIN GENES
Some TE types specifically target genes for insertion, a feature that has been leveraged for the use of these TEs as tools for mutagenesis and recovery of the tagged locus. While the determining factor for choice of TE insertion site is not discussed in this Review, there are several possible outcomes for a TE newly transposed into a gene. These potential outcomes depend on variables such as the activity level and physiological importance of the gene and where within the gene the TE inserted. If a TE inserts into an intron, it may not interrupt the gene’s activity and this TE may not be selected against. In this case, a TE insertion may be represented in a population’s gene pool, while the lack of selection may potentially lead to (1) TE sequence decay over time, (2) TE excision, and/or (3) TE deletion through illegitimate recombination or possibly through an as yet undefined activity of RdDM (Le et al., 2015). The TEs that we see within genes today are likely very new or neutral insertions, residing within genes without affecting that gene’s regulation. If a TE inserts into an exon or otherwise disrupts the gene’s activity, this allele may be selected against, and this may result in the allele’s disappearance from the population. Alternatively, a TE may escape silencing altogether by inserting into the 3′ untranslated region of an essential gene (Kabelitz et al., 2014), which could provide a safe-haven location that the cell cannot afford to silence. Ultimately, the majority of intragenic TEs do not remain complete or active, as the selection for proper gene activity leads to TE silencing (discussed further below) and/or TE sequence deletion.
Because of the mutagenic nature of TEs, it is in an organism’s best interest to silence a TE regardless of where in the genome it inserts. Even TEs located within genes are silenced with respect to transcriptional initiation (although they can be transcribed through from the external genic promoter). TE sequences within genes are still associated with dense DNA methylation and H3K9me2 (reviewed in Kim and Zilberman, 2014). The initial methylation of an intragenic TE likely occurs through RdDM. Exactly how RdDM is initially targeted to an unmethylated active TE is still unclear (reviewed in Fultz et al., 2015); however, most intragenic TEs are subject to a feed-forward loop combining RdDM, maintenance methylation, and H3K9me2 deposition (Figure 2). If unchecked, this reinforcement of chromatin modification can spread into the surrounding gene leading to gene silencing or epiallele formation (Saze and Kakutani, 2007), similar to the TEs near genes discussed in the previous section. However, as the silencing of TEs should not interfere with the proper expression of genes, silencing a TE within an active gene poses a complex dilemma requiring dense chromatin modification to keep the TE silenced, yet prevention of the spread of these modifications to retain the gene’s activity (Figure 3).
In Arabidopsis, 3% of TEs are located within genes (Le et al., 2015). The literature is often incorrect with assertions that genic TE insertions are epigenetically active, when the TE itself is silenced but subject to read-through transcription by a larger genic mRNA. In these cases, the TE promoters are silenced while the surrounding gene is transcriptionally active. The dense methylation and silencing of an intragenic TE can persist even when the TE is subject to the gene’s read-through transcription. Plants have thus evolved mechanisms to tolerate silenced TEs within genes. However, there may be consequences from transcribing through a silenced TE; therefore, the level of silencing for an intragenic TE must be tightly controlled. In wild-type Arabidopsis, mechanisms exist to ensure the proper read-through of heterochromatic TE insertions. When these mechanisms are disrupted by mutation, transcription through a silenced TE can lead to reduced expression of the gene’s downstream exons, early transcript termination, or premature polyadenylation (Saze et al., 2013). Although not a TE, there is a block of heterochromatin (H3K9me2 and DNA methylation) within an intron of the histone demethylase-encoding IBM1 gene itself that must be carefully regulated for proper gene expression (Rigal et al., 2012). One protein required for proper IBM1 expression is IBM2, a DNA binding protein that preferentially binds intronic heterochromatin associated with intragenic TEs, including the heterochromatic block within IBM1 (Saze et al., 2013). IBM2 also contains an RNA recognition motif that is thought to directly stabilize nascent RNA transcribed through intragenic heterochromatin.
Although the mechanism is not understood, regulation of IBM1 may be analogous to the methylstat of ROS1: The pruning of H3K9me2 from genes via IBM1 is controlled by the proper read-through of an internal heterochromatic block within the IBM1 gene itself, providing dynamic feedback for how much genome-wide H3K9me2 to remove. This again highlights the requirement for feedback control to balance the equilibrium between TE silencing and gene activity.
TEs AT THE PERICENTROMERE, KNOBS, AND TE ISLANDS
Plant centromeres are megabase-scale structures consisting of a functional centromere core, which serves as the site of spindle fiber attachment during cell division (see below), and flanking pericentromeric domains. The pericentromere is a large TE island that physically separates the centromere core from the gene-rich chromosome arms (Arabidopsis Genome Initiative, 2000; Simon et al., 2015). In addition to the pericentromere, other large TE islands exist on chromosomes including knobs, which are simply TE islands on the chromosome arms large enough to be cytologically visualized. In Arabidopsis, the chromosome 4 knob was generated by a large inversion that translocated pericentromeric TEs out to the gene-rich chromosome arm (Fransz et al., 2000). This knob/large TE island has served as a model for investigating the fate of genes translocated into a centromere and centromeric sequences translocated into gene-rich regions (Arabidopsis Sequencing Consortium, 2000). The composition of pericentromeres, knobs, and other TE islands is similar, and in this section, these TE contexts are considered together due to their comparable organization and regulation.
Pericentromeres, knobs, and TE islands are characterized by high TE content (Arabidopsis Genome Initiative, 2000). In plants, LTR retrotransposons and some DNA transposons such as CACTA family elements dominate this composition (Arabidopsis Genome Initiative, 2000; Han et al., 2013), but it is unclear whether these TEs specifically target these regions or if they are simply not selected against and accumulate at these locations. The size of TE islands is directly and linearly correlated with the overall genome size: Small genome plants such as Arabidopsis have relatively few TE islands, while large genomes such as maize (Zea mays) and wheat (Triticum aestivum) have greater numbers and expanded sizes of dense TE islands. Typically, in genomes with large and abundant TE islands, a handful of LTR retrotransposon families have amplified to extremely high copy numbers and are responsible for genome size expansion (Daron et al., 2014). The dates of these distinct amplifications vary widely between individual TEs amplified within the same genome, indicating that obese genome size is not a product of a single stress or evolutionary event, but rather reflects preexisting TEs separately overcoming their silencing inhibition and bursting in activity for a short time before being resilenced (Figure 4). Although short-lived, these occasional ancient bursts in TE activity are evident today due to the long periods of evolutionary time required to remove TE content via illegitimate recombination. After the burst of TE activity and a long slow period of reduction in TE content, TE islands persist composed of complex arrays of nested TE fragments where only the young LTR retrotransposons most recently active remain uninterrupted by other transposition events.
Figure 4. Punctuated Bursts of Different TE Lineages from the Same Plant Genome.
Phylogenetic reconstructions of TE activity demonstrate that amplification of different TE lineages (green, orange, dark blue, light blue, and purple) within the same genome did not occur at the same time. Branch points represent TE transposition events, and branch length represents TE divergence. These data and model, adapted from Daron et al. (2014), refute the “genome shock” hypothesis where the silencing pressure inhibiting all TEs is simultaneously released, and many or all TEs undergo a coordinated burst of activity. Rather, these data suggest that TE activity and genome expansion in plants are due to the ability of preexisting individual elements to separately circumvent silencing and go through a boom period of activity before resilencing. MYA, million years ago.
Pericentromeres and knobs are the classical components of constitutive heterochromatin, the consistently dark-staining regions of chromatin condensation visualized via microscopy. In addition to histone and DNA methylation, nucleosome spacing is compacted at these TEs by the SWI/SNF chromatin-remodeling protein DDM1, which acts through the linker histone H1 to condense chromatin (Zemach et al., 2013). DDM1 is a master regulator of TE activity, as its activity is specific to TEs, while other proteins such as MET1 act on any methylated region, including some gene bodies. Thus, the TEs at the pericentromeres, knobs, and TE islands under control of DDM1 are most often transcriptionally silenced (do not produce an mRNA). When DDM1 function is lost, TE maintenance DNA methylation, histone modification, and nucleosome compaction are all defective, resulting in the mRNA transcription, protein production, and transposition of TEs.
TEs at the pericentromere, knob, and TE islands are epigenetically maintained in a silenced state and do not require the activity of RdDM to sustain their silencing of mRNA transcription. Several articles have recently described two mutually exclusive classes of TEs in Arabidopsis: one where CHH methylation is maintained by CMT2 and the other where CHH methylation is constantly targeted via RdDM (Figure 5) (Stroud et al., 2014; Li et al., 2015; McCue et al., 2015). These two distinct classes represent (1) TEs present in large constitutive heterochromatic blocks found in the pericentromere, knobs, and TE islands regulated via CMT2 and (2) TEs near genes regulated by RdDM and the DNA methyltransferase DRM2 (see above). This division of TEs exists even in maize, a species that does not have a CMT2 homolog and where the vast number of TEs present in large TE islands are not targeted via RdDM and thus do not have any CHH methylation (Gent et al., 2014). How the H3K9me2 mark is differentially interpreted and leads to either CMT2-based CHH methylation or DRM2/RdDM-based CHH methylation is currently unknown and represents a key branch point in TE silencing (Figure 5). This distinction is not primarily due to TE type or location, as a silenced TE regulated by CMT2 can switch to RdDM regulation upon transcriptional activation (McCue et al., 2015). In addition, although they are not targeted via RdDM, pericentromeres, knobs, and TE islands are thought to be transcribed by RNA Pol IV because they still produce 24-nucleotide siRNAs (Law et al., 2013; Li et al., 2015). These small RNAs likely act in trans to silence any active matching TEs elsewhere in the genome, producing a homology sensor valuable to the plant for silencing similar potentially active TEs introduced upon crossing. 
One remaining question is why the deeply silenced TEs at the pericentromere, knob, or TE island that still produce and match 24-nucleotide siRNAs are not targeted for RdDM themselves (Stroud et al., 2014; Li et al., 2015; McCue et al., 2015). Even though 24-nucleotide siRNAs perfectly match these sequences, other factors required for RdDM must be missing to keep these TEs free from the constant methylation reinforcement of RdDM. This missing RdDM factor is not Pol V occupancy, as recent data demonstrated Pol V presence at a TE regulated by CMT2 and not RdDM (McCue et al., 2015).
Figure 5. Two Mutually Exclusive Mechanisms of CHH Methylation.
TE CHH methylation either occurs via RdDM (left) or CMT2-mediated maintenance methylation (right) (Stroud et al., 2014; Li et al., 2015; McCue et al., 2015). RdDM acts at small TEs near genes on the euchromatic chromosome arms (green arrow), while CMT2 acts at TEs in pericentromeres, knobs, and TE islands (red arrows). Both of these pathways rely on existing H3K9me2, but how this one mark is differentially interpreted for either RdDM or CMT2-based maintenance methylation (but not both at the same time) is currently a critical question in the field. Protein sizes not to scale.
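One way to picture the two mutually exclusive classes is as a decision on how a TE's CHH methylation responds when each pathway is removed. The sketch below is an illustrative assumption, not the procedure of the cited studies: it takes hypothetical CHH methylation fractions for one TE in wild type, in a mutant lacking DRM-dependent RdDM, and in a cmt2 mutant, and assigns a class using made-up thresholds (`min_wt`, `loss_fraction`). The function name `classify_chh_pathway` is also hypothetical.

```python
def classify_chh_pathway(wt, drm_mut, cmt2_mut, min_wt=0.05, loss_fraction=0.5):
    """Assign a TE to a CHH methylation class from three methylation fractions.

    wt, drm_mut, cmt2_mut: fraction of methylated CHH cytosines (0 to 1) in
    wild type, an RdDM-deficient background, and a cmt2 background.
    min_wt and loss_fraction are arbitrary illustrative thresholds.
    """
    if wt < min_wt:
        return "no CHH methylation"
    lost_in_drm = (wt - drm_mut) / wt >= loss_fraction
    lost_in_cmt2 = (wt - cmt2_mut) / wt >= loss_fraction
    if lost_in_drm and not lost_in_cmt2:
        return "RdDM-dependent"
    if lost_in_cmt2 and not lost_in_drm:
        return "CMT2-dependent"
    if lost_in_drm and lost_in_cmt2:
        return "ambiguous"
    return "independent of both"


# Hypothetical values for a small near-gene TE and a pericentromeric TE
print(classify_chh_pathway(wt=0.20, drm_mut=0.02, cmt2_mut=0.19))
print(classify_chh_pathway(wt=0.15, drm_mut=0.14, cmt2_mut=0.01))
```

The "ambiguous" and "independent of both" outcomes are kept explicit because the text notes that a TE can switch from CMT2 to RdDM regulation upon transcriptional activation, so a single snapshot need not place every element cleanly in one class.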
The constitutive heterochromatic silencing of the pericentromere, knob, and TE islands is thought of as the end point in a series of mechanisms to initiate, establish, and maintain TE silencing (Fultz et al., 2015), and this end-point silencing is stable across many plant generations (Becker et al., 2011). The silenced state of constitutive heterochromatin at TE islands is perpetuated through cell division and, in plants, through generations, thus defining these TEs as being epigenetically silenced on a trans-generational basis. Propagation of DNA methylation patterns, histone modification patterns, and chromatin compaction all contribute to the transcriptionally silenced state, while their individual roles in heritable epigenetic trans-generational propagation are controversial. In plants, there is no evidence of CG methylation erasure in the sperm, egg, or embryonic cells; therefore, it is clear that in plants methylation initiated in one generation can be subsequently inherited by the progeny and account for the trans-generational epigenetic silencing of plant TEs over many generations (Mathieu et al., 2007; Becker et al., 2011). It is controversial whether histone modification patterns associated with silenced TEs, such as H3K9me2, H3K27 mono- and dimethylation, and H4K20me1, are inherited across generations or if they are retargeted early in embryogenesis. Even more controversial is the idea that small RNAs from the parental generation contribute to TE silencing in the early embryo of the next generation, as they do in Drosophila melanogaster (Brennecke et al., 2008). Nevertheless, the robust continued trans-generational silencing of plant TEs and lack of positive selection for transposition lead to TE sequence degradation due to random mutations and illegitimate recombination, resulting in the old and cryptic TE fragments found throughout plant genomes.
TEs AT THE CENTROMERE CORE
The functional kinetochore of any plant chromosome is defined as the location bound by the centromere-specific histone H3 variant CENH3. Deposition of CENH3 results in spindle fiber attachment and chromosome segregation during cell division. CENH3 replaces some (but not all) canonical histone H3 at the centromere core (Bodor et al., 2014), resulting in a mixture of CENH3 and H3 (Gent et al., 2012); however, there is a threshold level of CENH3 that must be deposited for the mass action of centromere function. In a typical plant centromere, CENH3 is enriched at short tandem repeats, whose typical 153- to 180-bp size has coevolved with the CENH3 histone. Although this tandem repeat size suggests a single nucleosome-per-tandem repeat unit, data suggest that the average spacing of CENH3 is more dispersed and some species have tandem repeat units from 400 to 1000 bp (Gent et al., 2011). In addition, the genetic/structural composition of the centromere core is highly dynamic and underlies the epigenetic shifts of CENH3 deposition (Zhang et al., 2013; Gao et al., 2015). It has been speculated that this rapid centromeric repeat evolution and competition for CENH3 binding creates meiotic drive and plays a key role in plant speciation (Malik and Henikoff, 2001; Hirsch et al., 2009).
Multiple types of TEs are found within and neighboring the core centromeric tandem repeat islands. On average, these TEs are longer and more often represent intact full-length elements compared with TEs located near genes, suggesting centromere core TEs have been active more recently (Arabidopsis Genome Initiative, 2000). Centromere-specific retrotransposons (CR elements) preferentially insert into CENH3 domains and are found interspersed within the centromeric tandem repeats (Zhong et al., 2002). CR elements are targeted to the centromere core via their integrase proteins, and variations in the putative integrase targeting domain result in more or less restriction to the centromere core (Neumann et al., 2011). The maize CR TE (CRM) interacts with CENH3 and is also found at chromosomal knobs, which can display neocentromere activity. Alternatively, there are some functional neocentromeres that lack TEs and satellite repeats, suggesting that the presence of TEs and tandem repeats functions in the epigenetic maintenance of centromere identity rather than in the establishment of a spindle fiber attachment site (Gong et al., 2009; Topp et al., 2009).
Similar to animals, plant CENH3 chromatin (including CR elements) contains a mixture of active and inactive epigenetic marks: dense DNA methylation (Gent et al., 2012; Regulski et al., 2013) yet with a reduced level of H3K9me2 (Zhang et al., 2008). This mixed epigenetic pattern correlates with data demonstrating that centromere core TEs and satellite repeats are transcribed (May et al., 2005; Neumann et al., 2011; Gent and Dawe, 2012). Centromere core TEs dispersed within the tandem repeats are thought to drive the expression of their neighboring repeats (May et al., 2005). CRM expression is thought to be essential for the recruitment of CENH3, and at least one kinetochore protein has been found to bind RNA in vitro (Du et al., 2010). CR expression raises the question of why or how the transcriptional silencing, RdDM, and RNAi-based mechanisms of repression that target distal TEs for silencing regulate centromere core TEs. Influenced by data on centromere specification in fission yeast, three nonmutually exclusive models exist for RNA function at the plant centromere core: (1) Pol II transcription may specify CENH3 deposition by stalling of the polymerase (Catania et al., 2015), (2) CR/tandem repeat transcripts may be degraded into small RNAs that target centromere epigenetic states (Volpe et al., 2003), and (3) CR transcripts may act as a scaffold and bind centromeric proteins, resulting in the recruitment of CENH3 (Böhmdorfer and Wierzbicki, 2015). Research in Arabidopsis demonstrates that CR and tandem repeat transcripts are processed into siRNAs (May et al., 2005); however, the functional significance of these siRNAs in regulating centromeric activity is not understood, especially because there is no evidence of chromosomal dysfunction in small RNA biogenesis mutants. In addition, there are currently no data demonstrating a functional role of TE transcription in driving CENH3 deposition or centromere function.
TE regulation at the centromere core has not been well explored, and the experiments that do exist often fail to differentiate the activity of centromere core-located TEs from that of similar TE copies located elsewhere. However, the connection between the TE contribution to the centromere core structure and the potential for TE epigenetic regulation to play an important role in specifying centromere function remains intriguing.
CONCLUSION
This Review dissects TE chromosomal context into four broad categories. However, additional categories are likely to be discovered, and these may shed light on how the cell dictates which TEs are subject to each silencing mechanism. For example, recent investigation of the three-dimensional chromosomal structure identified TE island (nonpericentromeric) regions of all five Arabidopsis chromosomes that consistently interact (Grob et al., 2014). The function and mechanism of this trans-chromosome TE interaction, termed the “KNOT,” are currently not understood. KNOT TE islands are enriched in Vandal family Mutator DNA transposons and Atlantys family gypsy LTR retrotransposons. The regulation of these KNOT TEs may be distinct from other TE islands, and these regions may be hot spots for TE insertion. Thus, TE function is not understood in all chromosomal contexts, particularly at the centromere core, and many discoveries on the function of TEs and their epigenetic regulation remain to be made.
Acknowledgments
We thank Jay Hollick, Jonathan Gent, Kaushik Panda, Dalen Fultz, and Josquin Daron for their critical comments on the article. This work is supported by National Science Foundation Grants MCB-1252370 and IOS-1340050.
AUTHOR CONTRIBUTIONS
All authors contributed to writing the article.
- Received October 9, 2015.
- Revised December 18, 2015.
- Accepted February 10, 2016.
- Published February 11, 2016.