- © 2019 American Society of Plant Biologists. All rights reserved.
Abstract
Recombination plays an integral role in the creation of novel genetic variation in sexually reproducing species. Despite this important role, the determinants and evolution of crossover hotspots have remained poorly understood in plants. Here, we present a comparative analysis of two rice (Oryza sativa) historical recombination maps from two subspecies (indica and japonica) using 150 resequenced genomes. Fine-scale recombination rates and crossover hotspots were validated by comparison with a consensus genetic map and empirically derived crossovers, respectively. Strikingly, nearly 80% of crossover hotspots were unique to each subspecies, despite their relatively recent divergence and broad-scale correlated recombination rates. Crossover hotspots were enriched with Stowaway and P instability factor (PIF)/Harbinger transposons and overlapped accessible chromatin regions. Increased nucleotide diversity and signatures of population differentiation augmented by Stowaway and PIF/Harbinger transposons were prevalent at subspecies-specific crossover hotspots. Motifs derived from lineage-specific indica and japonica crossover hotspots were nearly identical in the two subspecies, implicating a core set of crossover motifs in rice. Finally, Stowaway and PIF/Harbinger transposons were associated with stabilized G/C bias within highly active hotspots, suggesting that hotspot activity can be fueled by de novo variation. These results provide evolutionary insight into historical crossover hotspots as potentially powerful drivers of sequence and subspecies evolution in plants.
INTRODUCTION
Meiotic crossovers provide the foundation for genetic diversity through the creation of chimeric haplotypes during gametogenesis. Generally, a minimum of one crossover per chromosome, aptly termed the obligate crossover, is necessary for proper chromosomal segregation, as crossovers establish the positions of chiasmata, the physical contacts between paired nonsister chromatids during the pachytene phase of prophase I (Petronczki et al., 2003). In most eukaryotes, crossovers are nonuniformly distributed across chromosomes and typically cluster within crossover hotspots, 1- to 5-kb genomic regions where crossover rates are several times greater than in surrounding regions (reviewed in Paigen and Petkov, 2010; Choi and Henderson, 2015). However, crossover hotspots are hardly a universal feature of eukaryotic genomes, considering that species such as Drosophila melanogaster have been shown to lack crossover hotspots altogether (Smukowski Heil et al., 2015). The occurrence of crossover hotspots has been demonstrated in several organisms (Myers et al., 2005, 2010; Baudat et al., 2010; Auton et al., 2013; Singhal et al., 2015) including plants (Paape et al., 2012; Choi et al., 2013), but the factors contributing to hotspot evolution and how hotspot divergence influences population dynamics have remained elusive in most species.
In humans, the genomic positions of crossover hotspots are established by PR domain zinc finger protein9 (PRDM9), which possesses sequence-specific DNA binding capabilities through a C2H2 zinc finger array, H3 histone methyltransferase activity via a SET domain, and a KRAB domain for protein-protein interactions (Myers et al., 2010; Imai et al., 2017; Diagouraga et al., 2018). PRDM9 binds a degenerate 13-mer motif with sequence similarity to THE1 and L2 retrotransposons (Myers et al., 2008) and catalyzes the trimethylation of Lys-4 on the N-terminal tail of proximal H3 histones (H3K4me3; Hayashi et al., 2005). The covalent modification of H3K4me3 by PRDM9 initiates nucleosome reorganization at H3K4me3-tagged chromatin prior to double-strand breaks (DSBs) facilitated by sporulation-11 (SPO11; Baker et al., 2014) . Ectopic crossover hotspots in mouse PRDM9 knockouts occur in gene promoters and regions with elevated levels of H3K4me3, suggesting a general role of PRDM9 in keeping crossovers away from genes (Brick et al., 2012). However, orthologs for PRDM9 are notably absent in several organisms, including plants, yeast (Saccharomyces cerevisiae), and D. melanogaster, which implicates alternative mechanisms for the determination of crossover hotspot locations (Choi and Henderson, 2015). For species lacking PRDM9, meiotic DSBs and crossover hotspots display notable associations with various chromatin and genomic features reminiscent of mouse PRDM9 knockouts. For example, in yeast, meiotic DSBs preferentially occur within gene promoters, coincident with regions of low nucleosome density, DNA methylation (mDNA), and H3K4me3-modified nucleosomes (Borde et al., 2009; Pan et al., 2011; Yamada et al., 2013). In addition, yeast crossover hotspots have been shown to colocalize with other transcription-activating chromatin marks, such as H3K9ac (Yamada et al., 2013). Similar chromatin and genome organization patterns were identified in Arabidopsis (Arabidopsis thaliana), where crossover hotspots were shown to be functionally related to the unstable histone variant H2A.Z at gene transcription start sites (TSSs) and transcription termination sites (TTSs; Choi et al., 2013). Several studies in Arabidopsis have described DNA sequences enriched within crossovers and crossover hotspots, such as the CCN-repeat, CTT-repeat, and A/T-rich motifs, which are proposed to play a role in reducing nucleosome occupancy (Choi et al., 2013; Wijnker et al., 2013; Shilo et al., 2015). More recently, SPO11-mediated DSB hotspots in plants have been shown to overlap transposons, consistent with the observation of empirically determined crossovers coinciding with short DNA miniature inverted repeat transposable elements (MITEs; He et al., 2017; Marand et al., 2017; Choi et al., 2018). However, a putative function and evolutionary role of these repetitive elements in recombination have yet to be elucidated.
Studies in humans and yeast have shown that crossover hotspots undergo a high turnover rate owing to the crossover hotspot paradox, which states that hotspot alleles are purified due to G/C-biased gene conversion; recombinogenic alleles are replaced by DNA repair with the more stable haplotype (high G/C content), resulting in subsequent loss of the hotspot allele (Myers et al., 2010). High mutation rates in the zinc finger domains of PRDM9 have helped to explain the coevolution between PRDM9 DNA binding sites and their disappearing sequence motifs (Thomas et al., 2009). The high mutation rate of PRDM9 zinc fingers intuitively suggests variation in hotspot sites due to divergence in PRDM9 sequence specificity among related lineages. In line with this hypothesis, comparisons between closely related animal species have revealed that most crossover hotspots are typically not shared (Tsai et al., 2010; Auton et al., 2012; Singhal et al., 2015). This phenomenon has been best studied through comparisons of human and chimpanzee hotspots, revealing a remarkable lack of concordance. However, since plants lack a PRDM9 ortholog, the molecular framework governing crossover hotspot evolution has received little attention. Here, we constructed two fine-scale recombination rate maps using independent populations of rice (Oryza sativa) subspecies indica and japonica (hereafter referred to as indica and japonica, respectively) via whole-genome resequencing data afforded by the 3000 Rice Genomes Project (Wang et al., 2018). We demonstrate an association between crossover hotspots and various genomic and chromatin features consistent with previous findings in other model organisms. Our analysis revealed that crossover hotspots in rice are associated with regions flanking structural variation (SV), elevated G/C allele frequencies, two classes of DNA transposons, accessible chromatin, and signatures of population differentiation and nucleotide diversity. The widespread occurrence of distinct crossover hotspots and their association with lineage-specific variation implicates crossover hotspot evolution as a major contributor that has fueled the divergence of closely related species.
RESULTS
Construction of Coalescent-Based Recombination Rate Maps in Rice
Although indica and japonica share an estimated 99% of genomic sequence, comparative assessments place their most recent common ancestor around 0.40 to 0.44 million years ago (Ma and Bennetzen, 2004; Zhu and Ge, 2005; Du et al., 2017). Analysis of haplotype variation among indica and japonica has revealed a nearly sevenfold difference in linkage disequilibrium (LD) between subspecies, suggesting considerable variability in the distributions of recombination (Mather et al., 2007). Taken together, the recent divergence, high sequence similarity, and stark differences in LD among indica and japonica rice subspecies provide an opportune framework in which to investigate the evolutionary implications of patterns of recombination. To assess potential diversity in the recombination landscape of these subspecies, we constructed population-scaled recombination rate maps using single-nucleotide polymorphisms (SNPs) independently derived from 75 indica and 75 temperate japonica accessions (Supplemental Data Set) from the 3000 Rice Genomes Project (Supplemental Figures 1–3; Li et al., 2014). Historic fine-scale recombination rates can be estimated from population SNP data sets by simulating coalescent events between variants (McVean et al., 2004). Genome-wide recombination rates were estimated using a total of 1,616,002 and 753,186 SNPs (see Methods) from indica and japonica populations, respectively, using the Nipponbare genome assembly (IRGSP v7.0) as a reference (Kawahara et al., 2013). To validate the population-based recombination rate estimates (ρ), each map was compared with a consensus genetic map generated using public data from two recombinant inbred line (RIL) populations genotyped by resequencing (Huang et al., 2009; Yu et al., 2011). The RIL population described by Huang et al. (2009) was derived from crosses between 93-11 (indica) and Nipponbare (japonica), whereas RILs developed by Yu et al. (2011) were the result of crosses between Zhenshan 97 and Minghui 63, two elite indica varieties. The consensus genetic map harbored a total of 3860 crossovers at a resolution of ∼25 kb. Recombination rate estimates for both subspecies were significantly correlated with experimentally determined crossover frequencies at the 200-kb level (Spearman’s rank correlation; ρ = 0.3, P < 2.2e-16; Figure 1A; Supplemental Table; Supplemental Figure 4). Furthermore, comparison of recombination rates between subspecies revealed a strong correlation at the same 200-kb scale (Spearman’s rank correlation; ρ = 0.7, P < 2.2e-16). To rule out differing SNP data sets as a contributor to recombination rate variation, we recalled SNPs in each subspecies population and only retained variants segregating in both populations (minimum allele frequency > 0.1). This shared set of variants (n = 569,693) was then used to reestimated recombination rates. Comparison of the shared SNP recombination maps (200-kb scale) revealed greater correlation between subspecies (Spearman’s rank correlation; ρ = 0.8, P < 2.2e-16) and among the historical and RIL-based maps (Spearman’s rank correlation; ρ = 0.4, P < 2.2e-16; Supplemental Figure 5). In addition, we found that recombination rates from the full and shared SNP data sets within subspecies were highly correlated (Spearman’s rank correlation; ρ [0.65 – 0.91]; Supplemental Figure 6).
Fine-Scale Recombination Rate Estimates and Crossover Hotspot Characteristics.
(A) An example on chromosome 5 highlighting the correlations among broad- and fine-scale recombination rates for two rice subspecies and a consensus genetic map. Blue lines indicate japonica recombination rates on a fine scale (top blue plot) and a broad scale (bottom blue plot) (200 kb). Orange lines represent indica recombination rates at a fine scale (top orange plot) and broad scale (bottom orange plot). RIL consensus, merged RIL recombination rate maps at the broad scale (composed of two independent RIL populations). Merged, all broad 200-kb recombination rates overlaid from japonica, indica, and the RIL consensus. The bottom heat map (low to high; white to purple) reflects the density of long terminal repeats (LTRs) per 100 kb.
(B) Cumulative proportion of recombination rate (y axis) as a function of genomic space (x axis) for indica (orange line) and japonica (blue line).
(C) Comparison of fine-scale recombination rates for representative conserved (top) and unique (bottom) crossover hotspots. Crossover hotspots for japonica (blue) and indica (orange) are represented by blocks underneath recombination rate peaks.
(D) Venn diagram comparing crossover hotspot counts between the two subspecies.
(E) Empirical distributions (gray, n = 10,000) of overlapping rates between the noted empirical crossover data set and random regions. Purple lines indicate the observed overlapping rate between the denoted crossover data sets.
Identification and Validation of Crossover Hotspots
Past studies have provided evidence that recombination rate variation can occur at resolutions less than 1 kb (McVean et al., 2004; Choi et al., 2013). Thus, we estimated the relative proportions of the rice genome associated with recombination in both subspecies. Remarkably, greater than 80% of recombination occurred in less than 4.2% and 5.3% of the genomic space for japonica and indica, respectively, implicating the presence of crossover hotspots in rice (Figure 1B). To comprehensively identify the genome-wide distribution of crossover hotspots, we scanned the collective set of haplotypes for each subspecies independently using sequenceLDhot, resulting in a total of 14,125, and 13,082 crossover hotspots (ranging from 1 to 5 kb) for indica and japonica, respectively. Although the indica recombination map was composed of more than twice as many variants, indica crossover hotspots were significantly longer (2064 nucleotides) compared with japonica (1679 nucleotides) on average (Wilcoxon rank sum test; P < 4.8e-290). Visual inspection uncovered conserved and lineage-specific crossover hotspots between indica and japonica on a fine scale (Figure 1C). Comprehensive analysis of crossover hotspots from each subspecies revealed that only 23% and 25% of indica and japonica crossover hotspots overlapped (defined by an intersection of at least 1 bp), respectively (Figure 1D). To rule out the differing SNP densities as an underlying contributor toward dissimilar hotspot positions, we reidentified hotspots using the shared SNP data sets for each subspecies. Remarkably, hotspots from the recombination maps using shared SNP positions also failed to overlap and exhibited similar overlapping rates to the full SNP maps (18% and 23% overlap of indica [n = 11,163] and japonica [n = 9183] crossover hotspots). Comparison of hotspots from the shared SNP map between subspecies was consistent with indica containing overall longer crossover hotspots (Wilcoxon rank sum test; P < 4.0e-7).
To determine if historical crossover hotspots were supported by empirically observed crossovers, we compared hotspot coordinates with crossovers from the two published rice RIL data sets (from the consensus genetic map) as well as crossover hotspots in an F2 population derived from a cross between 93-11 and PA64S (Huang et al., 2009; Yu et al., 2011; Si et al., 2015). Approximately 93% (25/27; Si et al., 2015), 92% (2068/2252; Huang et al., 2009), and 95% (1520/1608; Yu et al., 2011) of empirical crossovers overlapped indica crossover hotspots, while 63% (17/27; Si et al., 2015), 47% (1058/2252; Huang et al., 2009), and 49% (782/1608; Yu et al., 2011) of crossovers were recovered by japonica crossover hotspots. The reduced coincidence of japonica hotspots with observed crossover events could be due to the overall decreased recombination rates or insufficient power to detect hotspots in japonica. It is also possible that crossovers in the RIL populations are established by dominant meiotic factors associated with indica rather than japonica backgrounds. To estimate if the observed overlapping rates were greater than chance, we built empirical null distributions using Monte Carlo (MC) simulations from random regions for the data sets with sufficient sample size (Huang et al., 2009; Yu et al., 2011). In all cases, the overlap between crossover hotspots and empirical crossovers was significantly greater than chance (empirical; all, P < 1e-4; Figure 1E). Taken together, fine-scale recombination rates and crossover hotspots are robust to varying SNP data sets and supported by colocalization with empirically observed crossovers.
The Genomic and Epigenomic Features of Rice Crossover Hotspots
Crossover hotspots are nonuniformly distributed across the genomic landscape, showing preference for upstream gene TSSs and within nucleosome-depleted regions (Choi et al., 2013; Marand et al., 2017). To gain an understanding of the fine-scale contributors to hotspot evolution, we characterized the distribution of crossover hotspots over various genomic features. Consistent with previous findings, indica and japonica crossover hotspots were enriched in promoters ∼100-nucleotide upstream TSSs (promoter defined as 2-kb upstream TSSs), downstream gene TTSs (1 kb downstream), and depleted within gene bodies (Figure 2A). However, subtle differences were observed in crossover hotspot distributions between subspecies, including overrepresentation of japonica crossover hotspots within introns and exons and of indica crossover hotspots within promoters, 5′ untranslated regions, and DNA transposons (two-proportion z test; all comparisons, P < 0.02). Nevertheless, the overall genomic distributions of crossover hotspots were largely similar between subspecies (Figure 2B).
Analysis of Genomic and Epigenomic Features Underlying Crossover Hotspots.
(A) Aggregate indica (top) and japonica (bottom) crossover hotspot counts across all protein-coding genes. Gray lines represent hotspot enrichment at an equal number of random regions (n = 35,651).
(B) Comparison of indica (orange) and japonica (blue) crossover hotspots with genomic features and random (gray) nearby regions. Random regions were generated for each subspecies data set and restricted to intervals within 10 to 1000 kb of a crossover hotspot with no statistical evidence of containing a crossover hotspot (LRT = 0 from sequenceLDhot). Proportions per genomic feature for indica and japonica controls were averaged owing to nearly identical distributions. Promoters were taken as 2-kb windows upstream TSSs. RC, rolling circle transposons; UTR, untranslated region.
(C) Chromatin accessibility, H3K4me3, transposable element content, and genes at representative crossover hotspots specific to indica and japonica lineages.
(D) Normalized read counts for various chromatin data sets, centered on crossover hotspots in 10-kb windows and aggregated across all crossover hotspots. All data sets were normalized (RPM) and scaled from 0 to 1.
(E) MC distributions (10,000×) of nearby random regions (10–1000 kb) are shown as violin plots, with the mean scaled RPM chromatin mark density of all crossover hotspots (orange bars) for indica (top panel) and japonica (bottom panel).
(F) Heat map depicting 10-kb windows from (D) for individual hotspot and control regions. K-means clustered groups are indicated to the right of each heat map.
Chromatin structure has also been proposed to play an integral role in establishing crossover sites in plants (Choi et al., 2013; Shilo et al., 2015; Marand et al., 2017). To assess the importance of chromatin toward historical crossover hotspot formation in rice, we generated DNase sequencing (DNase-seq) libraries from the indica rice accession 93-11 to complement DNase-seq data previously developed in Nipponbare (Zhang et al., 2012). DNase hypersensitive sites (DHSs), which represent domains of highly accessible chromatin, were identified in both subspecies. We also analyzed epigenomic data sets where chromatin marks were available in both subspecies, leveraging the indica and japonica accessions 93-11 and Nipponbare as representative genotypes for each subspecies, respectively. Crossover hotspots in both subspecies were generally located within accessible chromatin regions, marked by DHSs, relative to flanking sites in both subspecies (Figures 2, C and 2D). We observed enrichment of H3K4me3, H3K9ac, H4K12ac, and H3K27me3 histone modifications and two DNA methylation contexts, mCHG and mCHH (where H represents A, T, or C) over crossover hotspots. Crossover hotspots were depleted of CpG mDNA and the transcription elongation histone modification, H3K36me3, which is known to be negatively correlated with the formation of meiotic DSBs in yeast (Hansen et al., 2011). We then performed MC simulations (10,000×) comparing normalized (reads per million mapped reads [RPM]) read counts at hotspots with nearby (10–1000 kb) random regions to determine if the observed chromatin modifications were statistically relevant. These simulations revealed that crossover hotspots were significantly associated with increased DNase-seq, H3K4me3, H3K9ac, H4K12ac, H3K27me3 RPM, and mDNA levels in mCHG and mCHH contexts for both subspecies (empirical; all, P < 1.0e-4; Figure 2E). The co-occurrence of accessible chromatin, DNA methylation, and histone modifications over crossover hotspots can be partially explained by the uncertainty of the precise DSB locations within the hotspot domain (1–5 kb in length) as well as chromatin state variation among crossover hotspots (Figure 2F). The enrichment of DNA methylation at crossover hotspots in rice is an unusual finding, considering that DNA hypomethylation has been observed at crossover hotspots in Arabidopsis (Choi et al., 2013; Yelina et al., 2015). However, gene promoters, the preferred genomic context for crossover hotspots, in rice are known to contain relatively greater levels of DNA methylation compared with Arabidopsis (Zemach et al., 2010a, 2010b; Choi et al., 2013; Yelina et al., 2015). In addition, whether rice SPO11 homologs can bind methylated DNA is unknown. Nevertheless, mDNA levels greater than the genome-wide average occurred in only a minority (11–24% across all mDNA contexts and subspecies) of hotspots and colocalized with increased chromatin accessibility (Figure 2F). It is important to note that chromatin data were derived from somatic tissue and may not necessarily reflect the chromatin state during meiosis. However, since male meiocytes are speculated to undergo widespread hypomethylation owing to upregulated transposable element (TE) transcription (Chen et al., 2010), we cannot preclude the possibility that crossover hotspots may become hypomethylated prior to DSB formation during prophase I.
Crossover Hotspots Are Enriched with Open Chromatin-Associated Stowaway and PIF/Harbinger MITE Transposons
Meiotic crossover hotspots in humans are established by a 13-mer DNA sequence motif targeted by the DNA binding domain of PRDM9, where the 13-mer sequence motif was shown to exhibit sequence similarity with THE1A/B and L2 retrotransposons (Myers et al., 2008, 2010). To elucidate a potential relationship among transposons and recombination, we analyzed fine-scale recombination rates overlying repetitive sequences for various transposable element families in rice. In general, retrotransposon families were coincident with lower recombination rates compared with DNA transposons and simple sequence repeats (mononucleotide to 12-mer repeats, 5–1040 nucleotides in length), which demonstrated overall greater levels of recombination (Supplemental Figure 7). We then assessed whether these relationships persisted for crossover hotspots by examining the enrichment of the same classes of transposons relative to nearby (10–1000 kb) regions with suppressed recombination (bottom quartile of genome-wide recombination rates) and no evidence of containing crossover hotspots (sequenceLDhot likelihood ratio test [LRT] = 0). Notably, P instability factor (PIF) /Harbinger, Stowaway, and simple sequence repeats were overrepresented within crossover hotspots for both subspecies (Figure 3A). However, comparison of transposon content between indica and japonica crossover hotspots revealed that Stowaway transposons and simple sequence repeats were significantly more prevalent in indica crossover hotspots (Fisher’s exact test; P < 0.01). To determine if Stowaway and Harbinger transposons statistically influence hotspot recombination rates, we compared peak recombination rates among hotspots either harboring or lacking at least one Stowaway or PIF/Harbinger element. The occurrence of Stowaway or PIF/Harbinger elements within crossover hotspots was accompanied by significantly greater peak recombination rates, suggesting that their presence may enhance hotspot recombination activity (Wilcoxon rank sum test; all comparisons, P < 6.7e-7; Supplemental Figure 7).
Characterization of Crossover Hotspot Transposons.
(A) Enrichment of various transposon families in indica and japonica crossover hotspots.
(B) Aggregate counts of Stowaway (top) and PIF/Harbinger (bottom) elements relative to all rice protein-coding genes (n = 35,651; TSS-TTS). Dark gray lines indicate the relative levels of Stowaway and PIF/Harbinger elements over random regions (n = 35,651).
(C) Normalized (reads per million) DNase-seq read counts for genes harboring a Stowaway (top panel) or PIF/Harbinger (bottom panel) 2-kb upstream TSS or 1-kb downstream TTS. Genes lacking Stowaway or PIF/Harbinger are shown as gray lines.
(D) MC simulated distributions (10,000×) of recombination rates over Stowaway (orange) and PIF/Harbinger (blue) compared with the same elements containing DHSs (red lines) for indica (left panel) and japonica (right panel).
(E) Comparison of DNase-seq read counts for crossover hotspots containing a Stowaway (orange) or PIF/Harbinger element (blue) and crossover hotspots lacking a Stowaway (purple) or PIF/Harbinger element (green) for both indica (left panel) and japonica (right panel).
(F) Normalized DNase-seq read counts for crossover hotspots harboring different counts of Stowaway and PIF/Harbinger elements. A linear model (red lines) is shown fitting DNase-seq read counts by the number of Stowaway and PIF/Harbinger elements per crossover hotspot.
Asterisks denote significance from MC simulations (D) and Wilcoxon rank sum tests (E) for α < 0.05.
To further characterize the relationships among PIF/Harbinger and Stowaway transposons and rice crossover hotspots, we investigated the genomic distribution of PIF/Harbinger and Stowaway transposons at a fine scale. Like crossover hotspots, PIF/Harbinger and Stowaway transposons demonstrated a pattern of enrichment upstream TSSs and downstream TTSs (Figure 3B). To explain these patterns, it has been posited that DNA transposons preferentially insert within accessible chromatin regions typically found upstream and downstream of genes (Liu et al., 2009; Gangadharan et al., 2010; Sigman and Slotkin, 2016). Consistent with this hypothesis, promoters and 1-kb downstream TTSs harboring Stowaway or PIF/Harbinger transposons were concomitant with increased chromatin accessibility (Figure 3C). We then split Stowaway and PIF/Harbinger elements into two groups, those overlapping DHSs (Stowaway, n = 3889; PIF/Harbinger, n = 4632) and those lacking DHSs (Stowaway, n = 44,736; PIF/Harbinger, n = 44,3234). For convenience, we refer to Stowaway and PIF/Harbinger elements that overlap DHSs as stowDHSs and harbDHSs, respectively. Approximately 19% of stowDHSs and harbDHSs were located in crossover hotspots (stowDHSs and harbDHSs in indica and japonica crossover hotspots versus control; Fisher’s exact test; P < 4.5e-05). To determine if colocalization with DHSs influences the distribution of recombination rates over individual transposons, MC simulations (10,000×) were used to construct empirical distributions of recombination rates for Stowaway and PIF/Harbinger elements lacking DHSs. Interestingly, stowDHSs (but not harbDHSs) were associated with significantly greater recombination rates in indica (empirical; Stowaway, P < 0.0015; PIF/Harbinger, P = 0.32), while both stowDHSs and harbDHSs in japonica were associated with elevated recombination rates (empirical; Stowaway, P < 0.0048; PIF/Harbinger, P < 0.0272; Figure 3D).
The inherent structure of gene promoters likely confounds the analysis of accessible chromatin and DNA transposons. Thus, we investigated the relationship among intergenic crossover hotspots (greater than 2 kb from any gene), Stowaway and PIF/Harbinger transposons, and chromatin accessibility. Intergenic crossover hotspots harboring at least one Stowaway or PIF/Harbinger transposon were associated with significantly greater chromatin accessibility for both subspecies (Wilcoxon rank sum test; both subspecies and transposon families, P < 5.1e-12; Figure 3E). Furthermore, counts of Stowaway and PIF/Harbinger elements were linearly associated with increasing chromatin accessibility in crossover hotspots from both genomes (Student’s t test; all combinations, P < 1.22e-5; Figure 3F). Since transposons are well-characterized targets for DNA methylation, we investigated the levels of various mDNA contexts for intergenic crossover hotspots with and without Stowaway and PIF/Harbinger elements. For both subspecies, the quantitative presence of Stowaway and PIF/Harbinger elements was associated with a significant reduction of mCG and mCHG levels (Student’s t test; both subspecies and transposon families, P < 9.59e-5), while no difference was observed for mCHH methylation (Student’s t test; both subspecies and transposon families, P > 0.11; Supplemental Figure 7). These results indicate that increasing occupancy of Stowaway and PIF/Harbinger elements in crossover hotspots is additively associated with increased chromatin accessibility, elevated recombination rates, and reduced mCG and mCHG in both rice subspecies.
Pervasive Genetic Divergence at Rice Crossover Hotspots
The discrepancy between crossover hotspot locations for two closely related rice subspecies led us to posit that subspecies-specific crossover hotspots may be associated with patterns of divergent evolution. Expectedly, crossover hotspots specific to each subspecies (indica, n = 11,189; japonica, n = 9944; shared, n = 3274) were affiliated with subspecies-specific recombination rates relative to shared hotspots (Figure 4A). To infer a potential relationship among subspecies-specific crossover hotspots and genetic differentiation, a measure of genetic population structure, the fixation index (FST), was estimated at all SNPs (n = 1,810,174) (Supplemental Figure 8). Crossover hotspots were associated with elevated FST compared with random and surrounding regions (Figure 4B). Enrichment of FST within crossover hotspots was statistically supported by MC simulations with random nearby regions (empirical; both, P < 1.0e-4; Supplemental Figure 9). We then compared FST values for SNPs located within indica and japonica crossover hotspots with those underlying shared crossover hotspots, revealing that indica-specific hotspots contained SNPs with relatively greater levels of FST (Wilcoxon rank sum test; P < 2.2e-16; Supplemental Figure 9). Although the same analysis with japonica-specific hotspots indicated higher FST compared with shared hotspots, the difference was not significant (Wilcoxon rank sum test; P = 0.10). This may reflect a more minor contribution of japonica crossover hotspots toward population differentiation as compared with indica. Due to the enrichment of Stowaway and PIF/Harbinger elements within crossover hotspots, we were interested to determine if these two MITEs specifically influence the distribution of population differentiation signals within crossover hotspots. To this end, the levels of FST were mapped onto 100-bp regions surrounding all classes of transposons. Stowaway and PIF/Harbinger elements were associated with greater levels of FST compared with other transposon classes, suggesting that these elements may generally contribute to genetic differentiation between indica and japonica (Supplemental Figure 9). Interestingly, mean FST values at crossover hotspots increased linearly with counts of Stowaway and PIF/Harbinger transposons (Student’s t test; all, P < 1.1e-4; Figure 4C). Thus, Stowaway and PIF/Harbinger transposons may quantitatively supplement population differentiation at crossover hotspots.
Nucleotide Diversity and Population Differentiation Are Associated with Subspecies-Specific Crossover Hotspots and DNA Transposons.
(A) Population-scaled recombination rates over indica-specific, japonica-specific, and shared crossover hotspots.
(B) Aggregate plot of FST values over indica-specific, japonica-specific, and shared crossover hotspots. Gray lines represent an equal number of random regions.
(C) Distributions of FST for indica and japonica crossover hotspots with different counts of Stowaway and PIF/Harbinger transposons. Red lines, fitted linear model.
(D) Normalized (by overall subspecies nucleotide diversity) π values over indica-specific, japonica-specific, and shared crossover hotspots.
(E) Nucleotide diversity for Stowaway and PIF/Harbinger transposons overlapping indica (orange lines) and japonica (blue lines) crossover hotspots. Purple lines indicate randomly selected Stowaway and PIF/Harbinger transposons (same number as the hotspot TEs). The gray lines reflect nucleotide diversity at random genomic regions (same number as the hotspot TEs).
To determine if genetic diversity relates to crossover hotspot evolution, we assessed the levels of nucleotide diversity (π) across 1-kb nonoverlapping windows in the rice genome for each subspecies (Supplemental Figure 10). Interestingly, we found that subspecies-specific crossover hotspots were enriched with subspecies-specific nucleotide diversity when compared with shared hotspots (nucleotide diversity was first scaled by the overall diversity in the subspecies; Wilcoxon rank sum test; all, P < 3.27e-16; Figure 4D; Supplemental Figure 11). Because crossover hotspots were enriched in gene regulatory regions, we posited that gene promoters targeted by hotspots may harbor a greater proportion of genetic diversity. Indeed, crossover hotspot-associated promoters were affiliated with greater genetic variation relative to nonhotspot promoters (both subspecies; Wilcoxon rank sum test; all, P < 2.2e-16; Supplemental Figure 11). Mapping of nucleotide diversity surrounding different classes of transposons uncovered the greatest levels of nucleotide diversity flanking Stowaway and PIF/Harbinger elements (Supplemental Figure 12). Overall genetic diversity in both subspecies surrounding Stowaway and PIF/Harbinger elements was further amplified when colocalized with crossover hotspots; hotspot-localized Stowaway and PIF/Harbinger elements exhibited greater levels of nucleotide diversity than randomly selected elements from the same families (Wilcoxon rank sum test; all, P < 2.2e-16; Figure 4E; Supplemental Figure 12). Overall, this analysis suggests that Stowaway and PIF/Harbinger transposons within crossover hotspots may augment patterns of divergence by amplifying nucleotide diversity and providing the substrate for genetic differentiation.
Subspecies-Specific Crossover Hotspots Are Associated with Lineage-Specific Variation
Lineage-specific SV likely contributes to species divergence. To query the potential contribution of subspecies-specific historical crossovers toward tandem duplication events, we performed intragenomic alignments for indica and japonica subspecies by leveraging 93-11 and IRGSP genome assemblies, resulting in the identification of 5603 and 3507 tandemly duplicated genes in the two subspecies, respectively. Comparing the positions of indica and japonica crossover hotspots with tandemly duplicated genes revealed a significant overlap at a frequency greater than chance for both subspecies, consistent with past reports (Fisher’s exact test; indica, P < 9.01e-80; japonica, P < 2.24e-28; Edlund and Normark, 1981). This enrichment was supported by the finding of proximity bias for crossover hotspots near tandemly duplicated compared with nonduplicated genes (Figure 5A). To determine if subspecies-specific hotspots drive tandem duplications of distinct gene sets, we profiled lineage-specific tandemly duplicated genes overlapping subspecies-specific crossover hotspots for enriched functional annotations. A total of 132 and 267 lineage-specific tandemly duplicated genes overlapped crossover hotspots in japonica and indica, respectively. Indica-specific tandemly duplicated crossover hotspot genes were significantly associated with functional annotations related to molecule binding, oxidative activity, protein kinase activity, and pollen-pistil interactions (Figure 5B). In contrast, japonica-specific tandemly duplicated hotspot genes exhibited enrichment for terms related to vesicle transport, apoptosis, and programmed cell death (Figure 5B). These results suggest that recombination is likely a major player driving the formation of lineage-specific genes with largely distinct functional roles across subspecies.
Subspecies-Specific Hotspots Are Associated with Sequence Variation and SV between Subspecies.
(A) Distribution of crossover hotspot distances relative to tandemly duplicated (left) or random genes (right).
(B) Enriched Gene Ontology terms for tandemly duplicated genes either overlapping indica (top) or japonica (bottom) lineage-specific crossover hotspots. FDR, false discovery rate.
(C) Circle plot depicting whole-genome alignment between 93-11 (indica; left) and Nipponbare (japonica; right). Gray ribbons connecting chromosomes from the two assemblies depict one-to-one alignments. Color ribbons represent translocated sequence relative to Nipponbare, colored by the matching 93-11 chromosome. Each track depicts the relative densities of various structural variants as well as crossover hotspot counts.
(D) Two-dimensional density plots illustrating the relationship among SV type and length (x axis) with recombination rate (y axis) for indica (top) and japonica (bottom).
(E) Normalized counts (per million) of indica and japonica hotspots in 1-kb windows flanking structural variants. The number of a given SV type in each subspecies is indicated at the bottom of each bar plot.
Next, we performed whole-genome alignments between published japonica and indica genome references, IRGSP v1 and 93-11, respectively, to determine if meiotic crossover hotspots contribute to large SVs (Figure 5C). We observed a strong negative correlation between the lengths of SVs and recombination rates in japonica (using IRGSP as a reference relative to 93-11), particularly for large (>200-nucleotide) duplications, insertions, and deletions, indicating that longer SVs are more likely to suppress recombination (Figure 5D). SV length had a larger negative effect on recombination rates from the indica map, indicating that longer variants are considerably disruptive to recombination in distant lineages (Figure 5D). Although suppressed within SVs, we speculated that regions flanking large SVs would be enriched with subspecies-specific crossover hotspots. Indeed, we found that 1-kb regions flanking SVs were associated with significantly greater counts of crossover hotspots from the other subspecies in a reciprocal fashion (e.g., greater normalized counts of indica to japonica hotspots within the 1-kb regions flanking a japonica SV; Wilcoxon rank sum test; all SV types, P < 1e-4; Figure 5E). This suggests that SVs can lead to hotspot suppression in one subspecies, thus contributing to the formation of subspecies-specific hotspots.
Stowaway and PIF/Harbinger Transposons Buffer GC Bias at Crossover Hotspots
Sequence divergence between subspecies at the single nucleotide level may influence crossover hotspot specificity. To evaluate this possibility, we scanned for motifs within subspecies-specific crossover hotspots that were significantly enriched relative to orthologous regions from the alternate lineage. Surprisingly, the top motifs in both subspecies were nearly identical, indicating that a core set of motifs underscores crossover hotspots in rice (Figure 6A; Supplemental Figures 13 and 14). Most of these motifs were highly repetitive, including a GGC 3-mer repeat reminiscent of the CCN motif identified in Arabidopsis and A/T homopolymeric sequences. Another top motif, TACTCCCTCCGTCCC (denoted as the TACT motif), corresponds to the terminal inverted repeat of the Stowaway family of DNA transposons (Figure 6A).
G/C Allelic Bias in Crossover Hotspots Is Stabilized by A/T-Rich DNA Transposons.
(A) Top five significant motifs underlying subspecies-specific crossover hotspots from discriminative discovery against orthologous regions.
(B) Average G/C allele frequency across 10-kb windows centered on indica (orange) and japonica (blue) crossover hotspots. The gray line represents randomly permuted nearby (10–1000 kb) genomic regions (n = 10,000).
(C) Two-dimensional density plot highlighting significant linear relationships between average G/C allele frequencies (x axis) and the population-scaled recombination rate (y axis) ρ/kb of hotspots for indica (top) and japonica (bottom) subspecies. White lines represent a fitted linear regression.
(D) Indica (left) and japonica (right) hotspots split into three equal-sized groups based on average G/C allele frequency. Top box plots indicate hotspot peak strengths from the low, intermediate, and high GC allele frequency groups. The bar plots (bottom) represent the proportion of hotspots from the three G/C allele frequency ranked groups containing at least one Stowaway or PIF/Harbinger element.
(E) Aggregate plot of G/C allele frequencies over Stowaway and PIF/Harbinger elements either overlapping indica (orange) or japonica (blue) hotspots. Gray lines are randomly selected Stowaway or PIF/Harbinger elements that did not overlap a hotspot. The same number of nonhotspot TEs was used as hotspot-associated TEs.
(F) A/T content (%) of Stowaway (left) and PIF/Harbinger (right) transposons split into 10 groups (dectiles) based on the underlying recombination rate for both subspecies (indica, left subpanel; japonica, right subpanel).
The presence of a motif with high G/C composition is consistent with the occurrence of G/C-biased gene conversion at regions with elevated recombination rates. Interestingly, for SNPs containing polymorphic G or C nucleotides, the frequency of the G/C allele is associated with a dramatic increase near crossover hotspot peaks compared with permutations of random regions (n = 10,000) in both subspecies (Figure 6B). To estimate the effect that G/C bias exhibits on crossover hotspot strength, we modeled peak recombination rates as a function of the mean G/C allele frequency of SNPs embedded within hotspots. A linear model identified average G/C allele frequency as significantly associated with hotspot peak recombination rates for both subspecies (Student’s t test; P < 0.02; Figure 6C). However, hotspots containing SNPs with intermediate allele frequencies appeared to be the strongest contributors to hotspot strength (Figure 6C). We found that elevated G/C allele frequencies within hotspots were subspecies specific such that lineage-specific hotspots were coincident with elevated G/C allele frequencies in the same population (Wilcoxon rank sum test; P < 2.2e-16; Supplemental Figure 15). In contrast, mean G/C allele frequencies were not significantly different for hotspots that were shared between indica and japonica. Assessment of subspecies-specific hotspots split by G/C allele frequencies revealed a pattern of enrichment with Stowaway and PIF/Harbinger transposons that mirrored hotspot activity (Figure 6D). Specifically, intermediate G/C allele frequencies were associated with crossover hotspots with the greatest activity (Kruskal-Wallis; all, P < 2.16e-55) as well as increased frequency of Stowaway and PIF/Harbinger transposons (Kruskal-Wallis; all, P < 1.17e-04; Figure 6D). This finding suggests that Stowaway and PIF/Harbinger transposons may stabilize allele frequencies, presumably by providing sequences rich in A/T nucleotides that may preserve or even enhance hotspot activity.
To further profile the relationship between G/C allele frequency dynamics and Stowaway and PIF/Harbinger elements at crossover hotspots, we surveyed allele frequencies for hotspots either containing or lacking at least one of these TEs. We found that the presence of Stowaway and PIF/Harbinger elements was associated with reduced variance in G/C allele frequencies (F test; all, P < 7.7e-06; Supplemental Figure 15). Then, to establish if reduced variation in G/C allele frequencies is a genome-wide characteristic of Stowaway and PIF/Harbinger transposons, we compared averaged G/C allele frequencies surrounding Stowaway and PIF/Harbinger transposons residing in and out of crossover hotspots. Colocalization with crossover hotspots substantially increases the G/C allele frequency for regions flanking Stowaway and PIF/Harbinger transposons (Figure 6E). However, we also observed increased G/C allele frequencies for regions flanking nonhotspot Stowaway and PIF/Harbinger elements, suggesting that these elements may be generally associated with regions subject to G/C bias, such as ectopic recombination, albeit at lower levels. Although Stowaway and PIF/Harbinger elements are embedded within regions exhibiting biased G/C allele frequencies, we observed a positive relationship between A/T content within Stowaway and PIF/Harbinger elements and recombination rates (Figure 6F). This result contrasts with mechanisms of G/C bias, since regions with elevated recombination are presumably associated with increased G/C content. Taken together, we conclude that subspecies-specific hotspots can drive changes in G/C allele frequencies within independent lineages and that Stowaway and PIF/Harbinger transposons are associated with highly active hotspots, stabilized G/C allele frequencies, and a greater proportion of DSB-favoring A/T sequences.
DISCUSSION
Here, we have provided a comprehensive characterization of the genomic, epigenomic, and population genetic factors influencing meiotic crossover hotspots and their effects on evolutionary divergence in two subspecies in rice using comparative genomic approaches. Crossover hotspot positions based on patterns of LD were concordant with empirically determined crossovers from three synthetic rice populations. Despite significant overlap with empirical data, compared with indica hotspots, we acknowledge that the lower proportion of overlap in japonica hotspots with RIL crossovers may be partially explained by several factors. First, the population constructed by Yu et al. (2011) utilized two indica parents, which likely resulted in indica-specific crossovers. Second, indica subspecies undergo greater numbers of recombination events per meiosis, and individual RIL lines may be under selection for increased recombination, which would bias for crossovers established by indica-specific factors. Third, the empirical crossover data were far from saturating, which decreases the power for estimating concordance with historical hotspots. Lastly, since we did not use the RIL individuals during historical map construction, crossovers from these populations may exhibit distinct crossover patterns compared with our subspecies diversity panels. Nevertheless, by leveraging a shared set of SNPs for hotspot identification in both subspecies, we show that hotspot positions are robust to differences in diversity. Furthermore, we demonstrate a strong association of crossover hotspots from both subspecies with features previously shown to co-occupy crossover hotspots, including regions upstream TSSs, accessible chromatin, SV, G/C-biased gene conversion, and DNA transposons (Myers et al., 2010; Choi et al., 2013; Arbeithuber et al., 2015; Marand et al., 2017). Taken together, these results provide a high degree of confidence in hotspot positions.
We previously established an enrichment of Stowaway transposons in crossovers from potato (Solanum tuberosum; Marand et al., 2017). However, the role of Stowaway transposons within meiotic crossovers has remained unclear. Here, we demonstrate that the presence of Stowaway and PIF/Harbinger elements in rice crossover hotspots is associated with a marked increase of peak recombination rates and was quantitatively associated with augmented chromatin accessibility and decreased DNA methylation in hotspots, two features observed in other plant species (Choi et al., 2013). We speculate that Stowaway and PIF/Harbinger transposons may facilitate crossover hotspot locations through a variety of indirect mechanisms. The target site duplication sequences for Stowaway and PIF/Harbinger elements correspond to TA and TAA, respectfully, where multiple excisions and insertions over time would result in long tracts of AT-rich sequences (Zhang et al., 2001; Feschotte et al., 2003). Regions rich in A/T sequences are more thermodynamically susceptible to DSBs, owing to weaker base pairing for A-T (two hydrogen bonds) compared with G-C (three hydrogen bonds). Additionally, inherently stiff homopolymeric sequences [poly(dA:dT) and poly(dC:dG)], motifs enriched within crossover hotspots in either subspecies, are known to exclude nucleosomes, thereby facilitating chromatin accessibility permissive for interactions with meiotic regulatory factors such as SPO11(Kaplan et al., 2009; Choi et al., 2018). The enrichment of SPO11 oligonucleotides at A/T-rich sequences and accessible chromatin regions further supports a role for fine-scale sequence content in determining the sites of DSBs (He et al., 2017; Choi et al., 2018).
Stowaway and Harbinger MITEs may also provide a direct mechanistic role in crossover formation. Studies of maize (Zea mays) transposons have revealed that nearly 25% of chromatin-accessible regions overlap transposons and likely represent young cis-regulatory elements (Zhao et al., 2018). Insertion of these elements to new sites could provide de novo cis-regulatory elements recognized by the DNA binding domains of meiotic trans-factors through exaptation of the transposon cis-regulatory sequences. Indeed, such a system to establish crossover hotspots is well recognized in humans, namely by PRDM9, which binds to a degenerate motif strikingly similar to sequences contained within THE1 and L2 retrotransposons (Myers et al., 2008). This provides an attractive paradigm to model crossover site determination: Stowaway and PIF/Harbinger transposons are highly enriched upstream and downstream of genes and depleted within gene bodies, thus minimizing the adverse effects associated with aberrant resolution of crossover intermediates. In contrast, the co-occurrence of Stowaway and PIF/Harbinger elements with crossover hotspots could be the result of a shared preference for open chromatin, a phenomenon that has been observed in maize (Liu et al., 2009). The propensity of these MITEs to localize in accessible chromatin may also be related to observations of upregulated TE transcription during meiotic prophase I in Arabidopsis (Chen et al., 2010; Yang et al., 2011), although the current lack of chromatin data specifically generated from meiocytes has precluded any firm conclusion arising from these data. Further experiments, such as CRISPR editing of individual Stowaway and PIF/Harbinger elements within hotspots, or probing transcription factor binding capacity of TE sequences, will be necessary to elucidate a potential mechanistic role of these transposons in crossover hotspot regulation.
The extent to which genetic variation and SV play a role in recombination rate variation and divergence has remained poorly understood in plants. Comparison of recombination rates between humans and chimpanzees revealed that large inversions result in marked decreases in rate correlations (Auton et al., 2012). We found that SVs were indeed associated with a broad decrease in recombination rates, regardless of the subspecies. However, we also found that subspecies-specific hotspots were highly enriched flanking SVs, which suggests that although SVs are depleted of recombination, hotspots may drive their initial formation. Consistent with reduced recombination in SVs, emerging evidence has implicated large SVs as a major contributor to speciation (Feulner and De-Kayne, 2017). We show that SV length demonstrates a marked correlation with reduced recombination rates. Moreover, recombination suppression within SVs was amplified when considering recombination rates from the alternate subspecies. Interestingly, DNA transposons were previously proposed to play a role in the formation of SV (Feschotte and Pritham, 2007). Experiments in rice have shown that genomic regions prone to insertions by DNA transposons intrinsically harbor substantial allelic variation (Wicker et al., 2016), consistent with our analysis of FST and nucleotide diversity flanking Stowaway and PIF/Harbinger transposons.
Analysis of the fine-scale sequence characteristics of human and chimpanzee crossover hotspots indicated distinct motifs within independent lineages (Auton et al., 2012). Furthermore, analysis of human hotspot motifs in the chimpanzee genome did not reveal an association with increased recombination rates (Auton et al., 2012). In contrast to humans and chimpanzees, motifs from lineage-specific crossover hotspots in rice were strikingly similar between subspecies, which implicates a core set of motifs underlying crossover hotspots in rice. The shared set of hotspot motifs in rice may also be related to the relatively short divergence time compared with humans and chimpanzees, which diverged between 6 and 13 million years ago (Patterson et al., 2006; Langergraber et al., 2012). Similar to hotspots in humans and chimpanzees (Munch et al., 2014), we found that crossover hotspots in rice are associated with increased G/C-bias allele frequencies. Overall elevated G/C allele frequencies at indica crossover hotspots implicate an older origin relative to japonica. Similar insights were gleaned from chimpanzee and human hotspots, revealing a more recent origin of hotspots in chimpanzees (Auton et al., 2012). How the rate of crossover hotspot evolution affects sequence divergence and population dynamics remained enigmatic in most species. We observed a strong association between hotspot strength and intermediate G/C frequencies, suggesting that A/T-rich sequences driving hotspots may be lost as G/C-rich alleles go to fixation. We speculate that the insertion of A/T-rich transposons, such as Stowaway and PIF/Harbinger elements, within high G/C-rich sequences may provide DSB-favorable sequences and an evolutionary avenue to offset the rampant increase in G/C-rich alleles due to G/C-bias gene conversion. This hypothesis is supported by reduced G/C allele frequency variance, elevated A/T sequence composition within transposons, and increased recombination rates at hotspot peaks containing Stowaway and PIF/Harbinger elements. Alternatively, elevated recombination at hotspots containing DNA transposons may be a consequence of ectopic recombination from illegitimate homology. Indeed, there is ample evidence that repetitive elements can facilitate aberrant recombination in both meiosis and mitosis (reviewed in Muñoz-López and García-Pérez, 2010). However, the probability of maintained ectopic recombination over the long generational time present in this analysis is somewhat less likely given the purifying selection properties affiliated with highly recombinant regions. Generation of Stowaway and PIF/Harbinger de novo insertion lines or a time-course study following multigenerational populations could shed light on a putative evolutionary mechanism for these DNA transposons. Collectively, our results provide insights into the genomic, epigenomic, and population genetic features associated with crossover hotspots as well as their collective evolutionary influence toward the divergence of two closely related subspecies of rice.
METHODS
Estimation of Fine-Scale Recombination Rate
A total of 75 random accessions from each rice (Oryza sativa) subspecies population (indica and japonica; Supplemental Data Set) was acquired from the National Center for Biotechnology Information (NCBI) BioProject PRJEB6180, corresponding to the 3000 Rice Genomes Project (Li et al., 2014). All reads were quality trimmed (-q 10) with cutadapt v1.16 (Martin, 2011) and mapped to the IRGSP-1 v7.0 reference genome (Kawahara et al., 2013) with the default settings of BWA mem v0.7.15 (Li and Durbin, 2009). Duplicate mapped reads and reads with mapping quality < 20 were filtered using picard tools v2.16.0 and samtools v0.1.19 (Li et al., 2009), respectively. Short nucleotide variants (SNPs and insertions/deletions [INDELs] < 50 bp) were identified using Freebayes v1.2.0 with default settings (Garrison and Marth, 2012). SNPs with mapping quality < 40 and INDELs were filtered. SNPs overlapping repetitive sequences or transposons, with minor allele frequency < 0.1 or with >10% missing genotypes, were removed to reduce the counts of false-positive SNPs. Imputation of missing genotypes and SNP phasing was performed with nondefault settings of Beagle v4.1 (window = 1000, overlap = 100, niterations = 25, gprobs = true, and err = 0.001; Browning and Browning, 2007). Recombination rates were estimated by treating phased haplotypes as haploid individuals within subsets of 2000 consecutive SNP windows, overlapping by 250 SNPs using the program Interval from the software package LDhat v2.2 (Auton and McVean, 2007). Interval was run using θ = 0.001 with 10 million iterations, reversible-jump Markov Chain Monte Carlo sampling every 2000 iterations, and discarding the first 500 samples as burn ins. A range of block penalties (5, 10, 20, 25, and 30) was used to assess fine-scale recombination rates in both subspecies (Supplemental Figure 1). We used recombination rate maps resulting from the block penalty of 25 simply because we could not detect strong differences outside of the expected smoothing of recombination rates. To validate the fine-scale recombination rate maps, two independently derived RIL populations were used to construct a consensus genetic map (Huang et al., 2009; Yu et al., 2011). Briefly, recombination bins from both RIL populations were dynamically merged and split such that the consensus recombination bin map was composed of recombination bins separated by a single crossover event stemming from one of the populations. For crossovers that overlapped between RIL populations, we take the site of the crossover as the center of the interval to avoid the scenario where flanking bins are separated by more than a single crossover. The resulting bins were used as genetic markers for genetic map construction using the RIL setting of MSTmap (Wu et al., 2008) and a significance threshold of –log10(P value) equal to 10. Recombination rates were estimated using the cubic spline function from the R package MareyMap (Rezvoy et al., 2007) and interpolated onto the physical positions of the consensus recombination bin map. To convert the population-scaled estimate of recombination rate ρ/kb to cM/Mb, we adopted a previously described strategy (Choi et al., 2013). Recombination rates from the japonica, indica, and consensus maps were compared in a pair-wise fashion using Spearman’s rank correlation. Unless noted otherwise, all remaining analyses of recombination rates utilized estimates of recombination as the population-scaled ρ/kb parameter.
Crossover Hotspot Analysis
We scanned for crossover hotspots in both subspecies using the software sequenceLDhot (Fearnhead, 2006). SequenceLDhot was run with nonoverlapping 1-kb windows, a 500-bp step size, and background estimates set to 50-kb regions centered on the tested 1-kb window. To exclude potential false positives, putative crossover hotspots with peak population-scaled recombination rates from LDhat < 1 ρ/kb were removed. Hotspot peaks that were less than 10 times the background or greater than 200 times the background were removed. An LRT threshold of 6 was used to filter crossover hotspots lacking strong statistical support. We then merged crossover hotspots if they were separated by 500 bp or less, taking the greatest LRT and recombination rate between merged crossover hotspots as the significance and recombination rate of the newly merged interval. Crossover hotspots were regarded as being between 1 and 5 kb in length, removing any hotspot outside of this range. Concordance with empirical crossovers was established by comparing the overlapping rate of crossover hotspots with 50-kb windows centered on recombination bin boundaries for the two RIL populations (Huang et al., 2009; Yu et al., 2011) and the 100-kb window hotspots as determined by Si et al. (2015). The intersect command from the BEDTools v2.26.0 suite (Quinlan and Hall, 2010) was used to assess the overlap (at least 1 bp) between crossover hotspots and various genomic features as defined by the IRGSP-1 v7.0 gene feature format file annotation and rice repetitive sequences. If more than one feature overlapped a crossover hotspot, then the crossover hotspot was counted toward each overlapping feature. Repetitive sequences and transposons were identified from IRGSP-1 v7.0 (Kawahara et al., 2013) and 93-11 (Yu et al., 2002) pseudomolecule fasta files using RepeatMasker (Smit and Green, 2015) and the Viridiplantae Repeat Database (Ouyang and Buell, 2004). Random nearby regions (10–1000 kb) were selected based on having no evidence of containing a crossover hotspot (LRT = 0) and similar GC percentage (within 5%), SNP density (within 5%), and sequence length to the matched crossover hotspot interval. Permutations comparing null (nearby random regions) distributions to specific intersections were run with a minimum of 10,000 replicates.
Epigenomic Data Analysis
DNase-seq reads from Nipponbare (Zhang et al., 2012) and 93-11 (C. Fang, W. Zhang, and J. Jiang, unpublished data) were trimmed of adapters and base pairs with quality scores < 20 while concurrently filtering reads with lengths of less than 18 nucleotides using cutadapt (Martin, 2011). Only DNase-seq reads aligning uniquely (mapping quality > 20) to both the IRGSP-1 v7.0 and 93-11 genome sequences were retained using bowtie2 (Langmead and Salzberg, 2012). A similar procedure was followed for all histone modifications [Nipponbare, H3K4me3 (He et al., 2010), H3K36me3 (Zhang et al., 2012), H3K9ac (He et al., 2010), H4K12ac (Zhang et al., 2012), and H3K27me3 (He et al., 2010); 93-11, H3K4me3 (He et al., 2010), H3K36me3 (C. Fang, W. Zhang, and J. Jiang, unpublished data), H3K9ac (He et al., 2010), H4K12ac (C. Fang, W. Zhang, and J. Jiang, unpublished data), and H3K27me3 (He et al., 2010)], with the exception that sequencing reads were required to be at least 25 nucleotides in length following read trimming. DHSs were identified using MACS2 (Zhang et al., 2008), optimized for short-read DNase-seq protocols (–extsize 200–shift–100). ChIP-seq peaks were identified using MACS2 with default parameters. Bisulfite-sequencing reads (Chodavarapu et al., 2012) were aligned using Bismark (Krueger and Andrews, 2011) with default parameters. The methylpy pipeline was used to filter bisulfite sequencing read alignments and to estimate methylation levels for the three DNA methylation contexts (Schultz et al., 2015). Weighted methylation levels were determined over 250-bp nonoverlapping windows (Schultz et al., 2012). Regions lacking bisulfite coverage were tagged as missing and excluded from all analysis. Coverage maps for chromatin data were normalized to read depths per one million mapped reads. Matrices for aggregate metaplots were generated using deepTools (Ramírez et al., 2016). We defined stowDHS and harbDHS as Stowaway and PIF/Harbinger elements that overlap a DHS by at least 1 bp, respectively. The ratio of reads overlapping peaks to the total mapped read count for each library was used to estimate the signal-to-noise ratio. Random region controls were matched to each crossover hotspot (similar length distribution), derived from flanking regions (10–1000 kb away), were mappable (determined from mapping-simulated 25-nucleotide reads from the reference sequence with default parameters of bowtie2), and displayed no statistical evidence of containing a crossover hotspot (output from sequenceLDhot). Unsupervised chromatin state clustering was performed using k means of 10-kb windows centered on crossover hotspots split into 50-nucleotide bins using all chromatin data sets simultaneously with the base R function, kmeans (R Development Core Team, 2019). The optimal k groups were determined via the silhouette method for hotspots and control regions, independently.
Population Genomics Analysis
Estimates of FST based on Weir and Cockerham interpretation were determined using vcftools v0.1.15 (Danecek et al., 2011) at each SNP (n = 1,810,174) using the 150 indica and japonica accessions separated by subspecies as distinct populations. Values of FST were plotted as 5-kb averages and further illustrated using a smoothing spline with the R package, smooth.spline, with the parameter spar set to 0.1 (R Development Core Team, 2019). MC simulations were conducted with 10,000 randomized replicates, excluding the associated crossover hotspot intervals. Calculations of π were determined through vcftools v0.1.15 (Danecek et al., 2011) using 1-kb nonoverlapping windows. To allow comparisons of indica and japonica sequence diversity, π was normalized by the mean sequence diversity of each subspecies, independently, as defined by Huang et al. (2012). For genome-wide comparisons (and illustration purposes; Supplemental Figure 10), normalized ratios of π for indica and japonica were averaged over 50-kb nonoverlapping windows and log2 transformed, where positive values indicate greater sequence diversity for indica and negative values represent greater sequence diversity for japonica. To allow comparisons of π between subspecies, we normalized π by the mean sequence diversity value for each subspecies (Huang et al., 2012), independently.
SV Analysis
Lists of tandemly duplicated genes and synteny maps were constructed using the SyNet pipeline with default parameters (Zhao et al., 2017). Gene Ontology term enrichment of tandemly duplicated genes overlapping crossover hotspots was performed using web-based software, AgriGO (Tian et al., 2017), using Fisher’s exact test and raw P values corrected via the Benjamini and Hochberg false discovery rate method. The positions of indica and japonica crossover hotspots on the 93-11 assembly were identified by aligning hotspots plus 5 kb of context sequence from IRGSP to the 93-11 reference with nucmer (default parameters) from the MUMmer4 release (Marçais et al., 2018). Alignments demonstrating one-to-one homology were retained (putative orthologous regions). Indica hotspots with more than one alignment were filtered to retain the alignment with the greatest coverage of the central hotspot peak. Approximately 98% of indica and japonica hotspots were mapped to 93-11 coordinates via this approach. To identify large SVs including INDELs, translocations, inversions, and duplications, we performed whole-genome alignments between the 93-11 and IRGSP genomes using nucmer, filtering alignments < 200 bp (delta-filter -l 200; Marçais et al., 2018). Structural variant breakpoints were identified using the show-diff utility from MUMmer4 (Marçais et al., 2018). To simplify our analysis, we focused on alignments between 93-11 and IRGSP where both sequences originated from pseudomolecules, effectively filtering unanchored contigs and scaffolds. For analysis of G/C allele frequencies surrounding crossover hotspots, we only considered SNPs where one of the polymorphic nucleotides was a G/C. Hotspots were centered on 10-kb windows and split into 250 equal-sized bins. Average G/C allele frequencies were computed for each bin using the utility map from BEDtools v2.26.0 (Quinlan and Hall, 2010).
Identification of Sequence Motifs from indica and japonica Crossover Hotspots
To discern sequence differences between indica and japonica crossover hotspots, we further processed the Nipponbare/93-11 hotspot alignments to retain regions with sequence identity > 50%, aligned at least 80% in both sequences, and on the identical chromosome. Alignments with greater than 10% of sequence composed of unambiguous nucleotide calls were removed. Hotspots that overlapped by at least 1 bp in either the IRGSP-1 v7.0 or 93-11 reference genome were removed, focusing the analysis of subspecies-specific hotspot sequences. We then trimmed the alignments to leave 250 bp of sequence on either side of the crossover hotspot peak. Hotspot peaks were identified prior to the alignment by mapping the LDhat-based recombination rates onto crossover hotspots and identifying the center for the SNP coordinates with the greatest recombination rate. We then identified peaks in the 93-11 background by scanning for a match of the 50-bp sequences surrounding the peak from IRGSP-1 v7.0, allowing up to 6% mismatch (no more than three mismatches) and no gaps. Hotspot alignments where peaks could not be identified in 93-11 were removed from the analysis. The top 1000 crossover hotspots ranked by peak recombination rates in each subspecies were selected to reduce computational cost. Each crossover hotspot was matched with a nearby region (10–1000 kb) with similar (within 5%) GC content, identical sequence length, and SNP density (within 5%). We ran MEME-chip (Machanick and Bailey, 2011) with discriminative discovery, searching for motifs between 6 and 20 bp in length, allowing for zero or one motif occurrence per sequence (ANR model) in each crossover hotspot. Null sequences were constructed with the random nearby regions in addition to the orthologous region from the other subspecies, for a total of four comparisons: (1) japonica-specific crossover hotspots with Nipponbare-based sequences against random nearby regions in Nipponbare; (2) japonica-specific crossover hotspots with Nipponbare-based sequences against the orthologous region in 93-11; (3) indica-specific crossover hotspots with 93-11-based sequences against random nearby regions in 93-11; and (4) indica-specific crossover hotspots with 93-11-based sequences against the orthologous region in Nipponbare. The positions of all motifs were determined using fimo from the MEME-suite tool (Machanick and Bailey, 2011). To control for multiple testing, motifs with q values (Benjamini-Hochberg false discovery rate-corrected P values) < 0.05 were removed from the analysis.
Accession Numbers
Whole-genome resequencing data for the 150 rice accessions were obtained from NCBI BioProject PRJEB6180. DNase-seq, H3K36me3, and H4K12ac data from Nipponbare were obtained from NCBI Gene Expression Omnibus accessions GSE26610 and GSE26734. H3K4me3, H3K9ac, and H3K27me3 for both Nipponbare and 93-11 were obtained from NCBI Gene Expression Omnibus accession GSE19602.
Supplemental Data
Supplemental Figure 1. Comparison of a range of block penalties applied to LDhat during recombination rate estimates Using SNP markers from the 75 indica accessions.
Supplemental Figure 2. Relatedness among the 150 rice accessions.
Supplemental Figure 3. Fine-scale recombination rates across the Rice genome for subpopulations of indica and japonica compared with counts of LTR transposons.
Supplemental Figure 4. Genome-wide recombination rate comparison between indica, japonica, and RIL consensus maps using the full set of SNP markers.
Supplemental Figure 5. Genome-wide recombination rate comparison between indica, japonica, and RIL consensus maps using a shared set of SNPs among indica and japonica accessions (n = 569,693).
Supplemental Figure 6. Genome-wide recombination rate comparison among indica and japonica using the shared (solid lines, n = 569,693) and full sets of SNPs (filled-in polygons).
Supplemental Figure 7. Recombination rates and DNA mthylation over transposons.
Supplemental Figure 8. Estimates of FST across the rice genome using indica and japonica individuals as subpopulations.
Supplemental Figure 9. Genome-Wide distribution of FST over hotspots and various transposon families.
Supplemental Figure 10. Normalized Log2 ratio of indica to japonica sequence diversity (π) over 50-kb nonoverlapping windows across the rice genome.
Supplemental Figure 11. Nucleotide diversity at crossover hotspots.
Supplemental Figure 12. Nucleotide diversity across transposons.
Supplemental Figure 13. DNA sequence motifs identified in indica crossover hotspots using random nearby regions, and the orthologous region in the Nipponbare IRGSP-1 v7.0 pseudomolecules (japonica) as controls.
Supplemental Figure 14. DNA sequence motifs identified in japonica crossover hotspots using random nearby regions, and the orthologous region in the 93-11 pseudomolecules (indica) as controls.
Supplemental Figure 15. Distribution of G/C allele frequencies at subspecies-specific and shared crossover hotspots.
Supplemental Table. Correlation between recombination rate maps.
Supplemental Data Set. Indica and japonica genotypes.
Dive Curated Terms
The following phenotypic, genotypic, and functional terms are of significance to the work described in this paper:
Acknowledgments
This research was partially supported by the National Science Foundation (Grants IOS-1444514 and MCB-1412948) and by University of Wisconsin Hatch funds and Michigan State University startup funds to J.J.
AUTHOR CONTRIBUTIONS
J.J. and A.P.M. designed the research. W.Z. performed the experiments. A.P.M., H.Z., Z.Z., C.F., and J.J. analyzed data. A.P.M. and J.J. wrote the article.
- Received October 3, 2018.
- Accepted January 28, 2019.
- Published January 31, 2019.