Endogenous, tissue-specific short interfering RNAs silence the chalcone synthase gene family in Glycine max seed coats.

Two dominant alleles of the I locus in Glycine max silence nine chalcone synthase (CHS) genes to inhibit function of the flavonoid pathway in the seed coat. We describe here the intricacies of this naturally occurring silencing mechanism based on results from small RNA gel blots and high-throughput sequencing of small RNA populations. The two dominant alleles of the I locus encompass a 27-kb region containing two perfectly repeated and inverted clusters of three chalcone synthase genes (CHS1, CHS3, and CHS4). This structure silences the expression of all CHS genes, including CHS7 and CHS8, located on other chromosomes. The CHS short interfering RNAs (siRNAs) sequenced support a mechanism by which RNAs transcribed from the CHS inverted repeat form aberrant double-stranded RNAs that become substrates for dicer-like ribonuclease. The resulting primary siRNAs become guides that target the mRNAs of the nonlinked, highly expressed CHS7 and CHS8 genes, followed by subsequent amplification of CHS7 and CHS8 secondary siRNAs by RNA-dependent RNA polymerase. Most remarkably, this silencing mechanism occurs only in one tissue, the seed coat, as shown by the lack of CHS siRNAs in cotyledons and vegetative tissues. Thus, production of the trigger double-stranded RNA that initiates the process occurs in a specific tissue and represents an example of naturally occurring inhibition of a metabolic pathway by siRNAs in one tissue while allowing expression of the pathway and synthesis of valuable secondary metabolites in all other organs/tissues of the plant.


INTRODUCTION
Knowledge of the RNA silencing pathway in plants (also known as RNA interference) is now advanced (reviewed in Baulcombe, 2004; Matzke and Matzke, 2004; Zamore and Haley, 2005; Chapman and Carrington, 2007; Eamens et al., 2008; Ramachandran and Chen, 2008; Carthew and Sontheimer, 2009), but relatively few examples exist of regulation of a specific plant phenotype by naturally occurring variation in the pathway. The soybean (Glycine max) I (inhibitor) locus, an unusual cluster arrangement of chalcone synthase (CHS) genes that inhibits seed coat pigmentation, is one such example of a silencing locus (Todd and Vodkin, 1996; Tuteja et al., 2004) mediated through posttranscriptional RNA silencing that can be suppressed by a viral silencing suppressor protein (Senda et al., 2004). CHS is the first committed enzyme in the pathway to an extraordinarily diverse set of secondary products, including isoflavones in the seed cotyledons, defense compounds in the leaves, phenolic exudates of the roots, and anthocyanin pigments in the hypocotyls, trichomes, pods, and seed coats of certain genotypes. In this article, we report RNA analysis and high-throughput sequencing of small RNAs showing that the biogenesis and accumulation of the CHS short interfering RNA (siRNA) silencing signal is limited to the seed coats of dominant I genotypes, thus explaining how soybean plants with I silencing genotypes can still express the CHS transcripts required for the synthesis of secondary products in other tissues.
In soybean, two dominant forms (I and i i) of the I locus inhibit pigmentation of the seed coat in a spatial manner, resulting in a colorless or light yellow color over the entire seed coat (I allele) or a yellow seed coat with a pigmented hilum, where the seed coat attaches to the pod (i i allele). By contrast, the homozygous recessive i allele allows for pigment production and accumulation over the entire epidermal layer of the seed coat. Most cultivated soybean varieties have been selected for a yellow, nonpigmented seed coat (homozygous I or i i alleles) to mitigate the undesirable effects of the black or brown anthocyanin pigments on protein and oil extractions during processing of soybean products (Palmer et al., 2004).
The I locus was initially identified as a region of duplicated and inverted CHS genes (CHS1, CHS3, and CHS4) (Todd and Vodkin, 1996) by analyzing a series of naturally occurring isogenic pairs that result from independently occurring mutations of the dominant silencing I allele to the recessive i allele (designated I / i mutations) or of the dominant silencing i i allele to the recessive i allele (designated i i / i mutations). Recently, in-depth BAC screening and sequence analyses revealed that five (CHS1, CHS3, CHS4, CHS5, and CHS9) of the nine nonidentical CHS gene family members are clustered in a 200- to 300-kb region (Tuteja and Vodkin, 2008) in the cultivar Williams containing the i i allele. Three of these five genes, CHS1, CHS3, and CHS4, were revealed to occur as two 10.91-kb perfect, inverted repeat clusters separated by 5.87 kb of intervening sequence that define the I locus based on deletions in this region that occur in recessive i mutations. Based on BLAST searches to the recently assembled 8X soybean genome sequence at the Department of Energy Joint Genome Institute (http://www.phytozome.net/soybean), the clustered CHS region of the I locus maps to chromosome Gm8, while four other CHS family members, CHS2, CHS6, CHS7, and CHS8, reside on different chromosomes, Gm5, Gm9, Gm1, and Gm11, respectively.
The six contiguous CHS1-3-4 genes in the inverted repeat clusters lead to spontaneous deletions and truncations of CHS genes manifested as mutations of the I locus. Spontaneous mutations of the dominant, silencing I or i i alleles to the recessive i alleles involve deletion of CHS promoter sequences from CHS4 or CHS1, paradoxically resulting in increased CHS7/CHS8 transcripts and pigmented soybean seed coats (Todd and Vodkin, 1996; Tuteja et al., 2004). Silencing by the naturally occurring CHS clusters parallels cosuppression, a phenomenon first described in plants transformed with extra copies of CHS genes (Napoli et al., 1990; Van der Krol et al., 1990). Using RNA gel blotting, Senda et al. (2004) found small RNAs of ~21 nucleotides in another yellow seed coat variety (Toyohomare) with a dominant I allele, and nuclear run-on experiments implicated a posttranscriptional mechanism mediated by siRNAs. Of the other studies reported thus far on RNA silencing involving endogenous alleles that are composed of multiple genes arranged in inverted repeat orientations (Kusaba et al., 2003; Della Vedova et al., 2005), the soybean system is unique in that it triggers tissue-specific gene silencing.
The involvement of gene silencing characterized by the production of 20- to 30-nucleotide small RNAs in the regulation of plant development is now well established (Carrington and Ambros, 2003; Allen et al., 2004). Small RNAs, particularly microRNAs (miRNAs), have been identified and implicated in a variety of physiological and morphological processes through computational and cloning approaches (Llave et al., 2002; Bartel, 2004; Jones-Rhoades and Bartel, 2004; Sunkar and Zhu, 2004; Lauter et al., 2005; Borsani et al., 2005; Chuck et al., 2009). Further insights into small RNA regulatory mechanisms have come from deep sequencing of small RNA populations in animals, plants, fungi, and protozoa (Lu et al., 2005; Nobuta et al., 2008).
Here, we present results from both small RNA gel blots and deep sequencing of small RNA populations from several genotypes of soybean and demonstrate that the CHS siRNAs accumulated only in the yellow seed coats having either the dominant I or i i alleles and not in the pigmented seed coats with homozygous recessive i genotypes. However, the diagnostic CHS siRNAs did not accumulate in the cotyledons of genotypes with the dominant I or i i alleles, thus demonstrating the novelty of an endogenous inverted repeat driving RNA silencing in trans of nonlinked CHS family members in a tissue-specific manner. This system demonstrates a naturally occurring feature of small RNA biogenesis and accumulation not well defined in other endogenous silencing examples.
Since CHS is the first committed enzyme of the flavonoid pathway, the endogenous tissue-specific silencing phenomenon of the I locus leads to selective downregulation of the flavonoid pathway and pigment inhibition only in the seed coats of silencing genotypes, whereas the cotyledons continue to accumulate high levels of isoflavones, other products of the flavonoid pathway that are characteristic of soybean seed (Dhaubhadel et al., 2007). In vegetative tissues, the roots use the flavonoid pathway to produce phenolic compounds involved in symbiosis with Rhizobium and the soybean leaves induce CHS transcripts upon pathogen challenge (Zabala et al., 2006). Thus, the silencing I and i i alleles have economic value in that they inhibit the pigment in the seed coat, a desirable trait for soybean processing, yet they do not affect other essential functions of the flavonoid pathway in the cotyledons, leaves, and roots. The dominant alleles specifying yellow seed coat have been incorporated by breeders into the germplasm of all modern cultivated soybean varieties long before the mechanism of the locus was understood to be mediated by tissue-specific production of siRNAs.

CHS-Derived siRNAs Found in the Seed Coats of both I and i i Dominant Allele Genotypes
The classically defined I locus (inhibitor) is characterized by its four alleles: I, i i , i k , and i (in order of dominant to recessive forms) that affect the production and accumulation of anthocyanins and proanthocyanidins in a spatial manner in the soybean seed coat (Todd and Vodkin, 1993;Wang et al., 1994). The dominant I allele inhibits pigmentation over the entire seed coat, resulting in a light or yellow color on mature harvested seeds, whereas the i i and i k alleles restrict pigmentation to the hilum and saddle shaped regions, respectively. The homozygous recessive i allele allows for pigment production and accumulation in the epidermal layer of the seed coat, thus imparting a buff, brown, or black coloration depending upon other anthocyanin pathway alleles present (Palmer et al., 2004).
We investigated the presence of CHS-related siRNA species in seed coats of the nonpigmented (Richland, I) and hilum-pigmented (Williams, i i) isolines along with their corresponding mutant allele lines (T157, i and Williams 55, i) (Table 1) using RNA gel blotting. The siRNAs were visualized via RNA gel blots probed with an antisense, in vitro-transcribed CHS7 probe. CHS7 was chosen as the probe since the nearly identical CHS7 and CHS8 genes are downregulated by the silencing I locus (Senda et al., 2004; Tuteja et al., 2004). As shown in Figure 1, a strong hybridization signal between the 20- and 30-nucleotide RNA markers was detected in both Richland (I) and Williams (i i) seed coat low molecular weight (LMW) RNA samples, while the RNA samples from the corresponding mutant isolines (T157, i and Williams 55, i) showed no evidence of CHS siRNAs. Thus, the presence of CHS siRNAs is limited to the yellow seed coat varieties with dominant I or i i genotypes, which demonstrates that the mechanism of the dominant alleles is mediated by the siRNA silencing pathway. These results also agree with those of Senda et al. (2004), wherein small RNAs were visualized in RNA gel blots of seed coats from a different yellow seed coat cultivar, Toyohomare, which carries the I allele.
Thus, both the dominant I allele and the dominant pattern form of the I locus, the i i allele typical of the cultivar Williams, result in silencing mediated by CHS siRNA production. The two lines used in our study, Richland (I) and Williams (i i ), are the sources of the I and i i alleles in many modern cultivated varieties. Williams is also the cultivar that has recently been sequenced by the Joint Genome Institute (http://www.phytozome.net/soybean).

CHS siRNAs Are Absent in the Cotyledons of Seeds with the Dominant I Genotype
We previously showed that the cytoplasmic CHS mRNA levels, while significantly lower in the seed coats of the yellow seeded varieties, did not show any reduction in the immature cotyledons dissected from the developing seed, thus predicting a tissue-specific silencing mechanism. Figure 2 shows that CHS siRNAs were again clearly detected in seed coats of Richland, the cultivar with the suppressive I allele, but not in the pigmented seed coats of T157 (i). More intriguingly, CHS siRNAs were not detected in cotyledons of either the yellow or the pigmented isolines. These results suggest that the CHS siRNA-mediated silencing of CHS expression in the immature soybean seeds is specific to the seed coat due to the absence of detectable CHS siRNAs in the cotyledons.

Highly Tissue-Specific Accumulation of CHS siRNA Conferred by the Dominant I and i i Alleles
Our analysis of CHS siRNAs was expanded to other tissues representing the vegetative parts of the plant. LMW RNA fractions from seed coats, cotyledons, roots, and leaves of the two isogenic pairs (Richland and T157 representing an I / i mutation and Williams and Williams 55 representing an i i / i mutation) were separated on polyacrylamide gels and the RNA gel blots hybridized to the CHS7 antisense probe as described before. Figure 3 clearly shows that sense CHS siRNAs accumulated in the seed coats of both the nonpigmented Richland (I) and hilum-only pigmented Williams (i i) cultivars. As shown before in Figure 1, no detectable hybridization to the CHS probe was observed in seed coats of their respective pigmented isolines, T157 (i) and Williams 55 (i). Intriguingly, no trace of CHS siRNAs was detected in the cotyledons, leaves, or roots of the yellow seeded cultivars Richland (I) and Williams (i i). These results accord with our previously published findings that CHS mRNA transcript levels are reduced only in the seed coats of the yellow seeded cultivars and not in their cotyledons and vegetative tissues. Together, these data (Figures 1 to 3) demonstrate that there is a tissue-specific accumulation of CHS siRNAs only in the yellow seed coats and not in other tissues of the yellow seeded varieties.
Figure 1. LMW RNA samples (75 µg) were fractionated in a 15% polyacrylamide gel and probed with an antisense CHS7 riboprobe transcribed from a full-length CHS7 cDNA. Radiolabeled LMW RNAs from both the yellow seed coat varieties Richland (I, yellow) and Williams (i i, yellow seed coat with pigmented hilum) indicate the accumulation of CHS siRNA. By contrast, the LMW RNA fractions from the corresponding mutant isolines T157 (i) and Williams 55 (i) with pigmented seed coats lack CHS siRNA. Radiolabeled Decade markers (20 to 30 nucleotides) are shown at left and right.

High-Throughput Sequencing of Small RNA Populations from Seed Coats and Cotyledons
To ascertain and characterize the identity of the CHS siRNAs detected on gel blots, multiple small RNA libraries were generated and deeply sequenced using Illumina high-throughput sequencing-by-synthesis technology. Four small RNA libraries were sequenced (Table 1): seed coats from Richland (I, yellow seed coats), seed coats and cotyledons from Williams (i i, yellow seed coats with black hilum), and seed coats from mutant line Williams 55 (i, pigmented seed coats).
A total of almost 15 million sequence reads (14,904,022) were obtained from the four libraries. As shown in Table 1, three libraries were sequenced at the same time to approximately three million reads each. The fourth library, from the pigmented genotype of the Williams 55 line carrying the i mutation, produced twice the number of reads (six million) because it was sequenced at a later date when the yield from the Illumina flow cells had increased. The raw sequence reads were processed computationally to remove adapter sequences, and from this pool of processed sequences, unique signatures representing at least five reads within each library were identified and selected for further analyses. The number of unique signatures ranged from 28,000 to 92,000 per library (Table 1). The total counts of individual signatures were normalized to three million raw reads.
BLASTn searches to the Sanger miRNA database (http://microrna.sanger.ac.uk/) found relatively few signatures (<1000) with matches to currently known and curated miRNAs, indicating that many represent siRNAs or previously unknown miRNAs. In this report, we focus on the CHS siRNAs found in the different tissues and genotypes that are active physiologically to effect a change in plant pigment phenotype.
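The filtering and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the input is assumed to be a list of adapter-trimmed read sequences, and simple linear scaling to three million raw reads is an assumption about how the normalization was carried out.

```python
from collections import Counter

NORMALIZATION_BASE = 3_000_000  # raw reads; the scaling base stated in the text
MIN_READS = 5                   # a signature must be seen at least this often


def unique_signatures(trimmed_reads, min_reads=MIN_READS):
    """Collapse adapter-trimmed reads into signatures seen >= min_reads times."""
    counts = Counter(trimmed_reads)
    return {seq: n for seq, n in counts.items() if n >= min_reads}


def normalize(signature_counts, total_raw_reads, base=NORMALIZATION_BASE):
    """Linearly rescale counts to a library of `base` raw reads (assumed method)."""
    factor = base / total_raw_reads
    return {seq: n * factor for seq, n in signature_counts.items()}
```

Under this assumption, a signature counted 6 times in a library of six million raw reads would be reported as 3 counts per three million reads, which makes the later, deeper library comparable with the other three.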

Multiple CHS siRNAs Accumulate in the Seed Coats but Not the Cotyledons of the i i Yellow Seeded Genotype
To identify the CHS-derived siRNAs from these total small RNA populations, the unique sequence reads from each library were separately mapped to each of the five CHS-containing BACs (77G7a, 56G2, 5A23, 28O17, and 7C24). These BAC clones carry different members of the CHS multigene family and have been previously sequenced, annotated, and described in detail (Tuteja and Vodkin, 2008). Figure 4 and Supplemental Data Set 1 online demonstrate that out of a total of >500 kb of the soybean genome represented by these five BACs and spanning 91 predicted gene models, the small RNAs from the Williams seed coat versus cotyledon libraries mapped primarily to the coding regions of CHS with scattered matches to other reading frames and very few matches to the intergenic regions or introns. Excluding CHS genes, the highest numbers of siRNAs mapped to open reading frames with similarity to known retrotransposable elements. While BACs 5A23 and 28O17 have single copies of CHS7 and CHS8, respectively, BAC 77G7a has eight CHS family members, six of which form the 27-kb-long inverted repeat region consisting of two clusters of three genes (CHS1-3-4 and CHS4-3-1) that define the i i allele. Numerous small RNAs with homology to the different CHS genes were found, but they were present only in the seed coat library of the hilum-pigmented, yellow seeded Williams (i i) cultivar and not in the cotyledon library constructed from the same Williams (i i) cultivar. Figure 4 illustrates the striking differences observed when both distribution and total counts of CHS siRNAs from the seed coat and cotyledon libraries of the hilum-pigmented yellow seeded Williams (i i) cultivar were compared.
Very high counts of ~25,000 CHS siRNAs that align to the individual CHS genes were observed for the highly similar CHS1-3-4 genes found in the inverted repeat CHS clusters of the silencing i i allele. However, only eight or fewer occurrences of CHS siRNAs were found in the cotyledon sequences, thereby providing unequivocal evidence that CHS siRNAs were found uniquely in the yellow seed coats in a tissue-specific manner.
Figure 2. LMW RNA fractions (75 µg) from seed coats and cotyledons of the soybean isogenic lines Richland (I, yellow seed coat) and T157 (i, pigmented seed coat) were separated on 15% polyacrylamide gels, and the resulting RNA gel blots were probed with an antisense CHS7 riboprobe transcribed from a full-length CHS7 cDNA. CHS siRNAs were detected only in the seed coats of Richland (I), the cultivar with the yellow seed coats (top panel). The bottom panel shows small RNAs stained with ethidium bromide.

Tissue-Specific CHS siRNAs That Silence CHS7 and CHS8 in the Dominant i i Genotype
We previously showed by analysis of genetic deletions that the origin of the silencing I locus is the inverted CHS1-3-4 and CHS4-3-1 cluster region, whereas the target genes are primarily the nonlinked CHS7 and CHS8 genes since CHS7 and CHS8 are highly expressed in the developing seed coats of the pigmented isolines that carry the homozygous recessive i mutation but are downregulated in the yellow Williams seed coats with the i i genotype. As shown in Figure 4, ~39,000 total CHS siRNAs map to the CHS7 and CHS8 genes that are located on separate chromosomes from the CHS1-3-4 and CHS4-3-1 cluster regions. Thus, there are large numbers of CHS siRNAs available to downregulate the target CHS7 and CHS8 mRNAs in the developing seed coats of the Williams (i i) yellow seeded cultivar, but none were detected in the cotyledons of the same i i genotype.
The CHS multigene family has been divided into two subgroups on the basis of the degree of nucleotide identity in the open reading frames, and a phylogenetic tree has also been constructed previously (Matsumura et al., 2005). Supplemental Table 1 online summarizes the pairwise alignment of the nine CHS gene family members. CHS genes 1 through 6 grouped together, while CHS7 and CHS8 formed the second subgroup, with 82% similarity existing between the two groups. As much as 93 to 98% nucleotide sequence identity has been observed between CHS genes 1 through 6, with CHS6 being the most divergent member of this subgroup. The two members of the second subgroup, CHS7 and CHS8, are 97% identical. CHS9, a recently characterized member of this family, exhibits greater homology to the first subgroup of CHS genes 1 through 6. Although very similar in sequence, multiple single or double nucleotide mutations distributed along the genes distinguish the family member genes, thus allowing their transcripts to be distinguished by quantitative real-time PCR.
Because the size of the target sequence influences the e value obtained from the BLAST algorithm and the BACs vary widely in size from 61,000 to >146,000 bases, we performed the BLAST analysis of each small RNA population to each of the nine individual CHS genes to attain an accurate, comparative number for CHS siRNAs aligning to the individual CHS genes.
All CHS genes contain one intron at the same position, and excluding their introns, the CHS genes are nearly identical in size at 1167 bases (CHS1-6 and CHS9) or 1170 bases (CHS7 and CHS8) from the ATG to the stop codon. Supplemental Data Set 2 and Supplemental Tables 2 and 3 online present the results. The number of unique signatures with 100% identity to each CHS gene in a pairwise comparison (see Supplemental Table 2 online) indicates that while CHS4 has 82% nucleotide similarity to both CHS7 and CHS8, only ~15% of the CHS siRNAs have 100% identity to both CHS4 and CHS7 or CHS4 and CHS8 (see Supplemental Table 3 online). Thus, we chose CHS7 and CHS4 as representative genes of each of the two CHS subgroups.
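The pairwise comparison of perfect matches described above can be illustrated with a short Python sketch. It deliberately replaces the BLAST analysis used in the study with plain exact substring matching, which is only equivalent for full-length, 100% identity hits. All function names are hypothetical, and the strand convention (a signature identical to the supplied gene sequence is called "sense") is an assumption made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def perfect_match_strand(sirna, gene):
    """Return 'sense', 'antisense', or None for a full-length 100% match."""
    if sirna in gene:
        return "sense"
    if reverse_complement(sirna) in gene:
        return "antisense"
    return None


def shared_fraction(signatures, gene_a, gene_b):
    """Fraction of signatures perfectly matching gene_a that also match gene_b."""
    hits_a = [s for s in signatures if perfect_match_strand(s, gene_a)]
    if not hits_a:
        return 0.0
    both = [s for s in hits_a if perfect_match_strand(s, gene_b)]
    return len(both) / len(hits_a)
```

Given the spliced coding sequences of two family members, `shared_fraction` returns the kind of figure quoted in the text for signatures shared between CHS4 and CHS7. Using spliced sequences also means that a signature spanning the exon junction would still be counted.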
Specifically, Figure 5A illustrates the alignment of CHS siRNAs from the Williams (i i ) seed coat with 100% identity to CHS7. Overall, the CHS siRNAs aligned through almost the entire length of the CHS gene exons and not at all to the introns. In contrast with the large number of CHS siRNA sequences that aligned with exon 2, only a few sequences aligned with exon 1. The majority of CHS siRNAs aligned with exon 2 to form a bell-shaped curve against both the sense and antisense strands. Figure 5 shows only the alignment results of the CHS siRNAs with more than 50 occurrences. As shown in Table 2, the majority (976) of the total (1118) unique signatures had very few occurrences (5 to 50), while the remaining 13% (141) were represented many times (50 to 1000). Only 38 CHS siRNA unique species, including only three siRNAs with more than 50 counts, aligned with exon 1 of CHS7. None aligned with the intron, although some did appear to span the border, indicating that they arose from processed transcripts.
Since the frequency of each small RNA signature in the library generally reflects its relative abundance in the sample, the sequence repeats provide a quantitative expression measurement. Strikingly, of the 1118 unique siRNA signatures with perfect matches to the CHS7 gene sequence, only 149 (13%) match perfectly to the CHS4 gene sequence (see Supplemental Table 2 online). This finding indicates that many of the siRNAs matching 100% to CHS7 originated from CHS7 (or the similar CHS8) transcripts after intron splicing, most likely as a result of amplification by RNA-dependent RNA polymerase (RdRP), dicer-like (DCL), and argonaute (AGO)-like effector complexes that synthesize and cleave aberrant double-stranded RNA (dsRNA) into phased 21- to 22-nucleotide secondary siRNAs.
CHS8 shows a very similar alignment of the CHS siRNAs (Table 2), as expected from the high sequence similarity between CHS7 and CHS8 (97% similar). The siRNAs that aligned uniquely to CHS1, CHS3, and CHS4 are evidence that they originated from transcripts of the inverted repeat on chromosome Gm8 where those CHS genes reside. We propose that some of these siRNA signatures with perfect matches to genes in the CHS1-3-4 and CHS4-3-1 clusters represent the primary siRNA guides that trigger the silencing of all CHS genes.
The CHS7 and CHS8 sequence region that aligned with the largest number of siRNA signatures with very high counts must be the region most targeted by the primary siRNA-guided RNA-induced silencing complexes (RISC) (Figure 5A, framed region). This portion of the sequence comprises the central region 748 bp of exon 2 (975 to 1281 bp relative to the initiation codon). Likewise, the alignments of the Williams (i i) seed coat library CHS siRNAs to CHS1, CHS3, and CHS4 sequences of the inverted repeat also produced a similar alignment pattern (as shown in Figure 5B for the alignment of CHS siRNA with 100% identity to CHS4). This suggests that once silencing of all CHS genes is triggered by the CHS1-3-4 primary siRNA guides, a multitude of CHS siRNAs originating from any of the expressed CHS genes become guides to advance the targeting and posttranscriptional suppression of the entire CHS gene family.

The 21-Nucleotide siRNAs Are the Predominant Size Class of siRNAs with 100% Match to Individual CHS Genes
Produced by endonucleolytic cleavage of dsRNA by different DCL-like orthologs and pathways, two major size classes of siRNAs, short (~21 nucleotides) and long (~24 nucleotides), have been detected in plants (Hamilton et al., 2002; Mallory et al., 2002; Tang et al., 2003). In our study, the sizes of the CHS siRNAs from the different libraries sequenced ranged primarily from 19 to 24 nucleotides. To determine the size class that dominated the different populations of CHS siRNAs with sequence identity to each one of the nine CHS genes, CHS siRNAs from the Williams (i i) seed coat library with 100% matches were categorized into size classes and plotted against the number of unique CHS siRNA signatures (Figure 6A) or total number of signature occurrences per CHS gene (Figure 6B). Interestingly, both graphs affirmed that the most abundant CHS siRNA size class is the 21-nucleotide class, with as many as 700 unique signatures totaling >30,000 counts for those matching 100% to the CHS7 sequence. Based on the Arabidopsis model (Chapman and Carrington, 2007), these results suggest amplification by an RdRP6/DCL4 ortholog resulting in 21-nucleotide secondary CHS siRNAs from CHS7/CHS8 mRNAs. Significantly, as illustrated in Figure 6B, the higher number of signature occurrences for genes CHS7 and CHS8 is in accordance with our earlier gene expression results of the individual family members. We had shown that the dominant i i allele executes its suppressive effect by inhibiting the accumulation of CHS7 and CHS8 transcripts. The increase in total CHS mRNA levels in the seed coats and consequential pigmentation of both the i i / i and I / i mutations was attributed to a 7- to 25-fold increase in the CHS7/CHS8 transcript levels.
Figure 5. CHS siRNAs from the seed coat library of the yellow seed Williams (i i) having 100% match to the CHS7 (or CHS4) gene sequences were mapped based on their alignments to specific locations. The intron and two exons of CHS7 and CHS4 are depicted, and the orientation of transcription is indicated with an arrowhead. The colored segments represent the number of occurrences in the library and the location of their alignment with CHS7 or CHS4 sequences. The "x" denotes one siRNA signature spanning the intron. Those aligning to the sense strand are above the gene, and those aligning to the antisense strand are denoted below the gene. Number of occurrences: 50 to 100 (light blue), 100 to 250 (dark blue), 250 to 500 (green), 500 to 750 (orange), and 750 to 1000 (red). The boxed-in region highlights the most targeted portion of exon 2.
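The size-class tally behind the two views described for Figure 6 (unique signatures per size class and total occurrences per size class) can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and the input is assumed to be a mapping from each perfectly matching siRNA sequence to its normalized count for one CHS gene.

```python
from collections import defaultdict


def size_class_profile(signature_counts):
    """Map siRNA length -> (number of unique signatures, total occurrences)."""
    unique = defaultdict(int)
    total = defaultdict(int)
    for seq, count in signature_counts.items():
        unique[len(seq)] += 1       # one more distinct signature of this length
        total[len(seq)] += count    # add this signature's read count
    return {size: (unique[size], total[size]) for size in sorted(unique)}
```

The first element of each tuple corresponds to the unique-signature view (Figure 6A) and the second to the total-occurrence view (Figure 6B). Taking the key with the largest value in either view identifies the predominant size class.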

The Dominant I Allele Also Produces Complex, Heterogeneous CHS siRNAs
Sequencing of the small RNA population from the immature seed coats of Richland (I, yellow) at the same stage of development as those of the Williams cultivar (i i, yellow with pigmented hilum) also yielded close to three million raw sequence reads and >30,000 unique small RNAs. Large numbers of CHS siRNAs were found, agreeing with the blot data of Figures 1 to 3. The total number of CHS siRNAs found in the Richland population that map to each CHS gene is generally similar to that found for Williams (summarized in Table 3 from Supplemental Data Set 2 online). The CHS siRNAs from Richland also represented both strands and primarily mapped to exon 2. Some of the most abundant CHS siRNAs were the same tags as in Williams. For example, Table 4 shows some of the more abundant CHS siRNAs and the counts found in each library. Thus, both the dominant alleles (I and i i) are effective in silencing the targeted CHS genes through production of a heterogeneous siRNA population that largely maps to both strands in the middle of exon 2 of the individual CHS gene family members.

The i i / i Mutation Abolishes CHS siRNAs from the Small RNA Population and Restores Pigmented Seed Coats
Structurally, the recessive i locus mutation in the Williams 55 line is represented by a deletion that includes the CHS cluster B and extends into the promoter of CHS4 of cluster A (as illustrated in Figure 7). Examination of the number and distribution of CHS siRNAs in the seed coats of the two i i and i isogenic lines revealed the presence of a considerably higher number of CHS siRNA reads in the hilum-pigmented yellow seeded cultivar (Williams, i i) relative to the black seed coat of Williams 55 (i) (Table 3). In a Williams 55 seed coat library with >90,000 unique signatures and processed reads, only 16 different signatures totaling around 108 molecules per three million raw sequence reads mapped to any of the individual CHS gene family members. These few constitute only 0.03% of the number of CHS siRNAs found in Williams (i i, yellow with pigmented hilum) and 0.06% of the CHS siRNAs found in Richland (I, yellow). The presence of large numbers of CHS-specific siRNAs in the seed coats of the dominant I and i i cultivars and not in the recessive i genotype is clear evidence of a suppressive effect of the inverted repeat I locus in soybean, which is mediated by an siRNA silencing pathway. Coupled with the small RNA gel blots of Figures 1 to 3, these results confirm that the naturally occurring deletions in the CHS genes at the I locus that accompany i i / i and I / i mutations (Todd and Vodkin, 1996; Tuteja et al., 2004) serve to abolish the production of the small RNAs from the dominant forms of the I locus.

An Endogenous, Inverted Repeat, CHS-Coding Region Generates Heterogeneous CHS siRNAs
Sequencing small RNAs to a depth of three million reads from the seed coats of two silencing alleles of the I locus (I, yellow seed coats, and i i , yellow seed coats with pigmented hilum) provided a wealth of data from which to determine the CHS-specific forms produced by these alleles. Based on alignments of these sequence signatures to the different CHS genomic sequences ( Figure 4) and on data from RNA gel blots (Figures 1 to 3), we conclude that CHS small RNAs were generated and accumulated exclusively in seed coats of yellow seed soybeans with the dominant allele genotypes I and i i and not in cotyledons. The CHS siRNAs found in the seed coats ranged in size from 19 to 24 nucleotides, although the small 21-nucleotide species was the most abundant size class. These sequences aligned to both the sense and antisense strands of exons 1 and 2 of all the CHS gene sequences (CHS1 to CHS9). For all CHS genes, the numbers of signatures aligning to the sense strand were somewhat higher than to the antisense strand (Table 2). Many more siRNAs aligned to exon 2 than to exon 1. In exon 2, the larger numbers of unique signatures as well as the larger number of counts per signature were concentrated to the central sequence portion ( Figure 5). Because there is sequence variation among the nine CHS genes, we were able to identify CHS siRNA signatures with 100% sequence identity to individual genes, thereby proving that the CHS siRNAs originated from transcripts of more than one CHS family member. The silencing i i allele contains six genes CHS1-3-4 and CHS4-3-1 as two perfectly repeated and inverted 10.91-kb clusters separated by a 5.87-kb intervening region as sequenced in two individual BACs (77G7-a and 104J7) Tuteja and Vodkin, 2008). PCR indicates a similar clustered structure is also present in the Richland I allele . 
A number of studies have shown that inverted repeats delivered as transgenes to induce RNA interference are particularly potent silencers of gene expression (Muskens et al., 2000; Smith et al., 2000; Kusaba et al., 2003), and naturally occurring inverted gene duplications that may produce siRNAs are speculated to be the evolutionary progenitors of miRNAs (Allen et al., 2004). In animal systems, small RNA populations sequenced from mouse oocytes revealed the existence of endogenous primary siRNAs speculated to originate from naturally formed dsRNAs from pseudogene loci that contained inverted repeat structures (Tam et al., 2008; Watanabe et al., 2008). In the soybean system, the primary CHS siRNAs are derived from a cluster of functional CHS genes rather than pseudogenes. We have previously shown that CHS family members are transcribed in various tissues and in response to pathogen attack (Zabala et al., 2006). These transcripts are likely to be translated into functional proteins since six different CHS isomers have been identified in elicitor-treated soybean cell suspension cultures (Grab et al., 1985). It is not difficult to envision that long transcripts initiated from one promoter within the two clusters of the i i allele, and spanning very similar CHS genes in inverse orientation, could fold to create aberrant dsRNAs that are then processed by dicer-like enzyme complexes, resulting in 20- to 22-nucleotide primary CHS siRNAs. In Arabidopsis and other systems, different DCL ribonuclease orthologs (1 to 4) process the dsRNA into different siRNA size classes depending on their catalytic properties (Hamilton et al., 2002; Tang et al., 2003). DCL3 processes the dsRNA into 24-nucleotide siRNAs, while DCL4 processes dsRNAs into 21-nucleotide siRNAs.
Even though not much is known about the DCL ribonucleases or the protein complexes that cleave these aberrant dsRNA structures in soybean, one can anticipate, based on the tight range of CHS siRNA sizes found in the seed coat siRNA population (Figure 6), that a DCL4 ortholog could be cleaving the aberrant dsRNAs in soybean.
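The size-class reasoning above can be made concrete with a short tally. The following is a minimal sketch under stated assumptions, not the analysis used in this study: it sums read counts per signature length and reports the most abundant size class. The signatures and counts in `TOY_COUNTS` are invented, and `DCL_BY_SIZE` encodes only the two Arabidopsis associations mentioned in the text (21 nucleotides with DCL4, 24 nucleotides with DCL3).

```python
from collections import Counter

# Size-to-enzyme associations stated in the text for Arabidopsis only;
# all other sizes are deliberately left unassigned.
DCL_BY_SIZE = {21: "DCL4", 24: "DCL3"}

# Hypothetical signature -> read count table (toy values, not real data).
TOY_COUNTS = {
    "GCGTTGAAGAGATCCGTAAGG": 40,     # 21 nt
    "CAGAGGGCAGAAGGCCCTGCA": 25,     # 21 nt
    "CAGAGGGCAGAAGGCCCTGC": 10,      # 20 nt
    "GCGTTGAAGAGATCCGTAAGGC": 8,     # 22 nt
    "CAGAGGGCAGAAGGCCCTGCAACC": 3,   # 24 nt
}

def size_distribution(signature_counts: dict) -> Counter:
    """Sum read counts for each signature length in nucleotides."""
    totals = Counter()
    for seq, count in signature_counts.items():
        totals[len(seq)] += count
    return totals

def dominant_size(signature_counts: dict):
    """Return (most abundant length, associated DCL name or None)."""
    totals = size_distribution(signature_counts)
    if not totals:
        raise ValueError("no signatures supplied")
    size = max(totals, key=totals.get)
    return size, DCL_BY_SIZE.get(size)

if __name__ == "__main__":
    print(dict(size_distribution(TOY_COUNTS)), dominant_size(TOY_COUNTS))
```

A distribution dominated by one length is consistent with, but does not prove, processing by a single dicer-like activity.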
Amplification of the Silencing Signal through the Action of an RNA-Dependent RNA Polymerase Is Deduced from the Specific CHS siRNA Sequences in the Seed Coat
It can be reasoned that the primary CHS siRNAs resulting from the cleavage of the CHS dsRNA (formed from the CHS1-3-4 and CHS4-3-1 cluster region) trigger sequence-specific degradation of CHS7/CHS8 transcripts, explaining the observed 7- to 25-fold decrease in their expression levels in the yellow seed coats containing I and i i alleles. These cleaved CHS7 and CHS8 mRNAs may in turn be substrates for further RdRP and DCL activity. After cleavage at the mRNA site targeted by the primary CHS siRNA guide, an RdRP could synthesize a complementary copy of the cleaved CHS7 or CHS8 mRNA, thus generating additional aberrant dsRNAs that are processed into secondary 21- to 22-nucleotide CHS siRNAs by dicer activity. These secondary CHS siRNAs would then fan out as multiple guides of AGO-RISC-like complexes that could target additional CHS mRNAs, amplifying the silencing response as well as spreading it over a larger region of the CHS mRNAs. The targeting of the CHS mRNAs by both the primary and secondary siRNA guides must also take place after intron splicing since no CHS siRNA signatures aligning to intron sequences were found in any of the small RNA populations examined (Table 2). A few of the CHS siRNA signatures are split by the intron when aligned to the CHS genomic sequences (Figure 5A), again confirming that they originate after intron splicing.
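The splicing argument rests on distinguishing reads that lie within an exon, reads that span the exon 1/exon 2 junction of the spliced mRNA, and reads that contain intron sequence. A minimal sketch of that classification follows. It is an illustration only: the exon and intron strings are invented toy sequences, only the sense strand and perfect matches are considered, and `classify_read` is a hypothetical helper name.

```python
# Hypothetical toy gene model (not a real CHS sequence).
EXON1 = "ATGGTGAGCGTTGAAGAGATCCGTAAGG"
INTRON = "GTAAGTTTCTCTTTGCTTTATTACAG"
EXON2 = "CTCAGAGGGCAGAAGGCCCTGCAACC"

def classify_read(read: str, exon1: str, intron: str, exon2: str) -> str:
    """Classify a sense-strand read by where it matches perfectly.

    'exonic'            : lies entirely within one exon
    'junction-spanning' : matches the spliced mRNA only, so it crosses the
                          exon1/exon2 junction and must derive from spliced RNA
    'intron-containing' : matches the genomic sequence only, so it overlaps
                          the intron (unspliced origin)
    'unmatched'         : no perfect sense-strand match
    """
    if read in exon1 or read in exon2:
        return "exonic"
    if read in exon1 + exon2:
        return "junction-spanning"
    if read in exon1 + intron + exon2:
        return "intron-containing"
    return "unmatched"

if __name__ == "__main__":
    junction_read = EXON1[-10:] + EXON2[:11]   # 21-mer across the splice junction
    intron_read = INTRON[2:23]                 # 21-mer inside the intron
    for r in (junction_read, intron_read, EXON2[:21]):
        print(r, classify_read(r, EXON1, INTRON, EXON2))
```

In these terms, the observation reported above corresponds to finding "exonic" and "junction-spanning" signatures but no "intron-containing" ones.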
In summary, the pathway for CHS siRNA generation and accumulation must account for (1) the large number and distribution of unique siRNA signatures detected; (2) the range of siRNA sizes (20 to 22 nucleotides), with the 21-nucleotide species being the most abundant, particularly for CHS7 and CHS8; (3) the lack of siRNAs derived from intron sequences; (4) the existence of siRNAs with 100% sequence identity to genes CHS7 or CHS8 and lower similarity to genes CHS1, CHS3, and CHS4, and vice versa; (5) silencing of all nine nonidentical CHS gene coding regions, including those linked and those not linked to the long inverted repeat on chromosome Gm8; and (6) the derivation of CHS7- and CHS8-specific siRNAs from both strands. The latter implies that such siRNAs are processed from dsRNA substrates produced by the pairing of sense transcripts with antisense copies derived from RdRP action. Figure 7 depicts the plausible succession of these steps schematically.
Table note: Data are from Supplemental Data Set 2 online. An asterisk represents a 100% match to exon 2 of the indicated CHS gene, and no asterisk denotes a single base mismatch of the CHS siRNA sequence to the indicated CHS gene. Blanks indicate more than one mismatch; the + strand direction is the coding direction for all CHS genes.
As with the trans-acting siRNAs of Arabidopsis (Yoshikawa et al., 2005; Chapman and Carrington, 2007), we found that there is a certain degree of phasing in the CHS siRNAs (Figure 5), putatively as a result of periodic dicing of double-stranded CHS mRNA. We presume that the imprecise phasing observed in this case may be due to multiple initiation sites on the CHS mRNAs targeted by the primary CHS siRNA guides originating at the I locus.
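One simple way to look for phasing of the kind described above is to tally the 5' start positions of sense-strand siRNAs by their remainder modulo 21. This is a hedged sketch, not the procedure used in this study: the 21-nucleotide period is an assumption taken from the most abundant size class, the start positions are invented, and antisense reads (which would need a strand-specific offset) are ignored.

```python
from collections import Counter

def phase_registers(starts, period: int = 21) -> Counter:
    """Count sense-strand 5' start positions in each register (start mod period)."""
    return Counter(start % period for start in starts)

def dominant_register(starts, period: int = 21):
    """Return (register, fraction of reads in it) for the most populated register."""
    if not starts:
        raise ValueError("no start positions supplied")
    register, count = phase_registers(starts, period).most_common(1)[0]
    return register, count / len(starts)

if __name__ == "__main__":
    # Hypothetical start coordinates on a transcript (toy values).
    toy_starts = [5, 26, 47, 68, 30, 12]
    print(phase_registers(toy_starts), dominant_register(toy_starts))
```

Perfect phasing from a single initiation site would place nearly all reads in one register; several initiation sites, as proposed above, would spread reads over several registers and lower the dominant fraction.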
Tissue-Specific Biogenesis of the CHS siRNAs from the Inverted Repeat I Locus Clusters in Seed Coats Is More Plausible Than Lack of Signal Amplification in Other Tissues
More importantly, the results from this study present unequivocal evidence for the existence of an additional feature in siRNA regulation not described previously: tissue specificity of endogenous siRNA generation from a cluster of genes that expresses normal mRNA transcripts in other tissue and organ systems. Several hypotheses can be put forward to explain the presence of CHS siRNAs in only one tissue, the seed coat. One possibility is that a cell- or tissue-specific transcription factor, in association with the structural peculiarities of the I locus, could determine the seed coat-specific nature of CHS silencing. Previous expression studies of other genes in the anthocyanin pathway, such as flavonoid 3′-hydroxylase (F3′H), flavanone 3-hydroxylase (F3H), and flavonoid 3′,5′-hydroxylase (F3′5′H), have also shown tissue-specific expression in the seed coat for some of the family members (Zabala and Vodkin, 2003). Thus, a transcription factor (or a distantly located effector gene) could be regulating specific branches of the flavonoid pathway and possibly many other developmental pathways of the seed coat in a highly specific manner.
Figure 7. Seed phenotypes are indicated for W = Williams (i i, hilum-only pigmented seed coat) and the isogenic mutant line W55 = Williams 55 (i, black seed coat). The presence of an exact, base-by-base duplication of the 10.91-kb CHS clusters A and B at the I locus, as revealed by BAC sequencing of the yellow genotype (i i), is diagrammed, as is the deletion in the i mutation. Marked by green Xs, the deletion encompasses regions flanking CHS cluster B and extends into the promoter region, including the HindIII (H3) site of CHS4 in cluster A. RFLP analysis also shows absence of the 2.3-kb HindIII fragment corresponding to CHS4 genes in the pigmented genotype (i) (summarized from Todd and Vodkin, 1996; Tuteja et al., 2004). The molecular events supported by the CHS siRNA data presented in this report are diagrammed. A dsRNA generated from the inverted CHS repeats in the seed coat is cleaved into primary siRNAs representing both strands that are amplified by RdRP to generate secondary CHS siRNAs capable of downregulating all members of the CHS gene family, including the more distantly related CHS7 and CHS8 (denoted in red). These two genes are highly expressed in the pigmented seed coats in which CHS siRNA production has been abolished by the deletion in the mutant i allele (W55). Production of the primary CHS siRNAs is tissue specific, found only in the seed coats and not in other tissues of the yellow-seeded (i i) genotype.
Conversely, the primary CHS siRNAs could potentially be generated from a dsRNA molecule produced in all tissues but not be amplified to detectable levels for lack of an RdRP enzyme in other tissues. RdRPs are involved in RNA amplification of primary siRNAs and generate more dsRNAs that are subsequently processed into the secondary siRNAs (Zamore and Haley, 2005; Chapman and Carrington, 2007). However, the lack of an RdRP function in so many different soybean tissues is implausible. As shown in Table 1, the cotyledon library contains roughly the same number of unique small RNAs (about 27,000) as the seed coat libraries, although the cotyledon possesses only a handful of CHS siRNA molecules (Figure 4, Table 3). The distribution of non-CHS small RNAs that map to non-CHS coding regions is approximately the same in the Williams seed coat and the cotyledon. One of those signatures, with over 11,000 occurrences, matches a long terminal repeat retrotransposon reverse transcriptase adjacent to CHS7 on BAC 5A23 (Figure 4). Another matches near the coding region for a gene of unknown function between the two CHS4 inverted repeats of clusters A and B on BAC 77G7a. Thus, the cotyledon is clearly capable of amplifying other non-CHS siRNAs. However, in the absence of CHS siRNAs, the soybean cotyledon continues to synthesize CHS7 and CHS8 mRNA transcripts in later stages of development, which results in accumulation of isoflavones and other flavonoid products in the soybean cotyledon. Thus, in contrast with the downregulation of the pathway in the yellow seed coats by CHS siRNA-targeted destruction of CHS7 and CHS8 mRNAs, the CHS7 and CHS8 transcripts continue to increase during cotyledon development, leading to the accumulation of large amounts of isoflavones in the mature soybean seed even in yellow seed coat varieties with the dominant I or i i alleles.
This system represents a targeted regulation of the flavonoid pathway in a specific tissue.
Likewise, we have sequenced libraries from other tissue and organ systems, including leaves and stems, that also produce large numbers of small RNAs but only a handful of CHS-specific siRNAs, similar to the very low percentages shown for the cotyledon library in Table 3. We have previously demonstrated that CHS transcripts in the leaves of Williams (i i), including those for CHS1, 3, 6, 7, and 8, are induced >1000-fold within 8 h after infection with the bacterial pathogen Pseudomonas syringae (Zabala et al., 2006). The induction of CHS transcripts would provide ample targets for RdRP amplification of a very low abundance CHS siRNA silencing signal, should one exist in the pathogen-challenged leaves of the Williams (i i) genotype. However, posttranscriptional downregulation of CHS transcripts does not occur and the CHS mRNAs are highly expressed. These data reinforce the view that the tissue-specific nature of the I locus-mediated silencing effect likely reflects tissue-specific biogenesis of the dsRNA and primary CHS siRNAs in the seed coats rather than failure to amplify secondary CHS siRNAs in other tissues.
The CHS siRNAs Are Not Transported from the Seed Coat to the Developing Cotyledons or Other Tissues
Systemic RNA silencing has been observed in plants, fungi, and Caenorhabditis elegans (Voinnet et al., 1998; Winston et al., 2002; Mallory et al., 2003; Timmons et al., 2003). In plants, the cell-to-cell and systemic spread of some classes of small RNAs is considered to occur through plasmodesmata (Voinnet et al., 1998; Lucas et al., 2001; Himber et al., 2003; Lucas and Lee, 2004) and the phloem (Palauqui et al., 1997; Klahre et al., 2002; Mallory et al., 2003), respectively.
The soybean seed coat, derived from the maternal ovular integuments, encloses the filial tissues (the embryo and the cotyledons) and includes two vascular bundles (the phloem and xylem elements) at the hilum, the point of attachment to the pod (Thorne, 1981). The phloem conduit, comprising the sieve tube system, functions in the long-distance transport of nutrients by pressure-driven bulk flow of the translocation stream and thus provides for storage product accumulation in the cotyledons. The symplasmic discontinuity between the maternal and filial tissues in soybean seeds necessitates an apoplasmic exchange localized to the maternal/filial interface (Thorne, 1981). In our system, there is currently no evidence for the active transfer of the CHS siRNAs generated in the immature seed coat to other tissues. This could be explained simply by the fact that the seed coat is an end point of phloem transport and is not likely to transport siRNAs backward to vegetative tissues. The seed coat is a conduit for nutrients from the vegetative tissues of the plant to the developing seed cotyledon that it encloses; yet there is no evidence of transfer of the CHS siRNAs through the seed coat to the cotyledon underneath, since they do not accumulate in the cotyledons.

Regulation of an Important Pathway by Tissue-Specific siRNA Biogenesis
To summarize, we have described an endogenous inverted repeat system in soybean that drives silencing of CHS genes in a tissue-specific manner, thereby inhibiting pigmentation of the seed coats. We present clear evidence that a large number of siRNAs with sequences identical to exons 1 and 2 of multiple members of the CHS gene family accumulated in the seed coats of soybean cultivars with dominant I or i i alleles in a tissue-specific manner. The tissue-specific nature of CHS siRNA biogenesis adds another layer of complexity to the mechanisms of posttranscriptional regulation. Further study of this system should provide insight into the mechanism of tissue-specific gene silencing, which could be of practical use to target silencing to a restricted tissue or cell type.
While much emphasis has been placed to date on the evolutionarily ancient and highly conserved miRNAs, examples of siRNAs more uniquely tied to a particular species are likely to arise. As illustrated by the CHS siRNA system, expansion of duplicate genes can potentially spawn a unique regulatory system in a physiological process during natural selection and evolution or during domestication of a plant species. Thus, siRNA regulation could be an important addition to our knowledge of plant allelic diversity and short-term evolutionary mechanisms. Allen et al. (2004) have presented evidence that miRNAs have diverged from inverted gene duplications and represent older remnants of such events that once produced siRNAs.
The small RNA sequencing populations from the seed coat and cotyledons have revealed a vast number of additional small RNAs (miRNAs or siRNAs) varying greatly in normalized sequence counts. Many have much higher occurrence than the CHS siRNAs characterized here and some also show tissue specificity. We have clearly shown that the CHS siRNAs are physiologically functional to downregulate a pathway and produce a visible trait difference, lack of seed coat pigmentation. Thus, we anticipate that continued investigation of the novel sequences revealed in these populations will lead to similar examples of regulation of other pathways in seed development as demonstrated here for the CHS siRNAs.

Plant Materials and Genetic Nomenclature
The two isoline pairs of Glycine max used for this study were obtained from the USDA Soybean Germplasm Collections (Department of Crop Sciences, USDA/Agricultural Research Service, University of Illinois, Urbana, IL). The genotypes of the four lines are described in Table 1. All lines are homozygous for the loci indicated, and only one of the alleles is shown for brevity in the tables and text.
Plants were grown in the greenhouse and tissues were harvested from at least four plants of each isoline. Leaves and roots were harvested from 4-week-old plants and quick frozen in liquid nitrogen. Seed coats and cotyledons were dissected from seeds at varying stages of development based on the fresh weight of the entire seed: 10 to 25 mg, 25 to 50 mg, 50 to 75 mg, 75 to 100 mg, and 100 to 200 mg. Dissected seed coats and cotyledons from seeds of the 50 to 75 mg weight range were fast frozen in liquid nitrogen. All tissues were stored at −70°C until further use.

Small RNA Extraction and Gel Blot Analysis
LMW RNAs were isolated and probed as described previously (Hamilton and Baulcombe, 1999) with minor modifications. Total nucleic acids were extracted from the frozen seed coats, cotyledons, leaves, and roots of the two isogenic pairs using the standard phenol chloroform method (Todd and Vodkin, 1996) and precipitated with ethanol. Seed coats of the Williams 55 isoline produce procyanidins and were pretreated with proanthocyanidin binding buffer using the protocol of Wang et al. (1994), before extracting the total nucleic acids.
To the precipitate dissolved in water, polyethylene glycol (molecular weight 8000) and sodium chloride were added to final concentrations of 5% and 0.5 M, respectively, followed by incubation on ice for 30 min. High molecular weight nucleic acids were precipitated by centrifugation at 11,000 rpm for 20 min, while the LMW nucleic acids in the supernatant were recovered by ethanol precipitation at −20°C overnight. LMW RNA concentrations were measured on the NanoDrop ND1000 spectrophotometer (Nanodrop Technologies) and samples were stored at −70°C until further use. For diagnostic purposes, the LMW RNA fractions were separated on a 1.2% agarose/3% formaldehyde gel and stained with ethidium bromide. The predominant stainable species on these gels was a band that runs at ~200 bp.
For accurate sizing of the siRNA species, an RNA ladder (10 to 150 nucleotides) was used and radiolabeled with [γ-32P]dATP following the protocol provided with the Decade Markers Kit from Ambion. In the case of the RNA gel blot shown in Figure 2, 50 pmol of each of two sense DNA oligonucleotides, a 20-mer (CHS7RT-1F) and a 25-mer (CHS7RT-si25), corresponding to a region in the second exon of CHS7, were also run on the same gel (data not shown).
The CHS antisense riboprobe used for LMW RNA analysis was transcribed in vitro from the T7 promoter of a BamHI-cleaved CHS7 EST, AI437793, by means of the MAXIscript In Vitro Transcription Kit (Ambion). AI437793 contains the full-length CHS7 open reading frame. Riboprobes were treated with RNase-free DNase to remove the DNA template, and the 20 μL probe was hydrolyzed to an average size of 50 nucleotides with 300 μL of 0.2 M carbonate buffer (0.08 M NaHCO3 and 0.120 M Na2CO3) by incubating at 60°C for 3 h. Subsequently, 20 μL of 3 M NaOAc, pH 5.0, was added to the hydrolyzed probe before adding the probe to the hybridization solution.
The 5S rRNA oligoprobe was used as a loading control. A 27-mer oligo (5′-GGTGCATTAGTGCTGGTATGATCGCAC-3′) antisense to the soybean 5S rRNA encoding gene was γ-radiolabeled using the DNA 5′ End-Labeling System (Promega) according to the manufacturer's instructions. Unincorporated nucleotides were removed using BioSpin 6 chromatography columns (Bio-Rad).

Sequencing of Small RNA Libraries and Data Analysis
Gel purification, cloning, and sequencing of small RNAs from multiple tissue samples (seed coats and cotyledons of Williams [i i], seed coats of Williams 55 [i], and seed coats of Richland [I]) were performed at Illumina using the SBS (sequencing by synthesis) technology. Briefly, 2.5 to 5 μg of the purified LMW RNA fraction of each of the four samples was provided to Illumina and, subsequent to quality checks, was separated on 15% polyacrylamide gels containing 7 M urea in TBE buffer (45 mM Tris-borate, pH 8.0, and 1.0 mM EDTA). A gel slice containing RNAs of 15 to 35 nucleotides was excised and eluted. Gel-purified small RNAs were ligated to the 3′ adapter (5′-TCGTATGCCGTCTTCTGCTTG-3′), and the small RNA libraries were sequenced using the Illumina Genome Analyzer. Sequence information was extracted from the image files with the Illumina Firecrest and Bustard applications.
A total of three to six million reads that were 33 bases long were obtained from the deep sequencing of the above-mentioned libraries. Adapter trimming was performed using the first occurrence of the substring TCG as the unique identifier for the beginning of the adapter (5′-TCGTATGCCGTCTTCTGCTTG-3′). The sizes of the small RNAs after adapter trimming ranged from 14 to 33 nucleotides, with the majority in the range of 19 to 24 nucleotides. Adapter-trimmed sequences were compared to obtain the number of unique sequences and the number of occurrences of each. At this stage, all sequences present more than five times were carried forward for subsequent comparisons.
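The trimming and collapsing steps described above can be sketched as follows. This is a minimal illustration of the stated rule (cut at the first occurrence of TCG, then count identical sequences and keep those seen more than five times), not the actual scripts used. The function names and the two toy insert sequences are assumptions. Note that the rule as stated would also truncate any small RNA that itself contains TCG.

```python
from collections import Counter

ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # 3' adapter sequence given in the text

def trim_at_first_marker(read: str, marker: str = "TCG") -> str:
    """Cut the read at the first occurrence of the marker; keep it whole if absent."""
    idx = read.find(marker)
    return read if idx == -1 else read[:idx]

def collapse_reads(reads, min_occurrences: int = 6) -> dict:
    """Trim every read, count identical sequences, and keep those present
    more than five times (i.e. at least min_occurrences)."""
    counts = Counter(trim_at_first_marker(r) for r in reads)
    return {seq: n for seq, n in counts.items() if n >= min_occurrences}

if __name__ == "__main__":
    # Hypothetical 21-nt inserts, padded with adapter bases to 33-base reads.
    insert_a = "AGCTAGGCATTACGGATCAAC"
    insert_b = "TTGACAGAAGATAGAGAGCAC"
    raw_reads = [insert_a + ADAPTER[:12]] * 6 + [insert_b + ADAPTER[:12]] * 5
    print(collapse_reads(raw_reads))
```

With these toy reads, only the insert seen six times is carried forward; the one seen five times falls below the threshold.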
Alignments of these curated small RNAs to each individual BAC sequence were made using BLAST (Altschul et al., 1990) with a minimum match length of 16 bases with no mismatches or 20 bases with one mismatch allowed. Alignments were also made to individual CHS sequences, requiring at least 14 bases with no mismatches or 18 bases with one mismatch allowed. For the alignments to individual CHS sequences, the variable-length intron was omitted so that the CHS protein coding regions would be in maximum alignment throughout their 1167 bases (for CHS1-6 and CHS9) or 1170 bases (for CHS7 and CHS8). A total of 200 bases from the genomic sequence 5′ of the ATG start codon and 200 bases 3′ of the stop codon of each gene were taken to represent the flanking regions, which brings the sequences to 1567 or 1570 nucleotides. The results from the BLAST analyses were further characterized, cross-compared, and scrutinized with Excel tools. In some instances, detailed alignments were performed with the MultAlin program (http://bioinfo.genotoul.fr/multalin/multalin.html).
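The acceptance criteria above can be illustrated without BLAST. The sketch below is a simplified stand-in, not the BLAST-based procedure used here: it slides a read along a target on the sense strand only, finds the longest ungapped stretch containing at most a given number of mismatches, and applies the two thresholds (16/20 for BACs, 14/18 for individual CHS sequences). It also checks the reference-length arithmetic (coding region plus 200-base flanks). All function names and the toy target are assumptions; unlike BLAST, this window-based rule also counts a stretch whose single mismatch falls at its edge.

```python
def longest_ungapped(read: str, target: str, max_mismatches: int) -> int:
    """Length of the longest ungapped read/target stretch with at most
    max_mismatches mismatches, over all alignment diagonals (sense strand only)."""
    best = 0
    for offset in range(-(len(read) - 1), len(target)):
        i0 = max(0, -offset)            # first read index on this diagonal
        j0 = max(0, offset)             # first target index on this diagonal
        n = min(len(read) - i0, len(target) - j0)
        left = 0
        mismatches = 0
        for k in range(n):              # sliding window [left, k]
            if read[i0 + k] != target[j0 + k]:
                mismatches += 1
            while mismatches > max_mismatches:
                if read[i0 + left] != target[j0 + left]:
                    mismatches -= 1
                left += 1
            best = max(best, k - left + 1)
    return best

def passes_criteria(read: str, target: str, exact_min: int = 16, one_mm_min: int = 20) -> bool:
    """Defaults are the BAC thresholds; use exact_min=14, one_mm_min=18 for CHS genes."""
    return (longest_ungapped(read, target, 0) >= exact_min
            or longest_ungapped(read, target, 1) >= one_mm_min)

def reference_length(cds_length: int, flank: int = 200) -> int:
    """Coding region plus one flank on each side."""
    return cds_length + 2 * flank

if __name__ == "__main__":
    toy_target = "ATGGTGAGCGTTGAAGAGATCCGTAAGGCTCAGAGGGCAGAAGGCCCTGCAACC"  # invented
    print(passes_criteria(toy_target[5:26], toy_target), reference_length(1167))
```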

Accession Numbers
Sequence data used in this article can be found in the GenBank/EMBL databases under the following accession numbers: EF623854, EF623856, EF623857, EF623858, and EF623859, corresponding to the five CHS containing BACs, 77G7a, 56G2, 5A23, 28017, and 7C24, respectively (Tuteja and Vodkin, 2008). The sequences for the CHS family member genes were extracted from these BAC clone sequences, except for CHS2, which had accession number X65636. The accession number for the soybean 5S rRNA EST (Gm-c1015-7201) is X15199.

Supplemental Data
The following materials are available in the online version of this article.