Gene Expression in Nitrogen-Fixing Symbiotic Nodule Cells in Medicago truncatula and Other Nodulating Plants

Gene expression analyses of nodules of the nitrogen-fixing clade of plants have advanced our understanding of nodule organogenesis and functioning and the evolution of symbiosis. Root nodules formed by plants of the nitrogen-fixing clade (NFC) are symbiotic organs that function in the maintenance and metabolic integration of large populations of nitrogen-fixing bacteria. These organs feature unique characteristics and processes, including their tissue organization, the presence of specific infection structures called infection threads, endocytotic uptake of bacteria, symbiotic cells carrying thousands of intracellular bacteria without signs of immune responses, and the integration of symbiont and host metabolism. The early stages of nodulation are governed by a few well-defined functions, which together constitute the common symbiosis-signaling pathway (CSSP). The CSSP activates a set of transcription factors (TFs) that orchestrate nodule organogenesis and infection. The later stages of nodule development require the activation of hundreds to thousands of genes, mostly expressed in symbiotic cells. Many of these genes are only active in symbiotic cells, reflecting the unique nature of nodules as plant structures. Although how the nodule-specific transcriptome is activated and connected to early CSSP-signaling is poorly understood, candidate TFs have been identified using transcriptomic approaches, and the importance of epigenetic and chromatin-based regulation has been demonstrated. We discuss how gene regulation analyses have advanced our understanding of nodule organogenesis, the functioning of symbiotic cells, and the evolution of symbiosis in the NFC.


INTRODUCTION
In response to nitrogen starvation and the presence of specific compatible nitrogen-fixing bacteria in the rhizosphere, plants of the nitrogen-fixing clade (NFC) in the Rosid clade form symbiotic organs on their roots known as nodules. Nodules are infected by a large population of these bacteria, which convert atmospheric nitrogen gas to ammonia. The ammonia is taken up by the plant to satisfy its nitrogen needs for growth. The NFC, which originated over 100 million years ago (MYA), is composed of four orders, the Fabales, Cucurbitales, Fagales, and Rosales (Soltis et al., 1995). Nodulation is widespread in the Leguminosae family of the Fabales (although not present in all legume species), while it is scattered among species of the three other orders. Leguminosae and species of the Parasponia genus of the Rosales are infected with phylogenetically diverse bacteria, collectively named rhizobia, belonging to the Rhizobiales of the alphaproteobacteria (a-rhizobia) or the Burkholderiales of the betaproteobacteria (b-rhizobia). The other nodulating genera, collectively known as actinorhizal plants, interact with nitrogen-fixing actinobacteria of the genus Frankia.
Although they are diverse, all rhizobia, as typical Gram-negative bacteria, have (broadly speaking) a similar type of cell envelope structure, which is a crucial attribute for effective nodule colonization (Haag et al., 2013). At least one other shared feature of rhizobiawith one reported exception (Giraud et al., 2007) is the capacity to produce so-called Nod factors, the bacterial signals that trigger the nodulation program of the host plant via the activation of the common symbiosis signaling pathway (CSSP).
Nod factors are lipochitooligosaccharides with a characteristic molecular structure and are similar only to the Myc factors produced by mycorrhizal fungi (Oldroyd, 2013;Cope et al., 2019). The capacity to produce Nod factors is encoded by the nodulation or nod genes located on symbiotic islands or on mobile plasmids in the rhizobial genomes (Poole et al., 2018).
By contrast, Frankia are Gram-positive bacteria with a fundamentally different cell envelope from rhizobia. Although very little is known about the requirements of cell envelopes for symbiosis in Frankia, these envelope differences likely have profound effects on the colonization of nodules. The nodule-inducing factors produced by Frankia strains are currently unknown (Cissoko et al., 2018), although some particular Frankia strains possess genes homologous to the nod genes of rhizobia (Persson et al., 2015). It is thus conceivable that Frankia strains produce nodule-inducing signals that are structurally homologous to Nod factors.
The nodule can be viewed as a plant organ that relies on the integration or recruitment of existing processes (meristem formation, endoreduplication, polar growth, endocytosis and exocytosis, nitrogen, and sugar metabolism) in a new context, in conjunction with the deployment of seemingly completely new features (infection threads and uptake of bacteria, meristem organization, formation of new organellar structures, maintenance of bacteria inside plant cells and immune suppression and metabolic integration of a bacterial endosymbiont). Thus, depending on whether one considers the glass half-empty or half-full, nodules can be viewed as "novel" or "not-so-novel" plant organs. The latter point of viewparticularly motivated by the finding that nodule formation is controlled by the CSSP, which is widely conserved in land plants that form symbiotic relationships with arbuscular mycorrhizal fungihas led to the suggestion that engineering nodulation and symbiotic nitrogen-fixation to currently nonnodulating plants would require a minimal set of genes and is therefore an attainable objective (Markmann and Parniske, 2009;Oldroyd and Dixon, 2014;Stokstad, 2016). However, nodule formation is accompanied by massive transcriptional reprogramming involving the activation and repression of hundreds of genes. Remarkably, this nodule transcriptional program is entirely different from the mycorrhizal transcriptional program controlled by the same CSSP. This highlights the uniqueness of nodules as plant organs and emphasizes the challenges faced when trying to transfer nodulation to nonlegume crops. Here we review what we have learned from transcriptome approaches aimed at characterizing the specific features of nodules, focusing mostly but not exclusively on the transcriptional reprogramming taking place during the formation of symbiotic nodule cells in the model legume, Medicago truncatula.

NODULE DEVELOPMENT AND MORPHOLOGY: NODULES ARE UNIQUE STRUCTURES
Nodules in different lineages of the NFC can differ from one another in terms of development, infection mechanisms, final structure, and metabolism. However, nodule organogenesis and functioning commonly calls upon several processes, which do not have equivalents in other plant developmental programs. Nodule formation in M. truncatula is among the best described ( Figure 1). This process is governed by a number of unique signaling cascades that use inter-specific (bacterium to plant and vice versa) and intra-specific (plant to plant) signaling molecules including flavonoids (plant to bacterium) and Nod factors (bacterium to plant; Oldroyd, 2013; Figure 1A), Type III, IV, and VI secretion system effectors (bacterium to plant; Nelson and Sadowsky, 2015;Teulet et al., 2019), bacterial surface polysaccharides (exopolysaccharides, lipopolysaccharides; bacterium to plant; Haag et al., 2013), NCR (nodule-specific cysteine-rich) peptides (plant to bacterium; Kondorosi et al., 2013), phytohormones (intraplant; Gamas et al., 2017;Schiessl et al., 2019), CLE and CEP peptides (intra-plant; Taleski et al., 2018), and small RNAs (intraplant and bacterium to plant; Tsikou et al., 2018;Ren et al., 2019). Each of these signaling systems seems to be variations on a theme of widely used regulatory mechanisms in plants. For example, similar to Myc factors, Nod factors activate the CSSP ( Figure 1A), which thus controls not only nodulation but also mycorrhization (Oldroyd, 2013). However, the output of the signaling in response to Nod factors and Myc factors is completely different. It is a longlasting conundrum how this differential output is achieved. Other key signals of nodulation are used in plant development (e.g. phytohormones) or in biotic interactions (e.g. immunity) but with a unique output in nodulation. Thus, nodulation relies upon a unique rewiring of signaling pathways.
The nodule primordium and the persistent meristem of indeterminate M. truncatula nodules have features reminiscent of the formation of lateral roots and the root apical meristem (Couzigou et al., 2012;Xiao et al., 2014;Franssen et al., 2015), including overlapping genetic and hormonal control mechanisms (Schiessl et al., 2019). However, the root apical meristem is organized in a different manner and leads to the formation of one central vascular bundle, while in legumes, the nodule meristem generates several peripheral vascular bundles ( Figure 1B).
Nevertheless, actinorhizal and Parasponia nodules have a central vasculature (Downie, 2014;Behm et al., 2014). The other cell types derived from the nodule meristem are also fundamentally different from root cells. Endoreduplication drives the differentiation of postmeristematic cells into symbiotic nodule cells, which become infected by rhizobium bacteria . Endoreduplication is common in plants and can be observed in differentiating root cells and other plant tissues in a species-specific manner, as well as in many other biotrophic interactions in plants (Edgar et al., 2014;Wildermuth et al., 2017). In nodules, endoreduplication is tightly coupled to infection and the passage of socalled infection threads (van Brussel et al., 1992;Yang et al., 1994;Suzaki et al., 2014). Determinate nodules, which form in some legume clades, differ fundamentally from the indeterminate nodules of M. truncatula, because they do not develop a meristem. Nevertheless, cells of determinate nodules also undergo endoreduplication after infection with rhizobia (González-Sama et al., 2006). Consequently, and contrary to meristem-containing indeterminate nodules, these nodules have a determinate growth, which is limited by early cell divisions during the primordium stage and the subsequent cell enlargement of rhizobium-infected cells driven by their endoreduplication. Well-studied examples of determinate nodules are formed by Lotus japonicus and soybean (Glycine max).
Infection threads are tubular structures containing infecting rhizobia that conduct these bacteria inside the nodule tissues ( Figure 1C). Infection thread formation initiates in growing root hairs due to the activation of the CSSP by rhizobial Nod factors. These structures share some similarities with pollen tubes and root hairs, which both display polar growth, as do infection threads. The polar growth of infection threads, pollen tubes, and root hairs indeed relies on common molecular mechanisms (Yang, 2008;Ke et al., 2012;Kiirika et al., 2012;Montiel et al., 2012;Morieri et al., 2013). However, contrary to pollen tubes and root hairs, which are autonomous cells, infection threads are trans-cellular structures that grow from cell to cell, hijacking the cellular machinery of their temporal host cells. Infection thread growth is a process also derived from the eukaryotic cell cycle (van Brussel et al., 1992;Yang et al., 1994;Timmers et al., 1999). Before penetration by an infection thread, the targeted vacuolated cortical root cell undergoes cytological reorganization, forming a socalled pre-infection thread or PIT ( Figure 1C; van Brussel et al., 1992;Timmers et al., 1999). The PIT is structurally related to the phragmosome, a cytoplasmic bridge through the vacuole formed by cells in preparation for mitosis. An analysis of the expression of cell cycle marker genes revealed that PIT-forming cells indeed enter into the cell cycle and duplicate their genomes but are arrested in the G2 phase, before the initiation of mitosis (Yang et al., 1994). More recently, it was shown that root hairs also prepare for infection thread formation by activating cell cycle genes (Breakspear et al., 2014). PITs are reminiscent of the prepenetration apparatus in root cells before infection with the hyphae of arbuscular mycorrhizal fungi (Genre et al., 2008).
Infection threads grow toward young, differentiating symbiotic cells where they release rhizobia through an endocytotic process ( Figure 1D). Endocytosis, a well-known process in plants (Paez Valencia et al., 2016), is essential for controlling the turnover of plasma membrane proteins such as receptors and transporters. (A) Nod factor activation of the CSSP. Brown, signaling in the rhizobium symbiont resulting in Nod factor production. Gray, CSSP in the host cells activates nodule organogenesis, infection, and autoregulation of nodulation.
However, the uptake of whole bacteria by plant cells as during rhizobium (or Frankia) infection of nodule cells is unknown, except for the case of the nitrogen-fixing symbiosis of the plant genus Gunnera with cyanobacteria. Nevertheless, the infection of nodule cells shares some similarities with the infection of plant cells by fungal pathogens and mycorrhizal symbionts (Harrison and Ivanov, 2017;Parniske, 2018). Endocytotic uptake does not release rhizobia freely in the cytosol of nodule cells, instead releasing them inside organelle-like vesicles known as symbiosomes, which have a plasmalemma-like membrane ( Figure 1D). Within symbiosomes, the rhizobia grow and differentiate into nitrogen-fixing structures called bacteroids. The latter process is enabled by the exocytosis pathway, which traffics specific proteins to the symbiosomes ( Figure 1D). The exocytosis pathway is strongly conserved in eukaryotes, but some components are nodule specific and are encoded by duplicated genes or by nodule-specific splice variants (Stonoha-Arther and . Among the unique features of nodules, the nitrogen-fixation process by symbiotic nodule cells is certainly the most prominent. This process requires the maintenance of thousands of bacterial endosymbiont-containing symbiosomes within individual plant cells ( Figure 1E). During this process, symbiosomes acquire a specific vesicular identity and avoid a vacuolar identity, which would lead to the elimination of the symbiont (Limpens et al., 2009;Sinharoy et al., 2013). In addition, symbiotic cells contain specific machinery that suppresses innate immunity responses against their many endosymbionts ( Figure 1D; Gourion et al., 2015;Wang et al., 2016;Domonkos et al., 2017;Yu et al., 2018). In certain legumes, such as Medicago spp in the inverted repeat-lacking clade (IRLC) and Aeschynomene spp in the Dalbergoid branch, symbiosome formation also involves the production of symbiosisspecific peptides called NCRs, which target the bacteroids and control their extreme, terminal differentiation (Mergaert, 2018). Finally, the functioning of symbiotic cells requires bacterial metabolism to be integrated with the metabolism of the host cells ( Figure 1F). Bacteroids generate ammonia in large excess to their own needs and release this ammonia to the plant. Major features of the metabolism of symbiotic cells include (1) the supply of carbon sources derived from photosynthesis in the shoots to the energy-devouring bacteroids, (2) the maintenance of a low oxygen concentration to avoid the irreversible oxidation and inactivation of the nitrogenase enzyme, and (3) the assimilation of ammonium provided by nitrogen-fixing bacteroids. This specific metabolism requires the activation of a suite of enzymes and transporters and the massive production of leghemoglobin to bind oxygen (Figures 1D and 1F;Udvardi and Poole 2013;Poole et al., 2018;Roy et al., 2019).
Transcriptomic approaches to study nodule formation have been extensively applied in M. truncatula, in large part due to the development of whole genome Affymetrix Medicago GeneChips. These GeneChips have been used in 274 different experimental conditions, which can be found in the Medicago truncatula Gene Expression Atlas (MtGEA) repository (https://mtgea.noble.org/v3/ index.php; Benedito et al., 2008). MtGEA compiles a broad range of developmental and environmental conditions, including different plant organs and tissue types, biotic-, abiotic stress (B) Organization of nodule tissues. Tissues are color-coded. The dividing meristem cells (yellow) give rise at the edge to several vascular bundles (blue) concentrically organized around the central nodule tissue. The meristem cells (2C to 4C DNA content) also give rise to the central nodule tissue, which is mainly composed of infected symbiotic cells formed from postmeristematic cells by successive rounds of endoreduplication (shades of gray from light to dark indicate cells with increasing ploidy from 8C to 64C). Infection threads (green) deliver rhizobia to early differentiating symbiotic cells. (C) Infection threads, filled with rhizobium bacteria, are initiated in a root hair and have polar growth (red). The underlying cortical cells are activated and initiate a cell cycle. Outer cortical cells, adjacent to an infection thread-carrying cell, do not complete the cell cycle but form a cytoplasmic bridge (related to a phragmosome) that will be penetrated by the growing infection thread, which is referred to as a preinfection thread. Inner cortical cells complete the cell cycle and the newly generated cells constitute the nodule primordium. Drawings are based on van Brussel et al., 1992. (D) An infection thread reaching a target cell in a nodule primordium or growing nodule releases rhizobia by endocytosis. The exocytosis pathway of the host cell targets proteins and peptides to bacteria located in vesicles called symbiosomes, resulting in the differentiation of these bacteria into nitrogen-fixing bacteroids. PTI is actively repressed in symbiotic cells. Specific expression in symbiotic cells is required for the production of symbiosome-targeted secretory proteins (green and orange), for immune suppression functions (yellow), and for metabolism [purple, see (F)]. (E) A fully mature symbiotic cell is entirely filled with bacteroid-containing symbiosomes and has a central vacuole and a polyploid nucleus (blue). (F) Integration of the bacteroid metabolism (brown) with symbiotic cell metabolism (gray) relies on complementarity of the metabolic functions in both symbiotic partners and a complex exchange of small molecules mediated by a multitude of transporters. Some metabolic steps require a third partner, the plastid (green). The drawing is based on Temple et al., 1998, Udvardi and Poole, 2013, and Poole et al., 2018 conditions, and hormone treatments. It is therefore currently the richest M. truncatula resource for the analysis of expression patterns and specificity of genes of interest and the discovery of genes expressed under certain conditions or in organs of interest, even if a limitation is the absence in the database of a subset of the currently annotated M. truncatula genes, in particular genes encoding peptides and noncoding RNAs. Ordering the MtGEA data set with a clustering algorithm results in the identification of tissuespecific gene sets ( Figure 2A). The heat map discloses large clusters of genes that are expressed in different plant organs, leaves, stems, flowers, seeds, hypocotyls, roots, mycorrhiza, and nodules. This analysis also revealed that nodules and mycorrhiza are primarily typified by the expression of different sets of genes ( Figure 2B), despite the shared CSSP module of regulators that controls both nodule and mycorrhiza formation and a number of common target genes that are activated immediately downstream of the CSSP during both types of interactions (Hogekamp et al., 2011). For research on rhizobium-legume symbiosis, the MtGEA data set can be complemented by expression data derived from laser-capture microdissection (LCM)-transcriptome analysis available in the SYMBIMICS database (https://iant.toulouse. inra.fr/symbimics/; Roux et al., 2014). Together, the MtGEA and (A) Hierarchical clustering of the MtGEA data set (Benedito et al., 2008). Genes are shown in rows and experiments in columns. On the top, the experiment types are indicated. To the left, gene clusters are labeled based on organ-specific expression. In the expression heatmap, yellow indicates high expression and black indicates low expression. In the "NCR" column, the NCR genes are indicated by a yellow line. In the "Shannon" columns, yellow indicates high specificity (i.e. low Shannon entropy) and black indicates low specificity, with more uniform expression (i.e. high Shannon entropy). (B) An enlargement of the red squared area of the heatmap in panel A showing nodule-specific (nod) and mycorrhizal-specific (myc) genes (lines) and experiments (columns). (C) The SYMBIMICS data set (Roux et al., 2014) for nodule-specific genes from MtGEA [gene cluster labeled in red in (A)]. The columns correspond to the nodule tissues: I, meristem; IId, distal infection zone or nodule zone II; IIp, proximal infection zone; IZ, interzone between nodule zones II and III; III, fixation zone or nodule zone III. Color-codes as in (A). The data sets used to generate the heatmaps are available upon request. SYMBIMICS databases are priceless resources for studying M. truncatula biology, particularly nodulation.

NOVEL INSIGHTS FROM NODULE TRANSCRIPTOME ANALYSES
Because nodules form de novo on roots starting from fully differentiated cortical, endodermal, pericycle, and epidermal cells, the nodule transcriptome is usually defined by the genes showing differential expression in nodules versus roots. A recurring theme in each transcriptome study on legumes, Parasponia, and actinorhizal plants is that nodule formation is accompanied by the differential expression of hundreds to thousands of genes. These sets of nodule-enhanced genes are composed in part of genes that are more strongly expressed in nodules than in other organs, but importantly, these gene sets also contain large numbers of absolutely nodule-specific genes. Conversely, a transcriptome analysis in M. truncatula revealed that the larger part of the gene expression changes during lateral root initiation also occur during nodule organogenesis (Schiessl et al., 2019). These common differentially expressed genes are related to auxin biosynthesis and signaling and transcriptional regulation. This observation suggests that lateral root organogenesis has been recruited into the symbiotic program to which a large panel of nodule-specific features was added. The nodule-specific gene sets are composed of genes with homologs in other plants or paralogs expressed in other plant tissues as well as orphan genes. These findings further underscore the unique nature of nodules and the very complex genetic program governing the formation of this symbiotic organ. Thus, from this perspective, the glass might be half-empty.

The Specificity of the Nodule Transcriptome
The analysis presented in Figure 2 suggests that some organs, particularly nodules, are associated with the expression of genes that are extraordinarily specifically expressed. This is obvious when zooming in on the expression of NCR genes ( Figure 3A; Guefrachi et al., 2014). This extremely large gene family encodes nodule-specific cysteine-rich peptides, which target endosymbionts and induce them into a terminally differentiated state, are required for maintaining bacteroids, or are involved in the selection of well-adapted rhizobia in nodules (Mergaert, 2018). All but five of the over 300 analyzed NCR genes are exclusively expressed in nodules and in no other plant organ or in response to any other biotic interaction or abiotic stress. To express the apparent tissue specificity of genes expressed in nodules and of the NCRs in a quantitative manner, a metric called Shannon entropy was used . Shannon entropy is a parameter derived from information theory that is applied to gene expression analysis to characterize the uniformity (high Shannon entropy) or specificity (low Shannon entropy) of the expression pattern of a gene of interest over a number of tested conditions. Taking into account ten different tissues or organs of M. truncatula, the Shannon entropy was calculated for all genes in MtGEA. Ordering the MtGEA genes according to expression profile ( Figure 3B) or ranking them according to their Shannon entropy value ( Figure 3C) makes it clear that flowers, seeds, pods, and mycorrhiza and most particularly nodules are characterized by large gene sets with very low Shannon entropy that are thus (almost) uniquely expressed in these organs. Thus, nodulation in M. truncatula strongly depends on dedicated gene sets that do not function elsewhere in the plant . As discussed below, the large majority of these genes are expressed in differentiating or mature symbiotic cells ( Figure 3C).
Shannon entropy analysis of the AGED, PvGEA, LjGEA, and SoyBase data sets ( Figure 3D) confirmed the notion that nodule formation in M. sativa, P. vulgaris, L. japonicus, and soybean is also associated with the expression of large numbers of low Shannon entropy genes that are (nearly) uniquely expressed in that organ. Soybean/P. vulgaris, L. japonicus, and M. truncatula/M. sativa belong to different clades of legumes: the Milletioids, Robinioids, and IRLC, respectively. Thus, the reliance on large gene sets that are uniquely expressed in the nodule is not restricted to Medicago spp and is not a specific feature of the particular nodule types formed by these species, which are of the indeterminate type and induce terminal bacteroid differentiation. Both of these features are not present in bean, soybean, or L. japonicus nodules. However, the nodule-specific gene sets in Medicago spp appear to be significantly larger than those in the other legumes ( Figure 3). This difference is largely due to the extremely high number of Medicago genes encoding NCR peptides . In M. truncatula, the NCR gene family is composed of ;600 genes and in tetraploid M. sativa, the number of NCR genes increases to 1300 (O'Rourke et al., 2015), whereas these genes are absent in the three other legumes.
Besides the NCR gene family, there are many other nodulespecific, low entropy genes. Among these are well-studied genes, some with essential functions in symbiosis (e.g. genes encoding leghemoglobin, SNARPs [small nodulin acidic RNA binding protein, also known as LEED..PEEDs], the phospholipase C-like DNF2 (defective in nitrogen fixation), the cysteine-rich receptorlike kinase SymCRK, metabolite transporters, and so on) and many genes with only predicted or uncharacterized functions (e.g. putative transporter encoding genes, glycine-rich proteins [GRPs], early nodulins ENOD8 and ENOD16, calmodulin-like proteins, and so on), or genes of unknown function. Importantly, the nodule-specific transcriptome contains several families of orphan genes or taxonomically restricted genes (genes without homologs in other lineages), including the NCRs, SNARPs, and GRPs, which are restricted to the IRLC legumes (Kevei et al., 2002; Genes are shown in rows and tissue types in columns. The Shannon entropy of the genes (calculated based on the expression profile) is displayed to the right of each gene expression heatmap. The nodule-specific gene clusters mostly comprising low Shannon entropy genes are highlighted by a red rectangle. Color codes: in the expression heatmaps, yellow indicates high expression and black indicates low expression; in the "NCR" columns, the NCR genes are indicated by a yellow line; in the "Shannon" columns, yellow indicates high specific expression (i.e. low Shannon entropy) and black indicates low specificity, more uniform expression (i.e. high Shannon entropy). The data sets used to generate the heatmaps are available upon request. (C) Expression patterns of M. truncatula genes ordered according to specificity (Shannon entropy). Mergaert et al., 2003;Laporte et al., 2010;Trujillo et al., 2014), the NCR genes of Aeschynomene spp (Czernic et al., 2015), the NAD1 (nodules with activated defense 1) genes conserved only in plants from the NFC , and the nodulin 20 family, first identified in soybean (Richter et al., 1991), which is specific for the Phaseoleae (Trujillo et al., 2019). Such genes could have arisen from the duplication and rearrangement of ancestral genes followed by rapid diversification. Alternatively, orphan genes could have evolved de novo from noncoding genomic regions believed to be used as raw materials for the evolution of new genes (Tautz and Domazet-Lo so 2011).
A number of low-entropy nodule-specific genes encode proteins with unique combinations of existing domains, likely resulting from gene recombination. For example, the symbiosisspecific thioredoxin Trx s, Calmodulin-like protein, and ENOD8 families evolved from fusions of known protein domains with an N-terminal signal peptide, resulting in unique secretory proteins targeted to symbiosomes and bacteroids. There are a remarkably high number of low-entropy genes with signal peptides, suggesting that symbiotic cells target many proteins and peptides to the symbiosomes ( Figure 1D; e.g. NCRs, GRPs, SNARPs, calmodulin-like proteins, ENOD8, ENOD16, ENOD20, DNF2, SymCRK, Trx s, nodulin 25, MtN22, MtN11, lipase/lipoxygenases [also known as nodule-specific PLAT domain proteins or NPDs], basic blue-like protein, NCR-like peptides, MBP-1, and CAP antimicrobial peptides in the Dalbergioid lineage, nodulin 20 family proteins of the Phaseoleae, APN1 of L. japonicus, and so on; Dickstein et al., 2002;Liu et al., 2006;Alunni et al., 2007;Laporte et al., 2010;Maunoury, 2010;Trujillo et al., 2014;Yamaya-Ito et al., 2018;Pecrix et al., 2018;Pislariu et al., 2019;Trujillo et al., 2019). The high level of protein secretion to symbiosomes and bacteroids in symbiotic cells of M. truncatula, particularly in developing and young mature cells, is further supported by the strongly developed endoplasmic reticulum in these cells (Maunoury et al., 2010) and by the requirement of symbiotic cell-specific regulators of the secretory pathway for normal bacteroid differentiation and nodule development (reviewed in Stonoha-Arther and . Other genes resulted from duplications and modifications with neofunctionalization, such as leghemoglobin genes. Oxygen binding leghemoglobins are essential for protecting the oxygensensitive enzyme nitrogenase. Their recruitment for symbiosis required mutations (replacement of a few amino acids) of nonoxygen-transporting hemoglobins present in all plants to create an oxygen-transporting hemoglobin, as well as nodulespecific expression of the corresponding genes. Other genes also resulted from duplication, whereby one paralog obtained a new, nodule-specific expression profile but without a new function. Genes encoding subunits of a nodule-specific signal peptidase complex, including Dnf1, fall in this category (Wang et al., 2010). This endoplasmic reticulum-localized enzyme complex of the secretory pathway is required for removing the signal peptide during trafficking of secretory proteins. The NPDs in the IRLCs and the MBP-1 and CAP peptides in the Dalbergioids are encoded by multigene families including nodule-specific subsets as well as subsets with constitutive expression (Trujillo et al., 2019). Other examples are the many small metabolite transporter genes that are essential for the formation of functional nodules ( Figure 1F; Sugiyama et al., 2017;Frare et al., 2018;Senovilla et al., 2018;Chen et al., 2019). These genes belong to small multigene families, with one copy having nodule-specific or nodule-enhanced expression. The encoded transporters assure the transport of metabolites to and from the symbiotic cells and bacteroids and are essential for the metabolic integration of bacteroids with symbiotic nodule cells. A similar situation holds true for the many genes encoding the enzymes involved in the specific carbon and nitrogen metabolism of symbiotic cells ( Figure 1F). Carbon metabolism provides the carbon source to bacteroids in the form of dicarboxylic acids and synthesizes the carbon backbone for assimilation of the ammonium derived from nitrogen fixation in bacteroids. Nitrogen metabolism assures that this nitrogen is assimilated into amino acids or in some legumes into ureides. A large body of evidence indicates that the key enzymes of carbon and nitrogen metabolism in symbiotic cells are encoded by nodule-specific genes. Like small-metabolite transporters, metabolic genes are also members of small gene families, with one copy having a nodule-specific or -enhanced expression (e.g. Miller et al., 1998;Hakoyama et al., 2009;Nouwen et al., 2017).
The large number of rhizobia in symbiotic cells implies the massive presence of PAMPs (pathogen-associated molecular patterns) and the potential to induce strong PAMP triggered immunity (PTI). Several symbiotic cell-specific proteins are required to suppress the PTI response and to enable the long-term maintenance of rhizobia ( Figure 1D). In the absence of the receptor-like kinase SymCRK, the phospholipase C-like DNF2, the PLAT Domain protein NPD1, the transcription factor RSD, or the endoplasmic reticulum located small protein NAD1, a strong immune response is induced, as evidenced by the induction of immunity marker genes, nodule necrosis, and a complete halting of symbiosis (Bourcy et al., 2013;Sinharoy et al., 2013;Berrabah et al., 2014;Wang et al., 2016;Domonkos et al., 2017;Pislariu et al., 2019). The genes encoding these immune repressors are low entropy genes that are only expressed in symbiotic nodule cells. SymCRK, DNF2, NPD1, and RSD are nodule-specific paralogs, whereas NAD1 is unique and appears to be specific to the NFC , suggesting that it was present in the ancestor of the NFC.

Examples of Functional Insights from Nodule Transcriptomics
As mentioned above, nodule formation has been studied by transcriptomics in many legumes and nonlegume species, and several studies have focused on specific aspects of the symbiotic process in these NFC plants (e.g. Høgslund et al., 2009;Libault et al., 2009a;Libault et al., 2010;Yuan et al., 2017;van Velzen et al., 2018). Undoublty, transcriptomics have been used most extensively in M. truncatula to study nodule development, from very early responses induced by Nod factors and gene activation in root hairs (Combier et al., 2008a;Czaja et al., 2012;Breakspear et al., 2014;Larrainzar et al., 2015;van Zeijl et al., 2015;Jardinaud et al., 2016) to the development of mature nodules (El Yahyaoui et al., 2004;Maunoury et al., 2010;Moreau et al., 2011). Analyses focusing on the early responses to Nod factors and rhizobia have used clever tricks to overcome the limited sensitivity in detecting transcriptional responses in the small number of Nod factor-or rhizobium-responsive cells in roots. These solutions included preparing RNA from the roots of supernodulating plant mutants, which have a strongly enhanced nodulation response (Combier et al., 2008a;Larrainzar et al., 2015). Alternatively, analysis has been performed using as a starting material purified root hairs, the first target cells of Nod factors and rhizobia, obtained by breaking them off from roots frozen in liquid nitrogen (Breakspear et al., 2014;Damiani et al., 2016;Liu et al., 2019a) or root epidermal cells obtained by LCM . Analysis of the early symbiotic signaling responses highlighted the importance of cytokinin, auxin, ethylene, strigolactone, abscisic acid, gibberellic acid, and brassinosteroid phytohormone signaling pathways. This observation suggests that an important rewiring of the interconnected phytohormone pathways is a key step in the initiation of nodule organogenesis (Czaja et al., 2012;Breakspear et al., 2014;Larrainzar et al., 2015;van Zeijl et al., 2015;Jardinaud et al., 2016;reviewed in Gamas et al., 2017;Buhian and Bensmihen, 2018;Liu et al., 2018;Liu et al., 2019a). These studies confirmed the activation of cell cycle-related processes early in the nodulation signaling pathway (Breakspear et al., 2014;Larrainzar et al., 2015;Jardinaud et al., 2016), which is in agreement with the known role of the cell cycle in the establishment of a nodule primordium (Foucher and Kondorosi, 2000), cell differentiation (Vinardell et al., 2003;Suzaki et al., 2014), and infection (van Brussel et al., 1992;Yang et al., 1994;Suzaki et al., 2014). Another striking finding is the identification of the gene encoding the rapid alkanization factor MtRALF1 peptide as a Nod factor-induced gene controlling infection thread development (Combier et al., 2008a). Peptides of this family control cell expansion in plants, notably during root hair and pollen tube growth (Haruta et al., 2014;Murphy and De Smet, 2014;Ge et al., 2017;Mecchia et al., 2017), the two plant cell types that show the highest similarity to the infection thread structure. Notably, these RALF peptides interact with receptor-like kinases containing a so-called malectin-like domain within their ectodomain (Franck et al., 2018). Such a domain is also present and plays a key role in the ectodomain of the receptor-like kinase SYMRK/ DMI2/NORK (Antolín-Llovera et al., 2014;Pan et al., 2018), a component of the CSSP, suggesting that MtRALF1 targets the Nod factor receptor complex, a hypothesis that might be worth pursuing.
Using time series of wild-type nodule growth combined with collections of both rhizobial and plant mutants arrested at different stages of nodule development, transcriptome dynamics during nodule organogenesis was dissected into distinct transcriptional waves composed of the induction and repression of genes that are consecutively initiated during organogenesis (Starker et al., 2006;Maunoury et al., 2010;Moreau et al., 2011;Lang and Long, 2015). Four major waves of gene activation were uncovered using these approaches. The first wave is associated with early signaling and bacterial infections, the second with symbiotic cell differentiation, the third with the differentiation of bacteroids, and the fourth with the activation of nitrogen fixation.
In addition to studies that focused on the overall transcriptome dynamics associated with nodule development, other studies have used transcriptomics to address more specific questions. The transcriptomes in nonfunctional nodules formed by M. truncatula mutants in transcriptional regulators were used to characterize the stages of nodule development affected by the mutations at the molecular level and to delineate the sets of potential target genes of these transcriptional regulators (see below; Vernié et al., 2008;Satgé et al., 2016;Sinharoy et al., 2013;Deng et al., 2019). Another type of study used transcriptomics to compare the nutritional status of whole plants in response to symbiotic nitrogen fixation or mineral nitrogen as nitrogen source (Ruffel et al., 2008) or to investigate specific physiological conditions in nodules. During nodule senescence, nodule cells break down and their macromolecular components are degraded and remobilized to other plant organs (Van de Velde et al., 2006). The process can be developmentally induced by the aging of nodules or when the host plants need to shift their nutrient resources for seed production. Nodule senescence can also be induced by particular physiological conditions, including reduced photosynthesis, nutrient starvation, or a shift in the nitrogen status of the plant. Nodule senescence under these different conditions has been studied intensively by transcriptomics ( Van de Velde et al., 2006;Seabra et al., 2012;Cabeza et al., 2014aCabeza et al., , 2014bLiese et al., 2017;Deng et al., 2019), which has led to the identification of marker genes and actors in the process. Cys proteinases appear to be key players in nodule senescence: manipulating their expression causes delayed or premature senescence, thus affecting nitrogen fixation (Pierre et al., 2014;Deng et al., 2019).
Other studies have used transcriptomics and dedicated Affymetrix gene chips to focus on the expression of NCR and other defensin-like peptide genes in M. truncatula (Nallu et al., 2013;Tesfaye et al., 2013). The results confirmed the notion that NCRs are nodule-specific peptides and revealed a distinct subset of defensin-like genes unrelated to the NCRs that are specifically expressed in seeds. Like M. truncatula, many defensin-like peptides are also expressed in both the inflorescences and seeds of Arabidopsis (Arabidopsis thaliana). The peptides in these organs may play roles in defense. Indeed, the abundant expression of defensin-like peptides may reflect a specific, strongly protected immune status of reproductive organs. In an RNA-seq study focusing on small, secreted peptides that act via receptormediated signaling, 365 peptides belonging to 29 different putative functional peptide families were identified whose expression changed during nodule formation, suggesting that peptide signaling plays a prominent role in the coordination of nodulation (de Bang et al., 2017). This data set produced in this study represents a rich resource for future work aiming to decipher the functions of these peptides.
The genotypes of both the Rhizobium and legume influence the outcome of the interaction and the efficiency of the nitrogenfixation process (Sugawara et al., 2013;Kazmierczak et al., 2017). Transcriptomics were used to gain insights in the molecular basis of these genotype-by-genotype variations in M. truncatula-Sinorhizobium symbiosis (Heath et al., 2012;Nallu et al., 2014;Burghardt et al., 2017). These studies revealed a high degree of genotype-by-genotype variability in the transcriptomes that was more dependent on the host genotype than the bacterial symbiont genotype. In particular, NCR expression is highly variable, suggesting that the NCRs and the differentiation of the bacteroids induced by NCRs are important determinants of the efficiency of symbiosis (Kazmierczak et al., 2017). In agreement with this finding, the orthologous NCR genes within a collection of M. truncatula accessions showed a high degree of nucleotide variation with signatures of diversifying selection pressure, suggesting that NCR genes are rapidly and recently evolving, possibly to help plants adapt to diversifying rhizobium symbionts (Branca et al., 2011;Nallu et al., 2014;Zhou et al., 2017).
Many transcriptome studies have been performed using whole nodules and did not provide information on the cell type-specific expression of nodule-enhanced or nodule-specific genes. Indeterminate M. truncatula nodules are complex organs composed of many cell types. Differentially expressed genes could be specifically expressed in one or several of these cell types. However, some studies have provided spatial resolution at the level of cell type or nodule tissue (Maunoury et al., 2010;Limpens et al., 2013;Roux et al., 2014). These studies demonstrated that the large majority of nodule-enhanced or nodule-specific genes are expressed in differentiating or mature symbiotic cells. By analyzing different stages of nodule development in conjunction with a large panel of symbiotic mutants arrested in nodule development, two major transcriptome switches (regulating a large proportion of nodule-enhanced or -specific genes) were revealed, one that correlated with symbiotic cell maturation and another one with bacteroid differentiation (Maunoury et al., 2010). An impressive improvement in transcriptome analysis was achieved by combining LCM of nodule tissues with Affymetrix gene chips (Limpens et al., 2013) or RNA-seq analysis (Roux et al., 2014;Jardinaud et al., 2016), providing spatial resolution to global gene expression analysis in nodules (Figure 4). The dissection of nodules by LCM-RNA-seq confirmed the conclusion that a large majority (over 90%) of the 1200 nodule-enhanced or -specific genes are expressed in differentiating or mature symbiotic cells ( Figure 3C; Roux et al., 2014). This observation reinforces the notion that symbiotic nodule cells are unique cell types that have complex needs for their formation and maintenance and for the metabolic integration of endosymbionts.
New technologies could in the (near) future further improve the spatial resolution to the level of nodule tissue subdomains or even single cells, which would resolve nonuniform responses that are averaged over the population of cells in multi-cell samples or unmask unique biological properties of single cells that are too diluted in populations to be discovered in multi-cell samples. These new methods include single cell-type or single cell transcriptome analysis. Single cell types or single cells or the corresponding nuclei could be obtained through LCM, through isolating nuclei tagged in specific cell types (INTACT), or through flow cytometry and sorting of nuclei based on genome size (Figure 4; Libault, 2018). The latter technique could also be combined with fluorescent labeling of nuclei in specific cells in transgenic plant lines with cell type-specific promoters driving the expression of a fluorescent marker. Other new technologies, including spatial transcriptomics and Slide-seq, that could be applied to nodule gene expression analysis, are based on the transfer of RNA from tissue sections onto a surface coated with spatially structured molecular barcodes that are sequenced together with the transferred RNA, generating spatially resolved RNA-seq transcriptome data (Figure 4; Ståhl et al., 2016;Rodriques et al., 2019).

Genome Organization of Nodule-Specific Genes
Nodule-specific genes often form small, intermediate, or even very large gene families, as observed for the NCR genes in many IRLC legumes (Montiel et al., 2017). After their birth, the NCR genes in these plants must have spread massively through the genome by repetitive gene duplications. NCR genes appear in small clusters, which are spread across the genome in M. truncatula (Graham et al., 2004;Alunni et al., 2007;Zhou et al., 2017), indicating that long-distance duplications followed by local duplications must have played a role in the spreading of NCR genes. A welldocumented mechanism for generating gene duplications is based on transposon-mediated nonhomologous recombination and retrotranscription (Galindo-González et al., 2017). Many NCR genes in M. truncatula are in the vicinity of transposable elements, indicating transposons were (and possibly still are) involved in the explosion of the NCR gene family in the M. truncatula genome (Graham et al., 2004;Satgé et al., 2016).
Besides NCR genes, other nodule-specific genes in M. truncatula, particularly those expressed in differentiating and mature symbiotic cells, also display local duplications, including genes encoding GRPs, SNARPs, NPDs, calmodulin-like proteins, ENOD8, leghemoglobins, the ammonium channel NOD26, and others (Dickstein et al., 2002;Liu et al., 2006;Alunni et al., 2007;Trujillo et al., 2014;Frare et al., 2018;Pecrix et al., 2018;Trujillo et al., 2019). These clusters are distributed all over the M. truncatula genome and strikingly, in addition to locally duplicated genes, they often contain different types of nodule-specific genes (Liu et al., 2006;Pecrix et al., 2018). These genome clusters of nodule-specific genes are called symbiosis-related islands (SRIs). Almost 300 SRIs were identified in the M. truncatula genome, which on average contain more than seven nodule-enhanced genes and more than five distinct gene families. Besides the above-mentioned genes, several other well-known, important symbiotic genes are located within SRIs ( ). This clustered gene organization is thought to contribute to the nodulespecific regulation of SRI-localized genes (Pecrix et al., 2018).

The Evolution of Nodulation
To date, no systematic meta-analysis has been performed that compares the details of the transcriptomes of all analyzed NFC plants to identify (for example) the common nodule-enhanced genes. The upstream regulatory hierarchy that orchestrates the genetic program underlying nodulation in the host, the CSSP, is functionally conserved between legumes, Parasponia, and actinorhizal plants (Gherbi et al., 2008;Markmann et al., 2008;Svistoonoff et al., 2013;Clavijo et al. 2015;Granqvist et al., 2015;Battenberg et al. 2018;van Velzen et al., 2018;van Zeijl et al., 2018). However, the extent of conservation of the other actors in nodulation downstream of these master regulators is currently unknown. Nevertheless, analyses have recently started to address this question by defining orthologous groups between NFC plant species and identifying common nodule-enhanced genes. One study compared root and nodule transcriptomes in the actinorhizal plants C. thyrsiflorus and D. glomerata and the legume M. truncatula and identified a set of only 51 genes commonly induced in the nodules of these three NFC plants . A comparison focusing on the low-entropy nodule-specific gene sets in soybean, L. japonicus, P. vulgaris, and M. truncatula found only 12 common orthologous genes between the four species and 34 to 70 common genes in pairwise comparisons (Wu et al., 2019). These results suggest that nodule-specific gene sets appeared late in evolution and independently in the legume clades. However, another study comparing the P. andersoniinodule transcriptome obtained by RNA-seq with comparable M. truncatula data found a more substantial overlap . Among the 1719 and 2753 nodule-enhanced genes in P. andersonii and M. truncatula, respectively (Roux et al., 2014; genes that are upregulated at least twofold in young mature nodules vs. uninfected roots), 290 putative orthologs were found. This common set of genes activated in P. andersonii and M. truncatula nodules includes genes encoding several known symbiosis regulators, signaling proteins, and TFs, as well as metabolic proteins such as enzymes and metabolite transporters. The common genes were further classified into two functional categories based on their expression profiles in nitrogen-fixing P.
andersonii nodules and in uninfected nonfunctional nodules. Of the 290 common genes, 126 were upregulated in both functional and nonfunctional nodules, suggesting they play roles related to nodule organogenesis. The 164 remaining genes were only induced in infected and functional nodules and are therefore associated with infection or nitrogen fixation (van Velzen et al., 2018).
At first glance, the list of 290 common genes between these two nodule transcriptomes appears to be relatively small in light of the 1719 and 2753 nodule-enhanced genes in these species, suggesting that the functions of these nodules primarily involve different gene sets. However, the two plants diverged ;100 MYA, which may have brought about this functional divergence. Moreover, this comparison has to be placed in the context of the evolution of nodulation and symbiotic nitrogen fixation. Two opposing evolutionary models for nodulation have been proposed ( Figures 5A and 5B). Until recently, the prevailing hypothesis (based on phylogenetic ancestral state reconstruction and the strong differences in nodules of different lineages) considered the (A) Purified nodule cell types can be obtained by LCM. This technology uses a laser guided by a microscope to cut cells in a precise, contamination-free manner out of their surrounding tissues from an organ section. (B) A second methodology exploits the varying DNA content in the nuclei of differentiating symbiotic nodule cells and separates these nuclei using flow cytometry and a cell sorter. Flow-cytometry is a laser-assisted technology that analyzes the biophysical features of small particles, such as cells or nuclei, in a capillary stream. The graph on the right shows a typical flow cytometry measurement of the DNA content in nodule nuclei. A cell sorter coupled to the flow cytometer separates these nuclei. Both methods, LCM and flow cytometry, yield purified materials that are compatible with downstream molecular biology methods such as gene expression analysis (RNA-seq, microarray hybridization, RT-quantitative PCR) and chromatin analysis (DNA methylation analysis, chromatin immunoprecipitation-seq). Both methods can be used to isolate single cells or single cell types from nodules and can also be combined with the use of fluorescent markers expressed in cells of interest. (C) In spatial transcriptomics or Slide-seq, a nodule section is placed on a support that carries a defined array of barcodes. After the RNA is transferred from the tissue to the array, the barcode sequence is introduced during cDNA synthesis, and RNA-seq libraries are produced from the whole sample, preserving the spatial information through barcodes. (A) The convergent evolution hypothesis proposes the evolution of a cryptic innovation or predisposition for nodulation (red asterisk) followed by independent acquisitions of nodule formation (indicated by colored dots on tree branches). Green and blue dots signify the acquisition of nodulation with Frankia and Rhizobium, respectively. evolution of nodulation to be a two-step process. The initial step was the acquisition of an uncharacterized predisposition for nodulation by the ancestor of the NFC over 100 MYA, which was followed (at least 30 MY later) by the independent acquisition or convergent evolution of nodulation in six to ten different lineages of the NFC ( Figure 5A; Soltis et al., 1995;Werner et al., 2014;Doyle, 2016). This hypothesis implies multiple independent recruitments for nodulation of the CSSP, that preexisted and functioned in mycrorrhization in these lineages, an evolutionary scenario known as deep homology that has repeatedly been identified for animal structures (e.g. eyes and limbs; Shubin et al., 2009). Under this scenario, nodulation evolved independently in Medicago and Parasponia, implying that the 290 common genes in P. andersonii and M. truncatula nodules arose through complex convergent evolution.
Therefore, van Velzen et al. proposed the challenging hypothesis that nodulation arose already in the common ancestor of the NFC (van Velzen et al., , 2019. Besides the substantial overlap in the nodule transcriptomes of P. andersonii and M. truncatula, including 126 putative organogenesis genes, this new hypothesis is also based on the independent and recent losses or pseudogenizations of the essential symbiosis genes NFP, NIN, and RPG in nonnodulating plants of multiple lineages of the NFC (Griesmann et al., 2018;van Velzen et al., 2018). Together, these observations can be interpreted as a single gain of nodulation at the origin of the NFC and subsequent multiple losses in the current nonnodulating lineages ( Figure 5B). Since the divergence of the four orders of the NFC occurred very rapidly (;104 MYA) after the divergence of the NFC itself from its sister clades (;110 MYA), a period of only a few MYs (Wang et al., 2009;Li et al., 2015), this hypothesis implies that nodule organogenesis evolved in a very short time, within these few MYs. Alternatively, perhaps nodulation and nitrogen-fixing symbiosis are much more ancient than previously thought and have been lost altogether in the NFC sister clades.
A possible evolutionary scenario leading to nodule formation is that the existing interaction with mycorrhiza in the ancestor for nodulation was extended to a bacterial nitrogen-fixing symbiont, resulting in the infection of root cortical cells with bacteria rather than fungi. This interaction used the already present CSSP and recruited additional genes. One of the new key functions could have been the suppression of immunity against bacteria in infected root cells. NAD1, a conserved protein specific to the NFC, might represent an early recruited immunity suppressor . Perhaps the ammonium channel NOD26, which is involved in the uptake of ammonium produced by bacteroids from the symbiosomes into the symbiotic cells, represents another early innovation (Frare et al., 2018). This transporter, along with ammonium assimilation enzymes, is required for the metabolic integration of the nitrogen fixing symbiont. At the regulatory level, the TF NIN, which is required for nodule organogenesis and infection in present-day plants, might have been recruited early on by the CSSP from a preexisting TF family of NIN-like proteins (NLPs) implicated in the regulation of nitrogen homeostasis in plants (Soyano and Hayashi, 2014;Clavijo et al., 2015). Another innovation must have been the activation of the cell cycle in differentiated root cells to increase the number of infected cells by cell proliferation and to direct organogenesis. This step could have been accomplished through the recruitment of cytokinin signaling by NIN, a crucial step in the initiation of root cortical cell division in legumes (Schauser et al., 1999;Vernié et al., 2015;Gamas et al., 2017), although the role of cytokinin signaling in nodulation has not been confirmed in the other clades of the NFC (van Zeijl et al., 2018;Gauthier-Coles et al., 2019). Subsequently, the evolution of nodulation may have occurred via a gradual process that created an ever increasing mass of infected symbiotic cells, providing more and more nitrogen to the host until the full nitrogen needs of the plant were met ( Figure 5B).
The new model of van Velzen et al. (2019) further proposes that symbiosis with Frankia occurred at the NFC origin and subsequent independent switches to rhizobia occurred in the ancestor of legumes and Parasponia. This proposition is based on the finding that legumes and Parasponia use paralogous and not homologous hemoglobin genes in nodules to protect the oxygensensitive nitrogenase enzyme of the rhizobium symbionts (Sturms et al., 2010). Nitrogen fixation by Frankia does not require an external oxygen scavenger because these bacteria are able to create their own oxygen barrier by forming a shell of hopanoid lipids. Legumes recruited class 2 hemoglobins and Parasponia independently recruited a class 1 hemoglobin to function as oxygen transporters in nitrogen-fixing symbiotic nodule cells. Thus, in this second evolutionary scenario, nodulation with Frankia in proto-legumes and Parasponia shares a relatively short common evolutionary history from the origin of nodulation to the diversification of the two lineages, followed by 100 MY of separation during which both lineages independently switched from hosting Frankia to rhizobia. The symbiont exchange could have been achieved through a temporary phase of coinfection of nodules by (B) The model of a single acquisition of nodulation with Frankia at the origin of the NFC and multiple losses in the nonnodulating lineages (red crosses). This acquisition required the suppression of PAMP-triggered immunity against the infecting bacteria, a process possibly involving the NAD1 protein. The evolution of nodulation might have involved a rapid succession of stepwise innovations including the replacement of mycorrhizal fungal infection by Frankia infection, followed by cortical cell divisions resulting in the production of small foci of infected cells before the development of nodules. Two independent switches of symbionts occurred (indicated by green to blue dot), once in the Parasponia lineage of the Rosales and once in the ancestor of the legumes in the Fabales. These switches to a Rhizobium symbiont were possible due to the recruitment and mutation of a nonsymbiotic, nonoxygen transporting class 1 or class 2 hemoglobin (nsHb1 or nsHb2), forming the oxygen-transporting Parasponia hemoglobin (Parab) or leghemoglobin (Lb), respectively. (C) An alternative model proposes that the ancestor of the NFC acquired the possibility to extend mycorrhizal infection with Frankia infection of root cortical cells. The subsequent stages imply multiple independent acquisitions of nodule organogenesis as in (A) and two independent symbiont switches as in (B). The trees are schematic representations based on published chronograms (Doyle, 2016). The present status of clades is indicated by a blue or green nodule for Rhizobium-infected or Frankia-infected nodulating clades and by a black cross for nonnodulating clades. the primary Frankia symbiont and a secondary rhizobium symbiont, followed by the full overtaking of Frankia by the rhizobia. Indeed, coinfection of nodules by a nodule-inducing bacterium and a nonnodulating bacterium is often observed in legumes and actinorhizal plants (Martínez-Hidalgo and Hirsch, 2017). Furthermore, the replacement of the symbiont likely required new host functions for recognizing the new symbionts (which have a distinct cell envelope) and for suppressing immune responses against the rhizobia. From the bacterial side, the rhizobia acquired the nodulation genes required for Nod factor synthesis ( Figure 1A), although the origin of these genes is still an enigma. Nod factors were required to replace the Frankia factors that triggered the nodulation program, and they also helped suppress immune responses by the host (Liang et al., 2013).
One could also envision a third, in-between scenario in which the ancestor of the NFC established an interaction with Frankia spp without nodule formation and organogenesis, instead forming nitrogen-fixing structures similar to arbuscular mycorrhiza (structures formed without organogenesis) with little or no cell division, facilitating bacterial uptake and forming infection foci ( Figure 5C; see Parniske 2018 for an inspiring discussion on this scenario). In other words, in this third scenario, the hypothetical "predisposition of nodulation" of scenario 1 would correspond to the replacement (or extension) of a mycorrhizal fungus with a filamentous bacterial symbiont of the Frankia genus. During the second stage, Frankia were replaced by rhizobia independently in legumes and Parasponia, as described in scenario 2. Finally, during the third stage, nodule organogenesis evolved independently in multiple lineages. The timing of stage two and three could also be inverted. The present day absence of nodulation in many lineages of the NFC could correspond to the loss of nodulation or the loss of the more ancient nitrogen-fixing Frankia symbiosis without organogenesis. It is harder to reconcile this third scenario with the presence of putative organogenesisrelated genes in the common gene set of the Medicago and Parasponia transcriptomes. However, this overlap could represent the combined result of the early shared evolution toward a simple infection focus in the root cortex and the later convergent recruitment of orthologous gene functions after the separation of the two lineages. Moreover, the three symbiosis genes (NFP, NIN, and RPG) that were lost in nonnodulating plants of the NFC are involved in infection and are thus not necessarily (exclusive) markers of nodule organogenesis. On the other hand, the third scenario fits well with the existing differences in infection, ontogeny, and physiology of nodules formed by plants in the different lineages of the NFC. For example, the nodules of Parasponia and actinorhizal plants have a central vascular bundle, whereas the nodules of legumes contain multiple peripheral vascular bundles (Behm et al., 2014;Downie, 2014). In addition, legumes and nonlegumes of the NFC differ in their reliance on cytokinin signaling (van Zeijl et al., 2018;Gauthier-Coles et al., 2019). Finally, the nodule-specific transcriptomes of different plants are largely dissimilar Wu et al., 2019; despite the overlap of a subset of nodule-enhanced genes in Medicago and Parasponia;van Velzen et al., 2018), suggesting that organogenesis in these plants followed distinct evolutionary paths.
Investigating whether the large overlap between nodule transcriptomes observed between P. andersonii and M. truncatula could be extended to all nodulating plants could be informative. The available nodule transcriptomes from legumes cover all the major clades of the Papilionoideae subfamily, the hologalegina (Medicago, Pisum, Lotus, Sesbania), the millettioids (Phaseolus, Glycine), and the genistoids (Lupinus), but no data are available for species of the other large clade of the Leguminosae, the Mimosoideae-Caesalpineae-Cassieae clade. In the nonlegume NFC plants, the Cucurbitales are represented by the Datsica transcriptome, the Fagales by the Alnus and Casuarina transcriptomes, and the Rosales by C. thyrsiflorus and the rhizobiuminfected Parasponia transcriptomes. A systematic comparative analysis identifying orthologous genes in the available transcriptome data, complemented with new data from the unexplored branches of the legume family, would certainly provide insights into the evolutionary scenario of nodulation. Such analysis would further highlight the mechanisms in nodulation that are ancestral and those that are more recent innovations specific to particular lineages of the NFC.

REGULATION OF THE NODULE TRANSCRIPTOME Transcription Factors
The stimulation of the CSSP by Nod factors results in the activation of a TF network composed of IPD3 (CYCLOPS), IPD3L, NIN, NSP1, NSP2, DELLA proteins, ERN1, ERN2, NF-YA1, and potentially other yet-to-be identified TFs (Schauser et al., 1999;Kaló, 2005;Smit et al., 2005;Combier et al., 2006;Andriankaja et al., 2007;Marsh et al., 2007;Middleton, 2007;Asamizu et al., 2008;Cerri et al., 2012;Singh et al., 2014;Fonouni-Farde et al., 2016;Jin et al., 2016; for reviews on the topic, see Oldroyd, 2013;Soyano and Hayashi, 2014;Jin et al., 2018;Roy et al., 2019). This TF network activates the infection process, nodule organogenesis, and autoregulation of nodulation (a long-distance negativefeedback pathway that controls nodule number based on the number of already formed or initiated nodules and the nitrogen needs of the plant; Figure 1A). The NIN TF emerged from these studies as a master regulator of nodulation that coordinates these processes (Soyano et al., 2013;Vernié et al., 2015;Liu et al., 2019aLiu et al., , 2019b, which is in agreement with the proposed early recruitment of NIN during the evolution of nodulation in the NFC. Several direct target genes of this network have been identified (e.g. ENOD11, CRE1, NPL1 [NODULATION PECTATE LYASE 1], CLE), but how this TF network further connects to the downstream nodule transcriptome and the massive reprogramming in the symbiotic cells ( Figure 2B) remains to be clarified. This regulation most likely involves the generation of secondary signals (such as phytohormones and others) as well as a downstream TF network.
Nevertheless, some TFs involved in late stages of nodule formation have been identified, but they have no direct link to transcription in symbiotic cells or their roles remain to be pinpointed. A few early studies have identified potential regulators in symbiotic nodule cells, which were not further characterized (Heard and Dunn, 1995;Heard et al., 1997;Jørgensen et al., 1999). Other transcriptional regulators, such as the Medicago Krüppel TF MtZPT2, the Medicago basic helix-loop-helix TF MtbHLH1, NOOT in Medicago and Lotus (belonging to the NPR1-like BTB/POZ-Ankyrin domain protein family), and soybean GmbHLHm1 (formerly GmSAT1), are important for nodule formation but are not directly involved in symbiotic cell-specific gene regulation. Their genes are not expressed in symbiotic cells but are instead expressed in roots and nodule vascular bundles or nodule parenchyma, where they affect nodule vasculature patterning and nutrient transport between nodules and roots (Frugier et al., 2000;Godiard et al., 2011;Couzigou et al., 2012;Chiasson et al., 2014;Magne et al., 2018).
Other characterized TFs have demonstrated roles in symbiotic cell formation. The TF EFD, belonging to the ethylene response factor family, controls symbiotic cell formation, bacteroid differentiation, and the expression of a subset of nodule-specific genes, including some NCR genes (Vernié et al., 2008). However, this regulation of nodule-specific genes could be indirect, as MtRR4 is the only gene found to be directly regulated by EFD (Vernié et al., 2008). MtRR4 is a type A cytokinin response regulator induced by cytokinins that is part of a negative feedback loop involved in repressing the cytokinin pathway. Cytokinins are key signals that activate cell divisions in the root cortex to establish a nodule primordium (Gamas et al., 2017). The repression of cytokinin signaling mediated by EFD through MtRR4 might be essential for promoting symbiotic cell differentiation. Furthermore, a group of three class 2 KNOX homeodomain TFs directly or indirectly regulate EFD and MtRR4 expression (Azarakhsh et al., 2015;Di Giacomo et al., 2017). One of these KNOX TFs, MtKNOX3, is strongly induced in nodules and is active in differentiating symbiotic cells. This TF could be involved in regulating other genes, including symbiotic cell-specific genes.
The C 2 H 2 TF RSD is expressed in differentiating symbiotic cells and is required for symbiosome formation. However, data suggest that RSD act as a repressor of a small set of genes to influence vesicle transport in symbiotic cells and avoid mis-targeting of specific vesicles and their cargo to symbiosomes (Sinharoy et al., 2013). Thus, the present data on RSD are not in agreement with a role of this TF in positively activating symbiotic cell-specific gene expression. Similarly, the Medicago TF MtbHLH2 might be expressed in infected nodule cells but is thought to act as a negative regulator of genes that stimulate nodule senescence (Deng et al., 2019). A small set of APETALA2 TFs in soybean, L. japonicus, and P. vulgaris might have a negative effect on nodulation, including symbiotic cell functioning. These TF were identified because they are targeted by a nodule-enhanced microRNA (miRNA), and their upregulation in symbiotic cells may be correlated with the activation of senescence-associated genes (Wang et al., 2014;Holt et al., 2015;Nova-Franco et al., 2015).
In a large-scale effort in soybean, the expression profiles of ;1000 genes encoding putative transcriptional regulators were analyzed during nodule formation via RT-qPCR (Libault et al., 2009a). This systematic screening identified 39 genes that are upregulated during nodule formation. A gene encoding a MYB TF was found to be expressed in vascular tissues and not in symbiotic nodule cells. Although the remaining genes were not further characterized, they constitute a collection of putative regulators in symbiotic nodule cells.
Overall, very little is thus known about how the massive transcriptome switch in symbiotic nodule cells is directly regulated. Libault et al. (2009b) and Young et al. (2011) cataloged all TFs in the M. truncatula genome. Analyzing the MtGEA and Symbimics databases using nodule-specific or nodule-enhanced expression and expression in symbiotic cells (IZ and zone III) as criteria resulted in the identification of a set of interesting TFs that might be involved in gene-specific regulation in symbiotic cells (Supplemental Figure). These TFs include NIN (the abovementioned master regulator of nodulation) and IPD3. IPD3 directly interacts with the CSSP component calcium-and calmodulin-dependent protein kinase (CCaMK) and functions in Nod factor activation of the CSSP, infection thread formation, and the release of rhizobia from infection threads, organogenesis, and symbiotic cell formation (Yano et al., 2008;Maunoury et al., 2010;Horváth et al., 2011;Ovchinnikova et al., 2011;Singh et al., 2014). The transcriptome data show that IPD3 is most strongly expressed in differentiating and mature symbiotic cells of the interzone and zone III. This finding corresponds with the results of IPD3 promoter-GUS fusion analysis (Messinese et al., 2007). CCaMK is also expressed in these nodule zones but has a uniform expression pattern across all nodule tissues. These expression patterns suggest that IPD3 and NIN also play important roles in controlling gene expression in differentiating and mature symbiotic cells. Therefore, NIN and IPD3 might regulate symbiotic cellspecific genes directly or indirectly (through downstream TFs). In agreement with this hypothesis, the nodule transcriptome is not activated in the nodule-like structures formed in the M. truncatula ipd3 mutant TR3 (Maunoury et al., 2010). Interestingly, genetic analyses also uncovered a role for other components of the CSSP in multiple stages of nodulation Capoen et al., 2005;Limpens et al., 2005). These observations suggest that the CSSP is a regulatory module that is used and re-used to control all stages of nodulation, from the first response to Nod factors to infection thread formation to the endocytotic uptake of rhizobia and to symbiotic cell differentiation and functioning. During each step, this pathway appears to activate different gene sets depending on the cellular context.

Cis-Regulatory Elements in the Promoters of Nodule-Specific Genes
Leghemoglobin genes, the prototypical symbiotic cell-specific genes, were the first legume genes cloned (Sullivan et al., 1981). Their identification was rapidly followed by the identification of other nodule-specific genes encoding nodulins (Legocki and Verma, 1980;Sandal et al., 1987). The promoters of these genes were experimentally screened for cis-acting elements required for high level, nodule-specific expression (Stougaard et al., 1987;Jensen et al., 1988;Metz et al., 1988;Szabados et al., 1990;Szczyglowski et al., 1994;Frühling et al., 2000;Nallu et al., 2013). These elements were used to demonstrate the existence of nodule-specific trans-acting factors. Unfortunately, these efforts were not further pursued, and the putative TFs were not characterized Metz et al., 1988).
The precise identification of cis-acting elements should in principle facilitate the straightforward screening of TFs using methods such as yeast-one-hybrid analysis. Thus far, this strategy has not led to the identification symbiotic cell-specific genes. Such an approach has only been performed on cis-acting elements of the Nod factor-responsive promoter of the very early activated gene ENOD11, resulting in the identification of the TF ERN1 (Andriankaja et al., 2007). Certainly, this approach holds promise for elucidating the transcriptional network in symbiotic nodule cells.

Chromatin-Based Regulation of Nodule-Specific Gene Expression
The tight regulation of nodule-specific genes clearly requires specific TFs, but in addition, regulation at the chromatin level might also be important for keeping these genes silent during all stages of plant development. Genes with high tissue-specific expression are often actively silenced throughout much of the plant lifecycle via epigenetic mechanisms. Chromatin-based silencing of nodule-specific genes could be released during symbiotic cell differentiation, making them accessible to TFs.
The soybean protein CPP1 (cysteine-rich polycomb-like protein 1) is a DNA binding protein that interacts with the promoter of the leghemoglobin gene Gmlbc3 (Cvitanich et al., 2000). CCP1 contains conserved CXC domains homologous to a domain in some polycomb proteins. CCP proteins are widely distributed in plants and animals where they form small families. CCP proteins Tso1 in Arabidopsis, TESMIN in mammals, Mip120 and Tombola in Drosophila, and LIN-54 in Caenorhabditis elegans function in chromatin complexes and are involved in cell cycle regulation, cell differentiation, and reproduction (Sijacic et al., 2011;Beall et al., 2007;Jiang et al., 2007). Despite some initial characterization in various organisms, much remains to be learned about the mechanisms of these transcriptional regulators and their biological roles. Soybean CCP1 binds the Gmlbc3 promoter and represses its expression, which is in agreement with the notion that CCP1 is part of a chromatin-related repression complex.
Further evidence for the key role of chromatin-based regulation in the massive gene activation during symbiotic cell differentiation in M. truncatula was obtained by analyzing the chromatin features of nodule-specific genes during symbiotic cell differentiation. This differentiation is driven by endoreduplication whereby postmeristematic nodule cells switch their cell cycle to an endocycle and multiply their genome from a 2C/4C DNA content in meristem cells, over 8C and 16C in differentiating cells, to 32C and 64C in maturing and mature symbiotic cells (Vinardell et al., 2003). Gene expression analysis of purified nodule nuclei of increasing ploidy levels revealed a correlation between the ploidy level and the sequential activation of nodule-specific genes in these nuclei, suggesting that endoreduplication is involved in this process . The expression levels of nodule-specific genes in nodule nuclei with different ploidy levels correlated with the opening of chromatin as well as with reduced H3K27me3 (trimethylation of Lys 27 in histone H3) levels in a subset of tested genes . Similarly, the above-mentioned SRIs showed reduced H3K27me3 levels in nodules versus roots (Pecrix et al., 2018). H3K27me3 is a repressive chromatin mark introduced by the Polycomb complex, a regulatory complex involved in the epigenetic silencing of chromatin. Thus, the activation of nodule-specific genes could rely on the opening of the chromatin and the release of H3K37me3-mediated repression during symbiotic cell differentiation.
DNA methylation represents another level of epigenetic control of the expression of symbiotic nodule cell-specific genes. DNA methylation of promoters usually leads to gene silencing. A comparison of DNA methylation levels in whole roots versus nodules or in nodule nuclei of different ploidy levels showed that many symbiotic nodule cell-specific genes, including genes encoding NCRs, calmodulin-like proteins, leghemoglobin, GRPs, and many others, as well as many transposons, become demethylated, which correlates with the activation of their expression (Satgé et al., 2016;Nagymihály et al., 2017). Remarkably, SRIs show differential methylation patterns in roots versus nodules, and the borders of the SRIs sharply define the limits of differentially methylated regions, with the flanking regions showing very low and nondifferential methylation (Pecrix et al., 2018). The DNA demethylation of nodule-specific genes is largely regulated by the activation of the DNA demethylase gene DEMETER (DME) during symbiotic nodule cell differentiation (Satgé et al., 2016). Remarkably, DME activity also results in the loss of DNA methylation and the expression of NCR-flanking transposons during symbiotic cell differentiation Satgé et al., 2016). Since transposons are epigenetically silenced, the activation of these transposons in symbiotic cells confirms the notion that the restructuring of chromatin occurs in these loci during symbiotic cell differentiation.
Thus, epigenetic control via DNA methylation and histone modifications appears to maintain the nodule-specific genes and SRIs silenced. During the endoreduplication-driven differentiation of symbiotic cells, epigenetic reprogramming of these loci renders them accessible and competent for transcription. However, the current data only establish correlations between gene expression, chromatin modifications, and endoreduplication. A future challenge will be to establish causal relationships between these processes.

The Regulation of Gene Expression by Alternative Splicing, Small RNAs, and Small Peptides
Additional mechanisms that control symbiotic cell-specific expression have been reported. One of these mechanisms is the alternative splicing of the Syntaxin 132 (SYP132 or SYP13II) gene (Pan et al., 2016;Huisman et al., 2016). SYP132 encodes a t-SNARE protein that functions in the secretory pathway controlling the fusion of secretory vesicles with the membrane of the target organelle. In M. truncatula, SYP132 is transcribed into two different transcripts due to alternative splicing. The SYP132b (or SYP132C) transcript is ubiquitously expressed, while the SYP132a (or SYP132A) transcript is preferentially expressed in symbiotic nodule cells and in arbuscular mycorrhiza cells. The two transcripts differ in their 13 th terminal exon, which in the SYP132b transcript is located immediately downstream of the previous exon. By contrast, in the SYP132a transcript, this canonical 13 th exon is skipped and replaced by an exon located ;2000 base pairs downstream. The two encoded proteins are therefore almost identical, differing only in 60 C-terminal amino acids. RNA interference experiments specifically targeting one of these transcripts revealed that SYP132b is essential for plant growth, whereas nodule-and mycorrhiza-specific SYP132a is essential for arbuscule formation and bacteroid differentiation. The SYP132 proteins are transported to the symbiosomes in symbiotic nodule cells. Downregulating the SYP132a transcript arrests bacteroid differentiation, presumably due to a blockage in the transport of nodule-specific secretory proteins, including NCRs, to the symbiosomes. It is not yet understood how this alternative splicing is regulated and if this is a common mechanism for nodule-specific gene expression or unique to SYP132. However, the finding that alternative splicing of SYP132 is conserved in most dicots including plants not in the NFC and is absent in dicot species that have lost mycorrhizal symbiosis suggests that this mechanism is related to mycorrhiza formation and has been co-opted for nodule symbiosis.
miRNAs are small regulatory RNA molecules that control the expression of specific target genes by binding to their mRNAs and affecting mRNA translation or stability. miRNAs control early stages of nodulation before the establishment of symbiotic cells (reviewed in Couzigou and Combier, 2016). For example, the TF NF-YA1 in M. truncatula, which controls nodule meristem function, is negatively regulated by the miRNA miR169 (Combier et al., 2006). In addition, the class-III homeodomain-leucine zipper TFs MtHB8, MtCNA1, and MtCNA2, which are involved in root and nodule meristem formation as well as vascular bundle differentiation, are negatively regulated by miR166 (Boualem et al., 2008). In L. japonicus miR2111 undergoes shoot-to-root translocation and controls the autoregulation of nodulation response in roots by negatively regulating the expression of TML, encoding an F-box protein that suppresses symbiosis (Tsikou et al., 2018). In soybean, a whole array of miRNAs affect the early stages of nodule induction and therefore nodule number (Yan et al., 2015;Wang et al., 2014;Wang et al., 2015;Hossain et al., 2019). In agreement with this role of miRNAs, Argonaute5, a small RNA-guided silencing protein involved in mediating miRNA activity, is required for nodule induction (Reyero-Saavedra et al., 2017). miRNAs are processed from primary transcripts resulting from the transcription of miRNA genes by RNA polymerase II. These primary miRNA transcripts often encode small peptides called miPEPs. miPEPs enhance the transcription of their cognate primary transcript and thereby the accumulation of the corresponding miRNA (Lauressergues et al., 2015). In soybean, miR172c targets the APETALA 2 TF NNC1, which negatively regulates nodule number (Wang et al., 2014;Wang et al., 2019). The miPEP172c peptide is encoded by the primary transcript of miR172c. This peptide stimulates the expression of this primary transcript, thereby positively regulating nodulation (Couzigou and Combier, 2016). Regulatory small peptides can also be encoded by protein encoding mRNAs. An alternatively spliced mRNA of the TF NF-YA1 encodes such a peptide, called uORF1p (Combier et al., 2008b). The alternatively spliced mRNA and uORF1p are expressed in nodule tissues adjacent to the meristem. uORF1p negatively regulates the expression of NF-YA1 and thus, in combination with the above-mentioned miR169, restricts the expression of this gene to the nodule meristem.
Thus, the initial stages of nodule formation are regulated by miRNAs and small peptides that control the transcript levels of their target TFs. Later stages of the nodule development are also regulated by miRNAs in L. japonicus, soybean, and P. vulgaris. In L. japonicus, miR397 is systemically induced only by the presence of functional, nitrogen-fixing nodules and it downregulates a member of the laccase protein family (De Luis et al., 2012). miR397 is conserved in many plants and is involved in regulating copper homeostasis. However, whether this regulation takes place in symbiotic cells or in other nodule cells remains to be confirmed. The P. vulgaris miRNA homolog of the abovementioned miR172c plays a regulatory role in a late stage of nodulation. miR172c represses the expression of the APETALA2 TF gene AP2-1 (Nova-Franco et al., 2015). An analysis of the roles of AP2-1 and miR172c suggested that AP2-1 activates senescence-related genes in symbiotic cells and that miR172c is required to downregulate AP2-1 in active nodules. By contrast, in nonactive nodules infected with a nonfunctional rhizobium strain or induced by nitrate application, miR172c expression is reduced, leading to the induction of AP2-1 and senescence. Corresponding miRNAs in soybean and L. japonicus are expressed in symbiotic cells of mature nodules; these miRNA172s and their AP2 targets could play similar roles to those in P. vulgaris (Wang et al., 2014;Holt et al., 2015). Finally, miR393j-3p targets and represses the expression of ENOD93 in soybean; this gene is specifically expressed in nodules and symbiotic cells (Yan et al., 2015).
Small RNAs derived from endonuclease-mediated tRNA cleavage (tRNA fragments or tRFs) are a recently described class of small RNAs that associate with Argonaute proteins and regulate gene expression (Kumar et al., 2016). A recent analysis of small RNA transcriptomes from soybean nodules and cultures of its symbiotic partner Bradyrhizobium revealed numerous Bradyrhizobium-derived tRFs, most with higher abundance in nodules than in culture (Ren et al., 2019). Twenty-five of these tRFs were predicted to target 52 different soybean genes, suggesting that they contribute to the posttranscriptional regulation of these genes. Three selected rhizobial tRFs were indeed shown to be loaded on soybean Argonaute protein 1 and to downregulate their predicted target genes GmRHD3a/GmRHD3b, GmHAM4a/ GmHAM4b, and GmLRX5. The silencing of these target genes by the rhizobial tRFs occurred during the early stages of infection and enhanced nodulation. These findings point to a new level of interspecies communication between the rhizobia and their legume hosts. The finding that additional Bradyrhizobium tRFs are predicted to target other soybean genes and that a different set of rhizobial tRFs potentially targeting a different set of host genes was identified in a small RNA transcriptome from the legume P. vulgaris suggests that this type of signaling is not restricted to these three reported examples. Although it is too early to fully understand the implications of this new signaling mechanism, it is tempting to speculate that it introduces yet another level of control to host-specificity to match the right symbiont with the right plant. Another intriguing question is how these rhizobial small RNAs reach their target genes in the plant cell. An exciting possibility is that they might be delivered via outer membrane vesicles produced by the rhizobia, a mechanism similar to that used by the human pathogen Pseudomonas aeruginosa, which delivers a tRF via outer membrane vesicles to epithelial cells in the human airway (Koeppen et al., 2016).
Taken together, these pioneering studies have made it clear that miRNAs and other small RNAs play a role in coordinating and finetuning gene expression in symbiotic nodule cells. Accordingly, downregulating Argonaute5 expression in P. vulgaris and soybean not only reduces nodule initiation but also affects the formation of symbiotic cells (Reyero-Saavedra et al., 2017). Dedicated transcriptome analyses focusing on small RNAs expressed in nodules and infected root hairs have been performed for several legumes, including M. truncatula (Lelandais-Brière et al., 2009;Formey et al., 2014;Pecrix et al., 2018), soybean (Subramanian et al., 2008;Joshi et al., 2010;Yan et al., 2015;Yan et al., 2016), P. vulgaris (Peláez et al., 2012;Formey et al., 2016), and L. japonicus (Holt et al., 2015). The corresponding databases represent promising sources for the future discovery of novel RNAs that control gene expression in nodules.
Finally, long noncoding and antisense RNAs (lncRNAs) might also be involved regulating nodule-specific gene expression. LCM RNA-seq analysis of M. truncatula nodules (SYMBIMICS data set) has revealed a surprisingly large number (over 1000) of lncRNAs expressed in nodules. Most of these are active in differentiating or mature symbiotic cells. Many are located in SRIs, and the expression profiles of many of these lncRNAs were correlated with the expression of the closest mRNA-encoding genes in the genome, suggesting these lncRNAs play regulatory roles (Pecrix et al. 2018). For example, the TF EFD is expressed in young symbiotic cells in the distal and proximal infection zones of the nodule. Strikingly, an antisense transcript of EFD showed a complementary expression profile, that is, in older symbiotic cells in the interzone and fixation zones. This finding suggests that EFD is downregulated by antisense transcripts in order to promote the final stages of symbiotic cell differentiation. Indeed, a large set of the above-mentioned nodule-specific genes in M. truncatula (e.g. genes encoding NCRs, GRPs, ENOD8, ENOD16, ENOD20, calmodulin-like proteins, leghemoglobin, nodulin 25 IPD3, NIN, SymCRK, DME, DNF1, RSD, and ERN2) as well as many others have associated lncRNAs or antisense transcripts. The biological meaning of this will certainly be worth deciphering.

CONCLUDING REMARKS
Transcriptomic analysis of nodule formation has been instrumental in the discovery of new genes that were subsequently shown by reverse genetics to be important for nodule formation. Thus, transcriptomics have played a complementary role (and will probably continue to do so) to the many forward genetic screens that have been successfully conducted by several laboratories to identify key regulators of nodulation (reviewed by Roy et al., 2019). However, transcriptomics analysis also provides broader insights into the nodulation process. Here, we argue that nodule formation is a unique developmental process that is supported by drastic transcriptional reprogramming. This reprogramming involves the activation of many hundreds of genes, many of which are exclusively expressed in these organs and are novel genes that have been shaped by evolution since the origins of nodulation. A remarkable number of nodule-specific genes have signal peptides and are therefore transported to bacteroid-containing symbiosomes, suggesting that a major adaptation of nodule-forming plants is the massive production of a large variety of proteins that interact with endosymbionts. We have some knowledge about the functions of some of these proteins (e.g. the NCRs), but at the moment, we have no clue about most of their roles.
Even though the CSSP is the upstream regulatory apparatus that activates both nodule formation and mycorrhiza formation, the nodule-specific and mycorrhiza-specific transcriptomes are dramatically different. How the different read-outs of this conserved module are achieved is likely one of the most import questions to be solved in the field. Perhaps the components of the CSSP are arranged in multiprotein signaling modules anchored in the cell membrane through receptor complexes located in nanoclusters organized by microdomain-associated proteins, including remorins, flotillins, and FW2.2-like proteins (Haney and Long, 2010;Haney et al., 2011;Lefebvre et al., 2010;Ott, 2017;Qiao et al., 2017;Liang et al., 2018). Distinct nodulation signaling modules and mycorrhization signaling modules could activate the appropriate responses.
The regulatory mechanisms that underlie symbiotic cell formation and control the transcriptome switch are also poorly understood. With the recent identification of candidate TFs, the discovery of SRIs, the description of epigenetic modifications, and the documentation of transcriptional fine-tuning mechanisms by alternative splicing, miRNAs, miPEPs, tRFs, and antisense RNAs, we are only starting to decipher this process, and we can hope that the key regulators will soon be discovered. An intriguing observation from nodule transcriptomes is that the TF gene IPD3 is most strongly expressed in symbiotic cells. It would be interesting to determine its specific role in these cells and how this role relates to its transcriptional activity during the very early stages of the nodulation process.
An important implication of the current analysis is that nodules are unique organs, not only from the perspective of organogenesis and histological organization but also from a genetic perspective. All attempts to transfer nodule formation to nonlegume crops that are currently underway in different laboratories should take into account the finding that "naturally" evolved nodule formation relies on the concerted action of hundreds of genes, many of which appear to function specifically in nodules. It will be important in future efforts to understand in great detail not only the regulatory mechanisms of nodulespecific genes, but also their function as well as their evolution. A key issue for the future will be to understand how adaptive the nodule transcriptome response is. Gene regulation is usually considered to be an adaptive response of the organism, organ, or cell to a specific environmental or developmental condition so that genes are expressed when and where they are required. The validity of this presumption has not been tested thoroughly in plants. In bacteria, however, the picture has emerged that transcriptional responses are in large part suboptimal or even maladaptive (Price et al., 2013). Striking examples come from the rhizobium bacteroids in nodules themselves. Similar to their host cells, bacteroids undergo a major transcriptomic switch during their formation, with the induction of many hundred of genes and the downregulation of the rest of the genome. Some of these upregulated genes are essential for symbiosis, such as genes encoding nitrogenase and accessory functions. However, disappointingly from a researcher's perspective, in various studies, the mutation of many selected genes that were strongly upregulated in mature bacteroids resulted in no symbiotic phenotype for any of the tested genes (Karunakaran et al., 2009;Lamouche et al., 2018). Similarly, in the yeast Saccharomyces cerevisiae, only a small portion of genes upregulated under a particular growth condition contributed to the fitness of the yeast under that condition (Giaever et al., 2002). Thus, genes may be upregulated in a specific context only as a side-effect without having a clear functional contribution. Whether this is the case in nodules remains to be analyzed, as it is possible that part of the nodule transcriptome contributes little to symbiotic nitrogen fixation.
Clearly, it is not possible, at least with currently available technology, to engineer something as complex as nodules of the NFC. Thus, it will be important to understand to what extent the complexity present in current day nodules is required for nitrogen fixation, even at levels below the optimal level reached in legume nodules. Much attention has been paid to the early stages of nodule formation, but we would like to advocate for more intense efforts toward understanding symbiotic nodule cell formation. These unique cell types are at the heart of symbiosis and determine the high efficiency of the nitrogenfixation process.

Supplemental Data
Supplemental Figure. Expression of M. truncatula TFs.