Network Analysis of Enzyme Activities and Metabolite Levels and Their Relationship to Biomass in a Large Panel of Arabidopsis Accessions

Natural genetic diversity provides a powerful resource to investigate how networks respond to multiple simultaneous changes. In this work, we profile maximum catalytic activities of 37 enzymes from central metabolism and generate a matrix to investigate species-wide connectivity between metabolites, enzymes, and biomass. Most enzyme activities change in a highly coordinated manner, especially those in the Calvin-Benson cycle. Metabolites show coordinated changes in defined sectors of metabolism. Little connectivity was observed between maximum enzyme activities and metabolites, even after applying multivariate analysis methods. Measurements of posttranscriptional regulation will be required to relate these two functional levels. Individual enzyme activities correlate only weakly with biomass. However, when they are used to estimate protein abundances, and the latter are summed and expressed as a fraction of total protein, a significant positive correlation to biomass is observed. The correlation is additive to that obtained between starch and biomass. Thus, biomass is predicted by two independent integrative metabolic biomarkers: preferential investment in photosynthetic machinery and optimization of carbon use.


INTRODUCTION
The rate of plant growth depends on the rate of photosynthetic carbon (C) assimilation and on developmental programs that influence how efficiently C is converted into biomass. These processes are sometimes termed source and sink strength, respectively. The molecular and genetic determinants of these parameters are not well understood. Reverse genetics has been widely used to study the relationship between the expression of individual enzymes and the rate of photosynthesis (Stitt et al., 2010b). A 2-fold reduction of ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco; Quick et al., 1991; Mate et al., 1993; Stitt and Schulze, 1994), sedoheptulosebisphosphatase (Harrison et al., 1998), and aldolase (Ald) (Haake et al., 1998, 1999) and a somewhat larger decrease of transketolase (TK) (Henkes et al., 2001) lead to a perceptible inhibition of photosynthesis in various species. Phosphoglycerate kinase (PGK), NADP-dependent glyceraldehyde-3-phosphate dehydrogenase (NADP-GAPDH), triose phosphate isomerase (TPI), and phosphoribulokinase can be reduced 5- to 10-fold without a perceptible impact on pathway flux (Paul et al., 1995; Price et al., 1995; Stitt, 1995; Paul and Pellny, 2003; Stitt et al., 2010a). The contributions of individual enzymes to the control of flux depend on the conditions at the moment and on the prehistory of the plant (Stitt and Schulze, 1994; Stitt and Sonnewald, 1995). A similar picture has emerged for sucrose and starch synthesis (Stitt, 1995; Stitt and Sonnewald, 1995; Paul and Foyer, 2001). Photosynthetic metabolism operates as an integrated network, in which metabolite levels and fluxes are determined by interactions between many enzymes. Analogous conclusions have been reached from experimental analyses of metabolism in bacteria, yeast, and mammals.
The relationship between photosynthetic C gain and growth is also complex (Gifford and Evans, 1981; Sparks et al., 2001). For example, inhibition of photosynthesis due to decreased expression of Rubisco leads to a strong inhibition of growth in nitrogen-replete conditions but not in nitrogen-limited conditions (Stitt and Schulze, 1994). More generally, free air CO2 elevation studies reveal that higher rates of photosynthesis rarely lead to a commensurate increase in biomass (Long et al., 2006; Rogers and Ainsworth, 2006; Leakey et al., 2009).
Natural genetic diversity provides a powerful and complementary resource to analyze complex metabolic networks (Stitt et al., 2010a). Accessions or cultivars of a given species have allelic diversity for hundreds or thousands of genes, allowing a systems-level study of how metabolic networks respond to many independent perturbations. Natural genetic diversity is often investigated using inbred lines (ILs), which uncover inherent genetic diversity by resorting two genomes. ILs allow coarse mapping of quantitative trait loci (QTL), although these still represent relatively large genomic regions and cloning of the underlying genes (e.g., Fridman et al., 2000) is laborious. A complementary approach is to survey large panels of genotypes from wild populations or crop breeding programs to identify correlations between traits on a species-wide basis. Association mapping can be performed with candidate genes (Baxter et al., 2008; Sulpice et al., 2009) or genome-wide scans (Buckler and Thornsberry, 2002; Aranzana et al., 2005) to identify loci that may influence the traits, although this requires rigorous correction for population structure (Nordborg et al., 2005; Zhao et al., 2007).
Using IL populations, it has been previously shown (Meyer et al., 2007) that rosette biomass in Arabidopsis and fruit yield in tomato correlate negatively with metabolites. While the correlation to rosette biomass was very weak for individual metabolites, a highly significant correlation was obtained when multivariate analysis was used on the entire metabolite profile (Meyer et al., 2007). This indicates that part of the genetic variation for biomass affects the balance between resource availability and the developmental programs that determine how rapidly metabolites are used for growth. Sulpice et al. (2009) used multivariate analysis to show that the starch content at the end of the day is an integrator of the metabolic response in Arabidopsis. Starch correlated strongly and negatively with rosette biomass in a population of 92 genotypically diverse accessions. Starch accumulates during the light period and is remobilized to support metabolism and growth at night. These results imply that there is genetic variation in C allocation and that this contributes to the differences in biomass between Arabidopsis accessions.
Metabolic interconversions are catalyzed by enzymes. The question arises whether the genetic variation in metabolite levels can be explained by changes in the activities of one or several enzymes. If this were the case, it would provide an attractive strategy to link the changes in metabolite levels to genetic variation at loci that encode enzymes or regulate the expression of enzymes. Enzyme activities exhibit considerable natural genetic variation. Analyses of a maize (Zea mays) IL population detected QTL for sucrose phosphate synthase (SPS) and invertase activity, which colocated with structural genes for the enzymes (Causse et al., 1995; Prioul et al., 1999; Thevenot et al., 2005). A study of six enzymes from primary metabolism and four enzymes from secondary metabolism in an Arabidopsis recombinant inbred line (RIL) population detected several enzyme activity QTL that colocalized with structural genes (Mitchell-Olds and Pedersen, 1998). This study also found strong correlations between the activities of five glycolytic enzymes (PGI, PGM, G6PDH, FBPase, and glucose-6-phosphatase), and coarse mapping detected a trans-QTL for three of the enzymes, which did not colocate with their structural genes and might represent a joint regulator of these enzymes. A recent study of 15 enzymes from central metabolism in a Landsberg erecta (Ler) × Cvi RIL population detected a total of 15 QTL for nine of the enzymes. Some colocated with structural genes and expression QTL, while others were regulated by trans-acting loci (Keurentjes et al., 2008). However, few studies have investigated enzyme activities and metabolites from central metabolism in parallel, and those that did only investigated a small number of traits (Keurentjes et al., 2008). This is partly for technical reasons. It is still a challenge to obtain quantitative information about large numbers of proteins (Baerenfaller et al., 2008) or enzyme activities.
The relationship between enzyme activities and metabolite levels could be rather nonintuitive. As already outlined, empirical studies show that a change in the level of a single enzyme can result in complex changes of metabolites and fluxes. Theoretical considerations point to the same conclusion. In a simple linear pathway, metabolite levels will change inversely relative to flux at the step(s) where regulatory mechanisms operate to alter enzyme activities (Rolleston, 1972; Newsholme and Start, 1973). This expectation has been explicitly included in algorithms to detect sites at which genetic regulation may affect metabolism (Rowe et al., 2008). However, the relationship between metabolites and flux becomes more complicated if the pathway is regulated at multiple steps (Fell and Thomas, 1995; Hofmeyr and Cornish-Bowden, 2000) or by feedback or feedforward loops (Hofmeyr and Cornish-Bowden, 2000; Curien et al., 2003). It becomes even more complex in networks that contain branch points, cycles, and redundant reactions (Kacser and Burns, 1973; Fell and Thomas, 1995; Hofmeyr and Cornish-Bowden, 2000; Curien et al., 2003; Junker et al., 2007). Furthermore, if there is a coordinated increase of all the enzymes in a pathway, flux will increase but the levels of metabolic intermediates will remain essentially unaltered (Kacser and Acerenza, 1993). The extent of the connectivity between enzyme activities and metabolites remains an open question. It may depend on the complexity of the pathway and the extent to which genetic variation generates coordinated or noncoordinated changes of enzyme activities.
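The effect of a coordinated change in enzyme levels can be illustrated with a toy model. The sketch below assumes a hypothetical two-step pathway S <-> X -> P with linear mass-action kinetics; the rate constants, the function name `steady_state`, and the enzyme amounts are illustrative assumptions and are not taken from the studies cited above.

```python
def steady_state(e1, e2, s=1.0, k1=2.0, k_rev=1.0, k2=3.0):
    """Steady state of the toy pathway S <-> X -> P.

    Rates (assumed linear kinetics):
        v1 = e1 * (k1 * s - k_rev * x)   # reversible first step
        v2 = e2 * k2 * x                 # irreversible second step
    Setting v1 == v2 and solving for x gives the intermediate level;
    the pathway flux then equals v2.
    """
    x = e1 * k1 * s / (e1 * k_rev + e2 * k2)
    flux = e2 * k2 * x
    return x, flux


x_ref, j_ref = steady_state(1.0, 1.0)    # reference enzyme amounts
x_both, j_both = steady_state(2.0, 2.0)  # both enzymes doubled together
x_one, j_one = steady_state(2.0, 1.0)    # only the first enzyme doubled

print(f"reference:     x={x_ref:.2f}, flux={j_ref:.2f}")
print(f"both doubled:  x={x_both:.2f}, flux={j_both:.2f}")
print(f"first doubled: x={x_one:.2f}, flux={j_one:.2f}")
```

In this toy model, doubling both enzymes doubles the flux while the intermediate X stays at its reference level, whereas doubling only the first enzyme raises X and increases the flux less than 2-fold.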
We have robotized activity assays for >20 enzymes from central metabolism in Arabidopsis leaves (Gibon et al., 2004b) and subsequently extended the platform to cover 35 enzymes from different pathways in central metabolism. The assays are optimized to measure the maximum catalytic activity (Newsholme and Start, 1973), which can be seen as a proxy for the amount of an enzyme (Piques et al., 2009; Stitt et al., 2010a). This allows us to study natural variation in enzyme activities in central metabolism in a more systematic manner than was previously possible. In the following study, we profile the maximum catalytic activities of 35 enzymes in the rosettes of 129 Arabidopsis accessions, including all 92 accessions used by Sulpice et al. (2009). Our aims are (1) to document the extent of genetic variation for activities of a large number of enzymes in central metabolism at a species-wide scale and (2) to ask whether natural genetic diversity results in coordinated or uncoordinated changes of the activities of the various enzymes. We then combine the information about enzyme activities with a data set for metabolite levels and biomass in the same material and examine (3) how much connectivity exists between enzyme activities and metabolite levels, (4) whether enzyme activities provide predictive information about biomass, and (5) whether this prediction is redundant with or additive to the prediction provided by analyses of metabolite levels. We show that there is considerable genetic variation in enzyme activities and that the majority of the changes in enzyme activity are highly correlated with each other, whereas there is little connectivity between enzyme activities and metabolite levels. Furthermore, enzyme activities can be used to derive an integrative parameter that is significantly correlated with biomass, independent of the prediction provided by starch and which, together with starch, explains almost a third of the species-wide variation in biomass in Arabidopsis.

Parameters Investigated
A set of 129 Arabidopsis accessions, including all 92 accessions analyzed in Sulpice et al. (2009), was selected from a larger set of 406 accessions to maximize genotypic diversity. The accessions and their passport data are listed in Supplemental Data Set 1A online. They were grown in short-day conditions at 20°C to focus on the response during vegetative growth with limiting C and excess N and harvested at the end of the light period when internal resources like starch and amino acids have accumulated to their diurnal maxima. A list of measured metabolites and enzymes, with abbreviations and classifications into pathways, is provided in Supplemental Data Sets 1B and 1C online.
Enzyme maximum activities were measured in stopped assays, using sensitive cycling assays to measure the products (Gibon et al., 2004b). This allows assays to be performed at very high extract dilutions, which minimizes interference by other components of the extract. For all assays, it was checked that the reaction was linear with time and the amount of extract added and that substrate levels supported maximum activity of the enzyme. Maximum activities provide information about the enzymatic capacity for a given reaction and will overestimate the activity in vivo, which also depends on the levels of substrates, inhibitors, activators, and, in some cases, posttranslational modification. The maximum activities of this set of enzymes correlate well with summed protein levels for the gene family, determined by mass spectroscopy (Piques et al., 2009). Most enzyme activities are stable during the diurnal cycle, even though there are often marked diurnal changes in the levels of the encoding transcripts (Gibon et al., 2004b). The results obtained for enzyme activities from one harvest point in the diurnal cycle should therefore be representative of the activity across the complete cycle. For two enzymes, an additional assay was performed to provide information about posttranslational regulation (RubisCOin and NADP-MDHin, see below). Figure 1 shows the locations of the enzymes in metabolism. Many enzymes can be unambiguously assigned to one pathway. When an enzyme operates in two or more pathways or subcellular compartments, it is assigned to the pathway that is associated with the highest activity of the enzyme (see legend to Figure 1).

Genetic Variation of the Traits Analyzed
Average, minimum, and maximum values and the coefficient of variation (CV) were calculated for each parameter and accession (see Supplemental Figure 1A and Supplemental Data Set 1D online). The CV was small for structural components (chlorophyll and protein) and major central metabolites (total amino acids, sucrose, and starch), larger for rosette fresh weight (FW), reducing sugars, and the major organic acids (malate and fumarate) (20 to 30%), and highest for low molecular weight metabolites determined by gas chromatography-mass spectrometry (GC-MS) (average 45%). Some enzyme activities showed low (TPI, PGM, and Rubisco), many showed intermediate, and some showed higher (GK, FK, G6PDH, INV, NAD-ICDH, and NADP-ICDH) CV values, although not as high as the CV for some of the small metabolites. These results resemble those of Cross et al. (2006) but cover many more accessions and metabolic parameters.
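As a minimal sketch of the summary statistic used here, the CV can be computed as the standard deviation divided by the mean, expressed in percent. The use of the sample standard deviation and the example values are assumptions made for illustration; the text does not state which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical accession means for one trait (arbitrary units).
trait_means = [8.0, 10.0, 12.0]
cv = coefficient_of_variation(trait_means)
print(f"CV = {cv:.1f}%")
```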
Due to the large number of plants assayed and the parameters measured, the whole set of accessions was partitioned into eight subsets (called experiments). Each subset of accessions was grown in the same conditions in growth cabinets. Each accession was analyzed in two to four different experiments, and analyses of sugars and FW in the reference accession Columbia-0 (Col-0) were included in all experiments. Two-way analysis of variance (ANOVA) was performed to estimate how much of the variation is determined by the genotype (G) or experimental (E) fluctuations. The F-statistic for the genotype effect and the variances explained by the genotype are given in Supplemental Data Set 1D online. For almost all traits, the genotype effect was highly significant. Due to missing data, a proper model could not be fitted for three traits (salicylate, guanidine, and nicotinate; see Supplemental Data Set 1D online).
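The partitioning of variation into genotype and experiment effects can be sketched for the simplest case. The example below assumes a balanced layout with one value per genotype and experiment, which is a simplification: in the study each accession was present in only two to four of the eight experiments, and the exact model fitted is not described here. The function name and data are hypothetical.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication for a balanced layout.

    table[g][e] holds the value for genotype g in experiment e.
    Returns (fraction of total variance explained by genotype,
             F statistic for the genotype effect).
    """
    n_g, n_e = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_g * n_e)
    g_means = [sum(row) / n_e for row in table]
    e_means = [sum(table[g][e] for g in range(n_g)) / n_g for e in range(n_e)]

    # Sums of squares for genotype, experiment, total, and residual.
    ss_g = n_e * sum((m - grand) ** 2 for m in g_means)
    ss_e = n_g * sum((m - grand) ** 2 for m in e_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_resid = ss_total - ss_g - ss_e

    ms_g = ss_g / (n_g - 1)
    ms_resid = ss_resid / ((n_g - 1) * (n_e - 1))
    return ss_g / ss_total, ms_g / ms_resid


# Hypothetical trait values: three genotypes (rows) in two experiments (columns).
values = [[10.0, 12.0], [20.0, 21.0], [30.0, 33.0]]
explained, f_genotype = two_way_anova(values)
print(f"variance explained by genotype: {explained:.3f}, F = {f_genotype:.1f}")
```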

Absolute Enzyme Activities
We first asked if enzymes in a given pathway have similar maximum activities and if these differ from those of enzymes in other pathways. The heat map in Figure 1 displays the average activity for each enzyme across all the accessions. Calvin-Benson cycle enzymes have high activities (between 6727 and 15,6000 nmol/min gFW). Especially high activities were found for TPI and PGK, while the other activities were in the range 6727 to 10,852 nmol/min gFW. Intermediate activities are found for enzymes involved in starch (700 to 1190 nmol/min gFW), sucrose (111 to 4094 nmol/min gFW), and amino acid (1148 to 6236 nmol/min gFW) synthesis. Enzymes involved in sucrose degradation have low activities (112 to 1223 nmol/min gFW), as do most dedicated enzymes in glycolysis (PFK, PFP, PK, and PEPC) and the tricarboxylic acid (TCA) cycle (CS, aconitase, NAD-ICDH, and NADP-ICDH). The activities of NADH-GAPDH, NAD-MDH, and fumarase are comparable to those of the Calvin-Benson cycle enzymes, indicating that they might have other functions in addition to catalyzing fluxes in glycolysis and the TCA cycle.

Correlation Matrices
A correlation matrix was generated by performing Spearman rank correlation analysis for all pairs of measured traits across the whole population. This analysis used average values calculated from all raw determinations for a given trait/accession pair, without considering the experiment effect. Spearman's rank correlation coefficients (Rs) and accompanying false discovery rate (FDR)-corrected P values (pBH; Benjamini-Hochberg) are provided in Supplemental Data Sets 2A and 2B online. The FDR-corrected P values are summarized as a heat map in Figure 2A and Supplemental Figure 1B online for the matrix obtained when traits were expressed on a FW and a total protein basis, respectively. The traits are organized into seven classes: biomass, structural components, carbohydrates, organic acids, amino acids, secondary metabolites, and enzyme activities. Derived traits (reducing sugars, total sugars, and total amino acids) and the initial activities of Rubisco and NADP-MDH were excluded from the matrices.

Figure 1 legend. Rubisco and NADP-GAPDH are specific for the Calvin-Benson cycle, AGP for starch synthesis, cFBP and SPS for sucrose synthesis, PFK, PFP, NADH-GAPDH, PK, and PEPC for glycolysis, G6PDH for the oxidative pentose phosphate (OPP) pathway, and INV, SuSy, FK, and GK for sucrose breakdown and the use of hexose sugars. TPI, Ald, and PGK are involved in the Calvin-Benson cycle and glycolysis and TK in the Calvin-Benson cycle and the OPP pathway. TPI, Ald, PGK, and TK are assigned to the Calvin-Benson cycle because the major isoforms of these enzymes are located in the chloroplast stroma (Heldt, 1997) and because fluxes are 10- to 100-fold higher in photosynthesis than in glycolysis (Geiger and Servaites, 1994; Gibon et al., 2004a). UGP is involved in the synthesis and breakdown of sucrose, and PGI and PGM are involved in the synthesis and breakdown of sucrose and starch. The assays distinguished plastidic (pPGI) and cytosolic (cPGI) PGI (Gibon et al., 2004b). UGP was assigned to sucrose synthesis. Several TCA cycle enzymes have additional functions. CS and NAD- and NADP-ICDH are required to generate 2-oxoglutarate, which is the C-acceptor during nitrate and ammonium assimilation. NADH-MDH is involved in the synthesis of malate and, together with NADP-MDH, in metabolite cycles that allow redox equivalents to be exchanged between the NADP system in the plastid and the NAD systems in the cytosol and the mitochondrion (Scheibe, 2004). The maximum catalytic activities of each enzyme, averaged across all accessions, are shown by color coding (see figure for scale). All abbreviations are defined in Supplemental Data Set 1C online. The subscript "in" for Rubisco and NADP-MDH indicates "initial activity."

The matrices display five general features. First, there are many more positive than negative correlations. On a FW basis, there were 361 positive and eight negative correlations (from a total of 5356 trait pairs) at a significance threshold of pBH < 0.001 (Table 1). When traits were expressed on a protein basis, the corresponding numbers were 161 positive and 0 negative correlations. Second, most of the positive correlations are between traits from the same major trait class (85% at a threshold of pBH < 0.001). Third, all of the negative correlations are between traits from different classes. Fourth, the same pattern emerges, irrespective of whether traits are expressed on a FW (Figure 2A) or a protein (see Supplemental Figure 1B online) basis. In particular, the high frequency of positive correlations between enzyme activities is retained when the activities are expressed on a protein basis. This excludes a potentially trivial explanation for the coordinated changes of enzyme activities, which would be that they are due to a change in the overall protein content. Fifth, as previously seen (Meyer et al., 2007; Sulpice et al., 2009), many metabolites correlate negatively with biomass, especially when they are expressed on a FW basis. Some enzymes show a positive correlation with biomass, a feature that becomes especially marked when the activities are expressed on a protein basis.

Figure 2 legend. (A) Correlation matrices based on Spearman correlation coefficients between structural components, metabolites, and enzyme activities in the accessions. Spearman rank correlation analysis was performed on the mean values for all pairs of measured traits across the whole population. Relationships that are significant at pBH < 0.001, pBH < 0.01, and pBH < 0.05 are indicated by dark, medium, and light shading, with positive and negative correlations being distinguished by blue and orange, respectively. Data used for every trait analyzed are based on at least two independent experiments. The original data are given in Supplemental Data Set 1E online. For the display, the traits were organized into seven classes: 1, biomass; 2, structural components; 3, carbohydrates; 4, organic acids; 5, amino acids; 6, miscellaneous metabolites; 7, enzyme activities. The data are expressed on a FW basis. A similar picture emerged when they were expressed on a protein basis (see Supplemental Figure 1B online). (B) Cartographic representation of the primary metabolism network of Arabidopsis. All trait-trait correlations that were significant at pBH < 0.01 were visualized using an algorithm that identifies functional modules within complex networks. Blue and red lines signify positive and negative correlations.
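The two building blocks of this analysis, a Spearman rank coefficient and the Benjamini-Hochberg adjustment, can be sketched in plain Python as follows. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, the raw P values are taken as given rather than computed, and in practice a statistics library would normally supply both steps.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical trait values across four accessions and hypothetical raw P values.
rs_up = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 45.0])
rs_down = spearman([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
p_bh = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```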

Global Network
A complementary overview of the enzyme-metabolite-biomass network was obtained by extracting all significant trait-trait correlations (pBH < 0.01) and visualizing them using an algorithm that identifies functional modules in complex networks. The resulting network (Figure 2B) contained two large, strongly connected modules and three smaller, more weakly connected modules. It captures many of the features identified by stepwise analysis of the correlation matrix.
Most of the enzymes are in the largest and most interconnected module. The Calvin-Benson cycle enzymes and many of the enzymes from biosynthetic pathways are tightly interconnected, while others are linked by only a few edges (NR, G6PDH, and enzymes for sucrose degradation). This module contains chlorophyll, but no low molecular weight metabolites. The second large module contains only metabolites. The 26 metabolites in this module include 15 amino acids, 2-oxoglutarate, and some secondary metabolites like sinapate and putrescine, which are synthesized from amino acids. One of the small loosely interconnected modules contained a set of metabolites (several amino acids, organic acids, carbohydrates, ascorbate, and dehydroascorbate) and three enzymes (GLDH, NAD-ICDH, and CS). A second loosely connected module contains five metabolites (including malate and fumarate) and two enzymes (SPS and NADP-MDHin). A third loosely connected module contains salicylate, starch and sucrose, glucose and fructose, total amino acids, three individual amino acids (Ala, Gly, and homoserine), total protein, and rosette FW. Some metabolites in this module (total amino acids, starch, and sucrose) are connected to chlorophyll and individual enzymes in the large enzyme module.
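The step from a list of significant correlations to a network can be sketched as follows. Note that this example groups traits by simple connected components, which is a deliberately simpler stand-in and not the module-detection algorithm used for Figure 2B; the trait pairs and adjusted P values are hypothetical.

```python
def connected_components(edges):
    """Group traits into connected components from (trait_a, trait_b) edges."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collecting every trait reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical adjusted P values for a few trait pairs.
p_adjusted = {
    ("Rubisco", "TPI"): 0.001,
    ("TPI", "PGK"): 0.004,
    ("Gln", "Glu"): 0.002,
    ("Rubisco", "Gln"): 0.30,  # not significant, so no edge is drawn
}
edges = [pair for pair, p in p_adjusted.items() if p < 0.01]
modules = connected_components(edges)
print(sorted(sorted(m) for m in modules))
```

A dedicated community-detection method would additionally split a single connected component into densely linked modules, which connected components alone cannot do.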

Coordinated Changes of Enzyme Activities
The following sections explore the correlation matrix in more detail, starting with enzyme activities. Key information is summarized in Table 2. RubisCOin and NADP-MDHin (the subscript "in" indicates "initial activity") were excluded from this analysis because they provide information about enzyme regulation, rather than the amount of enzyme (see below). Ald was excluded because measurements were available for only 58 of the 129 accessions.
As already noted, many enzymes change in a coordinated manner. This is quantified in Table 2. When activities were expressed on a FW basis, each enzyme had an average of 7.0 and 12.3 (from a maximum of 35) significant correlations with other enzymes at pBH < 0.01 and pBH < 0.05, respectively. A similar picture emerged when activities were expressed on a protein basis; each enzyme had an average of 10 and 16 significant correlations with other enzymes at pBH < 0.01 and pBH < 0.05, respectively. As already noted, this excludes the trivial explanation that the coordinated response is due to changes in total protein.
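The connectivity measure used here, the number of significant correlations per enzyme and its average, can be sketched as a simple count over adjusted P values. The pairs and values below are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter


def connectivity(p_adjusted, threshold):
    """Count, for each trait, how many partners pass the significance threshold."""
    counts = Counter()
    for (a, b), p in p_adjusted.items():
        if p < threshold:
            counts[a] += 1
            counts[b] += 1
    return counts


# Hypothetical adjusted P values for all pairs among four enzymes.
enzymes = ["NAD-MDH", "TPI", "NADP-MDH", "INV"]
p_adjusted = {
    ("NAD-MDH", "TPI"): 0.001,
    ("NAD-MDH", "NADP-MDH"): 0.004,
    ("TPI", "NADP-MDH"): 0.03,
    ("NAD-MDH", "INV"): 0.6,
    ("TPI", "INV"): 0.4,
    ("NADP-MDH", "INV"): 0.8,
}

strict = connectivity(p_adjusted, 0.01)
relaxed = connectivity(p_adjusted, 0.05)
mean_strict = sum(strict[e] for e in enzymes) / len(enzymes)
mean_relaxed = sum(relaxed[e] for e in enzymes) / len(enzymes)
print(mean_strict, mean_relaxed)
```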

Comparison of Enzyme Activities and Total Protein
We next explored the relationship between the variation in the activities of individual enzymes and the variation in total leaf protein (leftmost column, Table 2). All parameters are expressed on a FW basis. A Spearman correlation coefficient (Rs) of 0.32 was obtained between Rubisco and protein. The very low Kcat (3 s⁻¹) of Rubisco is compensated for by having a high concentration of Rubisco protein (typically 30 to 40% of total leaf protein; Farquhar et al., 2001; Zhu et al., 2007). It can therefore be expected that changes in Rubisco will result in changes in the total leaf protein content. Chlorophyll binding proteins and other proteins in the thylakoid membranes represent another major component of total leaf protein (Adam et al., 2000). Chlorophyll [...] Interestingly, these are key regulatory sites in sucrose synthesis and nitrate assimilation (Stitt et al., 1987; Winter and Huber, 2000). A different picture emerged for glycolysis, the TCA cycle, and sucrose degradation. While Rs values with protein of 0.2 to 0.4 were obtained for NAD-GAPDH and PK, many of the enzymes in these pathways had relatively low Rs values, and some had very low Rs values (PEPC, GLDH, INV, FK, G6PDH, fumarase, PFP, and SuSy), showing that they vary independently of the total leaf protein.

Table note. Derived metabolic traits (reducing sugars, total sugars, and total amino acids) were excluded, as were initial activities of Rubisco and NADP-MDH. a, pBH < 0.001.

Correlations between Enzymes
We next investigated which enzymes show the highest connectivity to other enzymes. The second column from the left in Table 2 summarizes the number of significant (pBH < 0.01) positive correlations for each individual enzyme. As already noted, at this significance level, on average each enzyme correlates with 7.0 of the other 35 enzymes. The most highly connected enzyme was NAD-MDH, followed by TPI and NADP-MDH. Most enzymes from the Calvin-Benson cycle, starch and sucrose synthesis, and nitrogen metabolism, as well as PEPC, PK, and NADP-ICDH, showed above-average connectivity. Poorly connected enzymes included GOGAT, G6PDH, SuSy, and INV (no significant correlations), NR and PFP (only one significant correlation), and fumarase (two significant correlations). A qualitatively similar picture emerged at the less stringent filter of pBH < 0.05 (Table 2, third column from left). Connections between weakly linked enzymes may nevertheless be functionally important. For example, during nitrate assimilation, PEPC is required to synthesize malate as a counteranion (Heldt, 1997). Of the two enzymes that correlate with fumarase, one (NAD-MDH) catalyzes an adjacent reaction in the TCA cycle.
We next investigated whether enzymes are more highly connected to enzymes in the same pathway than to enzymes in other pathways. The initial classification defined 10 pathways (Figure 1; see Supplemental Data Set 1C online), but some were represented by only one to two enzymes. We decreased this to five sets by retaining the pathway sets for the Calvin-Benson cycle, sucrose biosynthesis, and sucrose breakdown, extending nitrogen assimilation and amino acid metabolism to include glycerate kinase, and combining G6PDH, glycolysis, and the TCA cycle as respiration (see Supplemental Data Set 1C online).
We calculated the mean values of Rs for pairwise correlations between enzymes within each pathway and between enzymes in one pathway and each of the other pathways (Figure 3). Calvin-Benson cycle enzymes showed the most highly coordinated changes (i.e., accessions that had high activity of one Calvin-Benson cycle enzyme tended to have high activities of the other Calvin-Benson cycle enzymes). Other pathways showed a less coordinated within-pathway response. Indeed, enzymes assigned to sucrose synthesis, amino acid metabolism, and respiration correlated as well (sucrose synthesis and respiration) or better (amino acid metabolism) with Calvin-Benson cycle enzymes than with enzymes from their own pathway [...] other (pBH = 3.48E-05) but change independently of INV and SuSy (see Supplemental Data Set 2A online).

Figure 3 legend. Information about the enzymes and their pathway assignments is given in Supplemental Data Set 1C online. Some pathways (starch synthesis) were excluded from the analysis because too few enzymes were determined. Enzymes from TCA cycle, glycolysis, and OPP were grouped as "respiration." Spearman correlation coefficients (Rs) were calculated for each enzyme-enzyme pair. Average values of these coefficients for enzyme-enzyme pairs belonging to the same pathway or to different pathways were calculated. The values range from 0.32 (dark blue) to 0.08 (light blue).
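The averaging of pairwise coefficients within and between pathways can be sketched as follows. The pathway labels, enzyme subset, and coefficient values are hypothetical and chosen only to make the grouping logic visible; they are not the values shown in Figure 3.

```python
from itertools import combinations
from statistics import mean


def pathway_mean_rs(rs, pathway_of):
    """Average pairwise Spearman coefficient for each pair of pathways.

    rs: dict mapping frozenset({enzyme_a, enzyme_b}) -> coefficient.
    pathway_of: dict mapping enzyme -> pathway label.
    Keys of the result are sorted (pathway, pathway) tuples; a tuple with
    two identical labels holds the within-pathway average.
    """
    grouped = {}
    for a, b in combinations(sorted(pathway_of), 2):
        key = tuple(sorted((pathway_of[a], pathway_of[b])))
        grouped.setdefault(key, []).append(rs[frozenset((a, b))])
    return {key: mean(vals) for key, vals in grouped.items()}


# Hypothetical assignments: CBC = Calvin-Benson cycle, SucDeg = sucrose degradation.
pathway_of = {"Rubisco": "CBC", "PGK": "CBC", "TK": "CBC",
              "INV": "SucDeg", "FK": "SucDeg"}
pairs = {
    ("PGK", "Rubisco"): 0.5, ("PGK", "TK"): 0.4, ("Rubisco", "TK"): 0.6,
    ("FK", "INV"): 0.1,
    ("FK", "PGK"): 0.2, ("FK", "Rubisco"): 0.1, ("FK", "TK"): 0.3,
    ("INV", "PGK"): 0.0, ("INV", "Rubisco"): 0.1, ("INV", "TK"): 0.2,
}
rs = {frozenset(pair): value for pair, value in pairs.items()}
means = pathway_mean_rs(rs, pathway_of)
```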
Some coordinated responses span pathways. Supplemental Figure 2 online shows two examples. One is a correlation between NADP-MDH, NAD-MDH, NADP-ICDH, TPI, and NAD-GAPDH. These five enzymes are in the large enzyme module of Supplemental Figure 1B online. They are involved in metabolite shuttles that allow redox transfer between different intracellular compartments (see Discussion). A second example is CS, NAD-ICDH, and GLDH. These three enzymes grouped in one of the small modules in Supplemental Figure 1B online. They are involved in organic acid metabolism: CS and NAD-ICDH in the TCA cycle and GLDH in amino acid catabolism, where it generates 2-oxoglutarate.
The average R s values for the various pathways are <0.5 ( Figure 3). This could still allow marked divergence between accessions with respect to the relative levels of enzymes in a pathway. To explore how large this variation is, we calculated the activity ratio for each enzyme-enzyme pair for each pathway subset (Calvin-Benson cycle, sucrose synthesis, nitrogen metabolism, sucrose degradation, and respiration), in each accession, combined the values, and depicted them as a frequency plot (  The enzymes were assigned to five subsets: Calvin-Benson cycle enzymes, sucrose synthesis, nitrogen metabolism, enzymes of sucrose degradation, and respiration. For each enzyme, the activity in a given accession was normalized to the mean activity of that enzyme in all of the accessions. This allowed activity ratios in the same range to be obtained for each enzyme pair; a value of 1 thus represents the average value across all the accessions. The activity ratio was then calculated for each enzyme-enzyme pair from each set in each accession. The values were then summed and depicted as a frequency plot. Significance of the plots obtained was addressed by shuffling 1000 times the normalized activities and comparing the frequency distribution of the standard deviations of the ratios against the standard deviation of the ratios obtained with the real data (see Supplemental revealed significant coordination in all pathways, except sucrose breakdown where ;25% of the normalized pairwise ratios change >2-fold and some by up to 10-fold ( Figure 4B). Figures 4C and 4D show the frequency distribution for pairwise activity ratios for two selected enzymes. Rubisco shows only small deviations compared other enzymes. For comparison, the frequency distribution is shown for ratios between Rubisco and four Calvin-Benson cycle enzymes (PGK, NADP-GAPDH, TPI, and TK). A similar frequency distribution is obtained for these four enzymes and for the entire set of enzymes. 
This emphasizes that Rubisco correlates not only with Calvin-Benson cycle enzymes but also with many other enzymes in central metabolism. NR shows a larger spread of ratios, with almost 20% of the values lying <0.7 or >1.4. The spread of ratios was nevertheless smaller than in shuffled data sets (see Supplemental Figure 3 online).
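The normalization, ratio calculation, and shuffling test described for the activity ratios can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function names, the toy activities, and the decision to score each shuffle by the standard deviation of the pooled ratios are assumptions made for the example.

```python
import random
import statistics
from itertools import combinations


def normalize(activities):
    """Scale each enzyme's activities by its mean across accessions (mean becomes 1)."""
    normalized = {}
    for enzyme, values in activities.items():
        mean = statistics.fmean(values)
        normalized[enzyme] = [v / mean for v in values]
    return normalized


def pairwise_ratios(normalized):
    """Pool the activity ratio of every enzyme pair in every accession."""
    ratios = []
    for a, b in combinations(sorted(normalized), 2):
        ratios.extend(x / y for x, y in zip(normalized[a], normalized[b]))
    return ratios


def shuffle_test(normalized, n_shuffles=1000, seed=0):
    """Return the observed SD of the pooled ratios and the fraction of
    shuffled data sets whose SD is at least as small."""
    rng = random.Random(seed)
    observed = statistics.stdev(pairwise_ratios(normalized))
    as_small = 0
    for _ in range(n_shuffles):
        # Permute accessions independently for each enzyme to break coordination.
        shuffled = {e: rng.sample(v, len(v)) for e, v in normalized.items()}
        if statistics.stdev(pairwise_ratios(shuffled)) <= observed:
            as_small += 1
    return observed, as_small / n_shuffles


# Hypothetical activities for three enzymes in six accessions (perfectly coordinated).
toy = {
    "EnzA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "EnzB": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "EnzC": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
}
observed_sd, fraction = shuffle_test(normalize(toy), n_shuffles=200)
```

In this toy case the three enzymes change in exact proportion, so every normalized ratio is close to 1 and the observed standard deviation is essentially zero, whereas shuffled data give a wider spread. A small fraction therefore indicates tighter coordination than expected by chance.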

Correlations between Metabolites
Metabolites are less connected to each other than are enzymes. From a maximum possible number of 61, metabolites have on average 4.4 significant (p_BH < 0.01) correlations to other metabolites (Table 1). The lower connectivity is not due to noise. Genetic variance actually explains a larger proportion of the total variation in metabolite levels than in enzyme activities (see Supplemental Data Set 1D online).
Metabolites correlate most frequently with metabolites from the same sector of metabolism. This is most marked for amino acids, with on average 5.5 correlations per amino acid (p_BH < 0.01), from a maximum possible number of 23 (Table 1). There are strong correlations between central amino acids like Gln, Glu, Asp, and Asn (see Supplemental Data Set 2A online) and some minor amino acids (Arg and Met). There is a strong correlation between the three aliphatic amino acids (Leu, Ile, and Val), between β-Ala and 4-aminobutyrate, and between the three aromatic amino acids. Gln, Glu, Asp, and several minor amino acids correlate with 2-oxoglutarate (Table 3), which is the C-acceptor in the GOGAT pathway (see Discussion), and with putrescine and sinapate. These metabolites are in the same module in Figure 2B. Interestingly, Gly, which is formed during photorespiration, is poorly connected with other amino acids. Other examples of metabolite-metabolite correlations include sucrose and starch (R_s = +0.42; p_BH = 4.88E-05), glucose and fructose (R_s = +0.59; P = 1.15E-10), raffinose and its precursors (galactinol and myo-inositol), ascorbate and dehydroascorbate (p_BH = 1.1E-04), and between the latter and threonic acid, which is formed during breakdown of ascorbate (Debolt et al., 2007).

Relationships between Enzyme Activities and Metabolite Levels
There are very few correlations between enzyme activities and metabolites. The matrix contains 62 metabolites and 37 enzymes, allowing 2294 pairwise correlations between enzymes and metabolites. Only three and one of these are significant (p_BH < 0.001) when the traits are compared on a FW (Table 1) and a protein (see Supplemental Data Set 2B online) basis, respectively. Figure 5 lists the enzymes and metabolites with the largest number of cross-trait class correlations. Among the enzymes, PGM and SPS showed the largest number of correlations with metabolites. Rubisco was correlated with three major C-containing products of photosynthesis (sucrose, starch, and malate). Among the metabolites, myo-inositol, β-Ala, Lys, and mannose had the largest number of significant correlations to enzymes.
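As an illustration of the statistics behind these counts, the sketch below computes a Spearman rank correlation and applies a Benjamini-Hochberg correction to a list of P values. It is a simplified example under stated assumptions: the function names are hypothetical, the conversion of each R_s into a P value (for instance from a t distribution or by permutation) is not shown, and nothing here is taken from the software used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


n_pairs = 62 * 37  # metabolite-enzyme pairs in the matrix
```

Because the correction scales each P value by the number of tests divided by its rank, a matrix with 2294 enzyme-metabolite pairs requires a very small raw P value before a single pair passes a threshold such as p_BH < 0.001.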
A trivial explanation for the lack of connectivity would be that many of the enzymes and metabolites come from different pathways. To test this possibility, we generated a series of smaller within-pathway matrices, which contained enzymes and metabolites from the same metabolic sector (starch synthesis, sucrose metabolism, hexose metabolism, nitrogen assimilation, amino transferase reactions, aromatic amino acid biosynthesis, and malate and fumarate metabolism). We also searched for correlations between Calvin-Benson cycle enzymes and the major carbohydrate products of CO2 fixation (starch and sugars). Table 4 summarizes significant R_s values between enzymes and metabolites in these small matrices, using a relaxed P value (p_BH < 0.01). The frequency of correlations was no higher than in the entire data matrix. Another explanation for the lack of connectivity between metabolites and single enzymes would be that the level of a metabolite may be determined by several enzymes. Multivariate methods such as partial least squares (PLS) regression can be used to predict one trait from a linear combination of other traits (termed predictors) (Wold, 1975; Barker and Rayens, 2003). PLS identifies the combination of the original predictors (e.g., metabolites and enzymes) that has maximum covariance with the trait of interest. We tested, for each metabolite, whether PLS regression against the entire set of enzyme activities would provide a better fit than regression against single enzymes. We did not find an improvement in any case.
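The idea that PLS finds the combination of predictors with maximum covariance with the trait can be illustrated with the first component of a single-response PLS regression. The sketch below is deliberately minimal and is not the analysis performed in the study: it extracts one component only, omits predictor scaling and cross-validation, and uses hypothetical function names and toy data.

```python
import math


def center(column):
    """Subtract the mean from every value in a column."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]


def pls1_first_component(X, y):
    """One-component PLS1: X is a list of sample rows, y a list of responses.

    Returns the unit weight vector w (the direction in predictor space with
    maximum covariance with y) and the R^2 of regressing y on the score X w.
    """
    n, p = len(X), len(X[0])
    columns = [center([row[j] for row in X]) for j in range(p)]
    yc = center(y)
    # Covariance (up to a constant factor) of each predictor with the response.
    cov = [sum(a * b for a, b in zip(columns[j], yc)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in cov))
    if norm == 0.0:
        raise ValueError("no predictor covaries with the response")
    w = [c / norm for c in cov]
    # Scores: projection of each sample onto w.
    t = [sum(w[j] * columns[j][i] for j in range(p)) for i in range(n)]
    slope = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(t, yc))
    ss_tot = sum(b * b for b in yc)
    return w, 1.0 - ss_res / ss_tot


# Hypothetical data: two "enzymes" that each explain half of a "metabolite".
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 0.0, 0.0, -2.0]
weights, r2 = pls1_first_component(X, y)
```

With these toy values each predictor alone explains half of the variance in y, whereas the combined score explains all of it. This is the kind of gain over single-enzyme regression that the PLS analysis tested for.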
It is also possible that groups of enzymes interact in a nonlinear manner to determine the level of a given metabolite. To investigate this possibility, we performed support vector machine regression using a nonlinear kernel (Fan et al., 2005). We found only one metabolite whose level was significantly predicted by a group of enzymes (glutamate; R = 0.42, P value corrected = 3E-05). Multiple repetitions of the analysis, leaving out one enzyme at a time, indicated that GluDH was the only enzyme with a major effect on Glu levels.

Relationships between Enzyme Activation State and Metabolite Levels
Optimized in vitro measurements provide information about the maximum catalytic capacity. In vivo activity also depends on the concentrations of substrates and regulatory effectors and, in some cases, posttranslational modification. Two of the enzymes in our platform are subject to a posttranslational modification that can be monitored by appropriate sample handling and assay procedures. Rubisco is posttranslationally activated by carbamylation of Lys-201. The inactive and active forms can be distinguished by assaying activity immediately or after incubation with high bicarbonate concentrations in alkaline conditions to convert the decarbamylated into the carbamylated form (Portis, 2003). NADP-MDH is reversibly reduced and activated by thioredoxin-m. Assay of maximum activity in vitro requires pretreatment with DTT, whereas rapid assay in the absence of DTT detects only the reduced form (Scheibe, 2004). These two enzymes were assayed without pretreatment to monitor the amount of posttranslationally activated enzyme (RubisCOin and NADP-MDHin). This revealed further biologically relevant correlations: NADP-MDHin was negatively correlated with malate (R_s = -0.38, p_BH = 30.005), and RubisCOin correlated positively with NR activity (R_s = +0.35, p_BH = 0.01) (see Supplemental Data Set 2A online).

Relationships between Enzyme Activities and Rosette Biomass
Several enzyme activities correlate weakly with rosette FW, especially when they are expressed on a protein basis (see Supplemental Figure 1B online). Using an analogous approach to that taken previously for metabolites (Meyer et al., 2007; Sulpice et al., 2009), we investigated whether a stronger regression can be obtained using the 35 enzyme activities as predictors in a PLS regression on biomass. In contrast with metabolites, this did not improve the regression (R_pls = 0.04; P = 0.07). This was so, irrespective of whether the enzyme activities were expressed on a FW or a total protein basis. This lack of additivity is probably a consequence of the highly coordinated changes of enzyme activities. We therefore searched for a way to integrate information about the variance in enzyme activity that did not depend on variation between the activities of individual enzymes. The highly coordinated changes of maximum enzyme activities are likely to be due to coordinated changes in the amount of protein invested in enzymes, rather than parallel changes in their k_cat. We used literature values for enzyme specific activities (Table 2) to compute the amount of protein invested in each enzyme in each accession (see Supplemental Data Set 1F online). As shown by Piques et al. (2009), the enzyme amount estimated in this way correlates well with protein abundance measured using mass spectrometry. To provide an overview of how the protein abundance varies between enzymes, we estimated the average value across all the accessions (Table 2). As expected, Rubisco represents a large proportion of total leaf protein (43.7%). Other enzymes contribute between 0.01 and 3%. Grouped on a pathway basis, Calvin-Benson cycle enzymes contribute almost 50% of the total protein, Calvin-Benson cycle enzymes without Rubisco 5%, sucrose and starch synthesis <1%, respiration 3%, amino acid metabolism 2.6%, and sucrose breakdown 0.2% of total leaf protein.
Table 4 legend: Several sets of enzymes and metabolites were identified in which the metabolites are the immediate or near-immediate substrates or products of enzymes: starch synthesis (pPGI, AGP, and starch), sucrose metabolism (cPGI, UGP, SPS, SuSy, INV, and sucrose), hexose metabolism (INV, GK, FK, sucrose, glucose, and fructose), nitrogen assimilation (NR, GS, GOGAT, NAD-ICDH, NADP-ICDH, Gln, 2-oxoglutarate, and Glu), central amino transferase reactions (AspAT, AlaAT, Glu, 2-oxoglutarate, pyruvate, Asp, and Ala), aromatic amino acid synthesis (TK, shikimateDH, shikimate, Phe, Trp, and Tyr), and malate and fumarate metabolism (NAD-MDH, NADP-MDH, fumarase, malate, and fumarate). The six Calvin-Benson cycle enzymes were also compared with starch, sucrose, glucose, and fructose. For each set identified, all significant (p_BH < 0.01) correlations were between enzyme-metabolite pairs in the trait set. This is compared with the number of correlations that these traits have in the data matrix without the initial activities.
The estimated protein in all of the enzymes was summed for each accession and expressed as a percentage of the total protein in that accession. This term will be referred to as relative investment of protein in metabolism (RPM). There was a significant positive correlation (R_s = 0.33, p_BH = 0.0001) between RPM and FW (Figure 6A). As Rubisco represents the vast majority of the rosette protein, this relationship might be very sensitive to changes in Rubisco. The same calculation was therefore repeated for all enzymes except Rubisco. A similar correlation was retained (R_s = 0.36; p_BH = 0.00002; Figure 6B). An analogous plot in which the fraction of total rosette protein found in Rubisco was plotted against FW yielded an R_s of 0.29 (p_BH = 0.0008) (see Supplemental Data Set 2A online). This stable relationship reflects the fact that there is a fairly conserved relationship (R = 0.51) between the fraction of the total protein that is invested in Rubisco and the fraction that is invested in the other 34 enzymes (Figure 6C).
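The RPM calculation amounts to dividing each measured activity by the specific activity of the enzyme, summing the resulting protein amounts, and expressing the sum as a percentage of total protein. A minimal sketch is given below. The function names, the units, and the numerical values are assumptions chosen for illustration and are not taken from Table 2 or the supplemental data.

```python
def protein_per_enzyme(activities, specific_activities):
    """Estimate the protein invested in each enzyme (mg per g FW).

    activities: enzyme -> measured activity (nmol min^-1 g^-1 FW)
    specific_activities: enzyme -> literature value (nmol min^-1 mg^-1 protein)
    """
    return {e: activities[e] / specific_activities[e] for e in activities}


def rpm(activities, specific_activities, total_protein):
    """Relative investment of protein in metabolism, as % of total protein.

    total_protein is in mg per g FW, matching the units of the estimates.
    """
    invested = sum(protein_per_enzyme(activities, specific_activities).values())
    return 100.0 * invested / total_protein


# Hypothetical values for one accession; not taken from Table 2.
activities = {"EnzA": 1000.0, "EnzB": 200.0}
specific = {"EnzA": 500.0, "EnzB": 100.0}
rpm_value = rpm(activities, specific, total_protein=16.0)  # (2 + 2) / 16 = 25%
```

Repeating the calculation with one enzyme removed from both dictionaries gives the variant in which Rubisco is omitted from the summed enzyme protein.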
To identify enzymes with a particularly strong predictive value for RPM, the amount of protein in a given enzyme was regressed on the summed protein in all enzymes. An analogous calculation was also performed in which Rubisco was omitted from the summed enzyme protein. Especially high R_s values were obtained for NADP-GAPDH, TPI, TK, NAD-GAPDH, and AlaAT (Table 2, rightmost columns). As already noted, total leaf protein shows an inverse relationship to rosette FW (Figure 6D; see also Sulpice et al., 2009). Consequently, accessions with a high RPM tend to have low total leaf protein (Figure 6E). Sulpice et al. (2009) used PLS regression to show that the set of 62 metabolites analyzed in this study has predictive power for starch, as well as rosette FW, and that the VIP values (variance importance in the prediction; this gives the weighting of individual metabolites for the prediction of biomass) of the predictor metabolites were similar in both regressions. This implies that starch integrates metabolic status and that the regulatory network that determines starch levels contributes to the regulation of biomass. We investigated whether the prediction of biomass provided by RPM is additive to that provided by starch (Figure 6F). This was done by performing multiple linear regression with cross validation. In this data set, the R_s^2 of starch with FW was 0.21, which is slightly lower than in the smaller data set in Sulpice et al. (2009). The R_s^2 of RPM with biomass was 0.11, and the combined R_s^2 provided by starch and RPM was 0.29. An F-test was applied to evaluate the statistical significance of the increase of explained variance when RPM is added to starch as predictor. The corresponding P value of 6 × 10^-5 demonstrates that this increase is highly significant.
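The logic of the F-test for adding a predictor to a regression model can be sketched as follows. This is a generic textbook form for nested ordinary least squares models, not a reconstruction of the analysis reported here: the function names are hypothetical, the sample size of 129 accessions is assumed to apply, and the sketch does not attempt to reproduce the reported P value, which was obtained with cross validation.

```python
def nested_f_statistic(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the gain in explained variance between nested models.

    k_reduced and k_full are the numbers of predictors (excluding the
    intercept) in the smaller and larger model; n is the number of samples.
    Returns (F, numerator df, denominator df).
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f_value, df1, df2


def adjusted_r2(r2, n, k):
    """Adjusted R^2, which penalizes each additional predictor."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Starch alone versus starch plus RPM, using the R^2 values quoted in the text
# and an assumed n of 129 accessions.
f_value, df1, df2 = nested_f_statistic(0.21, 0.29, n=129, k_reduced=1, k_full=2)
adj = adjusted_r2(0.29, n=129, k=2)
```

The F statistic is then compared with an F distribution with df1 and df2 degrees of freedom to obtain a P value. That step needs a statistics library and is omitted here.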
Regression analysis indicated that starch and RPM are independent parameters (R_s^2 = 0.003; Figure 6F). This was further tested by including an interaction term between RPM and starch. This did not improve the prediction of biomass but instead resulted in a lower value of the adjusted R^2, which was not significant in an ANOVA test. Evaluation of Akaike's information criterion for the two alternative models also showed that the model without the interaction term was to be preferred.
Legend: The display shows selected examples of significant Spearman correlation coefficients between enzyme activities and metabolites. The number indicates the R value and the color shading the P value (light, medium, and dark shading represent p_BH < 0.05, < 0.01, and < 0.001, respectively). A full list of all correlation coefficients is provided in Supplemental Data Set 2A online. Blue colors are for positive correlations. The yellow shading is used for a negative correlation with p_BH < 0.01.
These results imply that almost one-third of the variance in biomass in this large population of accessions is related to these two integrative metabolic parameters. Their contribution is visualized in Figure 6G. For each accession, the starch content is plotted against the RPM value. The accessions were divided into six classes based on their biomass. These classes are identified by color coding, with increasing darkness signifying increasing biomass. Some of the smallest accessions (Lov5, Bla11, Nok2, and Tamm2, marked with an asterisk in the diagram) have a low value of RPM and a high starch content. However, when accessions are compared in the moderately sized or large classes, low RPM tends to be accompanied by high starch and vice versa. The largest accessions are characterized by relatively low starch and high RPM values but do not have extreme values for either parameter. In several cases, outliers for the relationship between biomass and one parameter can be explained by an extreme value for the other parameter. For example, Pyl1 and Van0 (marked as "x" in the figure) are small accessions with a high starch content but have a high RPM, whereas Be0 and Nok1 (marked with "+" in the figure) are small accessions with a low RPM but a low starch content.
Van0 contains a mutation in the ERECTA locus (van Zanten et al., 2009). Mining the currently available genome sequence data for the accessions used in this study (Est-1, Fei-0, Bay-1, Bur-0, and Cvi-1) did not reveal any other likely erecta nonsilent alleles.

DISCUSSION
Metabolism is usually studied by perturbing the activities of single enzymes. However, metabolic networks are generated by interactions between enzymes. Arabidopsis accessions have allelic diversity for hundreds or thousands of genes, allowing a systems-level study of how a metabolic network responds to simultaneous changes of many enzyme activities. This response was analyzed by measuring a large set of enzyme activities, metabolites, and biomass in 129 accessions. The resulting data matrix reveals a very high frequency of positive correlations between enzyme activities, a moderate frequency of correlations between related metabolites, and remarkably few correlations between enzyme activities and metabolite levels. It provides insights into the nature, and consequences, of genetic variation at these two levels of metabolic function and its relationship with biomass formation.

Absolute Enzyme Activities Differ between Pathways in a Manner That Broadly Reflects Pathway Stoichiometry
Enzymes from the same pathway have more closely related absolute activities than do enzymes from different pathways (Figure 1). The range of activities measured here broadly reflects differences in fluxes through the different pathways. Enzyme activities in the Calvin-Benson cycle that we measured were, on average, ~20-fold higher than those in anabolic pathways like sucrose and starch synthesis, nitrogen assimilation, and amino acid synthesis. Based on known pathway stoichiometry (Heldt, 1997), it can be calculated that synthesis of one molecule of sucrose requires the net fixation of 12 molecules of CO2. Depending on the position of the enzyme in the Calvin-Benson cycle, 12 to 24 catalytic cycles will be needed to regenerate 12 molecules of the acceptor Ru1,5bisP and allow fixation of 12 molecules of CO2. This number will be 50 to 60% higher in ambient conditions due to photorespiration (Pettersson and Ryde-Pettersson, 1988; Poolman et al., 2001; Reumann and Weber, 2006; Zhu et al., 2007). The activities of most enzymes for sucrose degradation and respiration were even lower. This reflects the fact that the rate of respiration is typically 10- to 100-fold lower than the rate of photosynthesis (Heldt, 1997; Ivanova et al., 2008). The qualitative match between absolute enzyme activities and the fluxes that they carry indicates that there has been selection to optimize the efficiency of allocation of C and nitrogen between different enzymes and pathways.
Figure legend: The amount of protein invested in a given enzyme was calculated from the measured activity using literature values for the specific activity (see Table 2 and Supplemental Data Set 1F online). This was summed for all enzymes in a given accession and divided by the total rosette protein in that accession to calculate the relative investment of protein in metabolism (RPM).
For some respiratory enzymes (e.g., NAD-MDH and fumarase), we measured activities similar to those of the Calvin-Benson cycle enzymes. In addition to their function in the TCA cycle, these enzymes have an additional role in catalyzing the synthesis of organic acids. Malate accumulates to high levels in the vacuole of many plants (Heldt, 1997), and fumarate also accumulates in Arabidopsis (Chia et al., 2000). Organic acids are synthesized as a counteranion when nitrate is assimilated and may also have an osmotic function and provide a temporary carbon store (Zell et al., 2010). This may explain why the activities of NAD-MDH and fumarase are much higher than those of other respiratory enzymes.

Coordinated Changes of Enzymes That Are Required for Photosynthesis
Enzymes from a given pathway show coordinated changes of activity in this large population of Arabidopsis accessions. Examples include enzymes in the Calvin-Benson cycle and, to a lesser extent, enzymes for starch and sucrose synthesis and amino acid metabolism. These coordinated changes are retained when enzyme activities are examined in relation to total rosette protein content. They result in changes in the total rosette protein because the enzymes in central metabolism, especially Rubisco, represent a considerable part of the total rosette protein.
The coordinated nature of the variation in enzyme activities in this panel of accessions has two important functional implications. First, a coordinated change in the activities of several enzymes is an effective strategy to alter pathway flux (Kacser and Burns, 1973; Kacser and Acerenza, 1993; Fell and Thomas, 1995). Second, the changes in the relative levels of individual enzymes in this panel of accessions are so small that they are unlikely to have a detrimental effect on the efficiency with which the other enzymes are used. For example, there are almost no instances where the activity of one Calvin-Benson cycle enzyme changes >2-fold relative to the activity of another Calvin-Benson cycle enzyme. As discussed in the Introduction, a 2-fold decrease in the activity of individual enzymes in mutants and transgenic plants leads to only minor changes in flux for some enzymes and has no effect for other enzymes (see also Stitt et al., 2010b). Zhu et al. (2007) developed a model of the Calvin-Benson cycle, photorespiration, and photosynthetic end product synthesis and used it to predict that a small reallocation of protein toward Rubisco, Ald, SBPase, and plastid FBPase and away from other Calvin-Benson cycle enzymes could potentially increase the rate of photosynthesis. We found no evidence for more variation in the ratio between Rubisco and PGK, NADP-GAPDH, TPI, and TK than would be expected on the basis of the variation in the total data set.
In some cases, enzymes in different pathways show coordinated changes of activity. For example, enzymes from the pathways of sucrose and starch synthesis and amino acid synthesis correlate with Calvin-Benson cycle enzymes. Although nominally classified as separate pathways, they are actually mutually interdependent because the synthesis of sucrose, starch, and amino acids depends on the supply of triose phosphates from the Calvin-Benson cycle. A recent study of an RIL population found that AGP and pPGI correlated with Rubisco (R > 0.3), whereas enzymes for sucrose degradation and respiration did not (R < 0.17) (Keurentjes et al., 2008). Another example of a coordinated response that spans several nominally different pathways is provided by NADP-MDH, NAD-MDH, NADP-ICDH, TPI, and NAD-GAPDH. In addition to their roles in the Calvin-Benson cycle, glycolysis, and the TCA cycle, these enzymes catalyze metabolite shuttles that transfer ATP and redox groups between subcellular compartments. During photosynthesis, the light reactions generate NADPH and ATP in the chloroplast stroma. While much of this is used in the chloroplast, some is transferred to the remainder of the cell. Mature chloroplasts have no mechanism for direct export of NAD(P)H and only a limited capacity for ATP export (Heldt, 1997). Instead, ATP and reducing groups are exported via a metabolite shuttle involving the chloroplastic NADP-GAPDH, the triose phosphate translocator, and the cytosolic NAD-GAPDH, and excess reducing equivalents are exported by the malate valve involving the plastidic NADP-MDH and extraplastidic NAD-MDH (Scheibe, 2004).
Enzymes in other pathways, especially sucrose degradation, show divergent responses. Plants possess alternative and potentially redundant pathways for sucrose degradation via SuSy and FK or via INV, GK, and FK. No evidence was found for a correlation between enzymes that are required in one or the other of these pathways. These enzymes, in particular INV, also varied in an uncoordinated manner in a Cvi × Ler RIL population (Keurentjes et al., 2008). This implies that Arabidopsis tolerates large changes in the relative levels of enzymes in C breakdown pathways. NR provides another interesting example of an enzyme whose maximum activity varies independently of other enzymes. NR is a key site for the regulation of nitrate assimilation. In addition to transcriptional and translational regulation (Nussaume et al., 1995; Kaiser and Huber, 2001; MacKintosh and Meek, 2001; Lea et al., 2006), NR is subject to posttranslational regulation involving phosphorylation followed by the Mg2+-dependent binding of a 14-3-3 protein (Nussaume et al., 1995; Kaiser and Huber, 2001; MacKintosh and Meek, 2001; Lea et al., 2006). One possible explanation for the divergent responses may be that NR activity is mainly regulated by posttranslational mechanisms. Nitrogen metabolism in tobacco (Nicotiana tabacum) was not markedly affected by constitutive overexpression of native NIA (encoding NR) but was strongly modified by introduction of a modified and constitutively active form of the enzyme (Lea et al., 2006).

Coordinated Changes of Sets of Related Metabolites
Metabolites were moderately correlated with each other at a global level. This resembles previous studies with Arabidopsis rosettes (Meyer et al., 2007; Sulpice et al., 2009) and tomato fruit (Schauer et al., 2008). There were strong correlations between metabolites in specific sectors of metabolism. This included the amino acids, as seen previously (Galili and Hofgen, 2002; Foyer et al., 2003; Lu et al., 2008). There was also a strong correlation between total amino acids and sucrose and starch, the two main carbohydrates formed during photosynthesis. Similar correlations have been reported in physiological studies of single or near-isogenic genotypes (see Introduction). However, starch and sucrose did not correlate with individual amino acids. Instead, 2-oxoglutarate correlated with Gln, Glu, Asp, and several minor amino acids, including Arg and Phe. 2-Oxoglutarate is the immediate C-acceptor during nitrogen assimilation in plants, where it reacts with Gln to form two molecules of Glu in the reaction catalyzed by GOGAT. These results indicate genetic conservation of a regulatory network linking 2-oxoglutarate and amino acid biosynthesis. Physiological studies in tobacco and Arabidopsis have shown that the levels of 2-oxoglutarate and Glu are often correlated (Muller et al., 2001; Fritz et al., 2006), and there is growing evidence that Glu plays a key role in the regulation of nitrogen metabolism (Fritz et al., 2006; Forde and Lea, 2007).

Metabolites Change Independently of Maximal Enzyme Activities
There were very few correlations between enzyme activities and metabolites. We also failed to uncover more links when we used multivariate statistics to search for combinations of enzymes that might determine the level of a metabolite, or permitted nonlinear interactions. Several explanations can be advanced for this lack of connectivity between enzyme activities and metabolites.
One explanation is that enzymes catalyze fluxes and that there is no simple relationship between flux and the concentrations of metabolic intermediates. The coordinated changes of enzyme activities that occur in this panel of accessions may, paradoxically, result in there being even fewer correlations of enzyme activities with metabolic intermediates. As pointed out by Kacser and Acerenza (1993), a coordinated increase of enzyme activities will increase flux while leaving the levels of the metabolic intermediates essentially unaltered. This does not account for the absence of correlations between enzymes and metabolites that are pathway products, like starch, sucrose, many of the amino acids, and organic acids that accumulate to high levels in the vacuole, like malate and fumarate. However, the levels of pathway products may still represent a balance between the rates of synthesis and utilization. For example, sucrose is exported from the leaf, and amino acids are exported or used for protein synthesis in the leaf. In one case, our data set does allow a direct assessment of flux. Starch is synthesized in the light and almost completely remobilized at night in the conditions used in our experiment (Gibon et al., 2004a; Usadel et al., 2008), so the level at the end of the light period reflects the rate of synthesis in the preceding light period. The starch content did not correlate with the activity of two enzymes from the pathway of starch synthesis, AGP or pPGI (R_s = -0.04 and 0.17, respectively).
A second explanation is that metabolite levels are not determined by maximum enzyme activities. When the maximum activity of an enzyme is decreased, there is usually no effect on flux until a threshold is reached (Kacser and Burns, 1973; Fell and Thomas, 1995; Stitt et al., 2010b). As discussed in the Introduction, for four (NADP-GAPDH, PGK, TPI, and TK) of the six Calvin-Benson cycle enzymes in our data set, it has been empirically determined that a 2-fold decrease of the maximum activity has no detectable effect on the rate of photosynthesis (Stitt and Schulze, 1994; Stitt and Sonnewald, 1995). Our measurements now show that the relative activities of these enzymes vary by <2-fold in this large population of accessions. This explains why they do not correlate with metabolites that are produced by photosynthesis. Rubisco and Ald do have measurable flux control coefficients for photosynthesis in ambient conditions (i.e., a relatively small decrease in their maximum activity leads to a detectable inhibition of CO2 fixation). Rubisco activity correlated with starch and sucrose, which are the two main end-products of photosynthesis, whereas Ald showed little variation in its activity in this panel of accessions. The correlation matrix between maximum Calvin-Benson cycle enzyme activities and metabolites in this panel of accessions is therefore consistent with the results of earlier studies, which used reverse genetics to alter the maximum activity of individual enzymes. Crucially, the matrix reveals which enzymes actually show changes of their maximum activity in different Arabidopsis accessions and could therefore contribute to the genetic control of metabolism. The coordinated changes of maximum enzyme activities in the Calvin-Benson cycle and related pathways that occur in this panel of accessions will provide a very effective means to change fluxes through central photosynthetic metabolism.
In vivo enzyme activity is usually lower than the maximum activity. There are various reasons for this (Rolleston, 1972; Newsholme and Start, 1973; Junker et al., 2007). One is that many reactions are close to their thermodynamic equilibrium in vivo. In this case, the net flux is the difference between the rates of the forward and the reverse reactions and is much smaller than the total rate of catalysis. The second is that in vivo activity is typically restricted by inhibitors, low levels of substrates and activators, and/or posttranslational regulation. In the terminology of Newsholme and Start (1973), coarse regulation refers to changes in the maximal activity brought about by expression and protein turnover, and fine regulation refers to modulation of the activity by substrate availability, levels of activators and inhibitors, and posttranslational activation. Fine regulation makes a far larger contribution than does gene expression to the regulation of glycolysis when yeast is grown in conditions that alter the ATP yield (Daran-Lapujade et al., 2007). This implies that additional information about fine regulation will be required to link changes in maximum enzyme activity to the responses of metabolites.
It is difficult to assess the impact of fine regulation when it is caused by changes in substrate saturation and the levels of metabolite effectors because this requires detailed information about the kinetic properties of an enzyme and the in vivo levels of all its major effectors. By measuring the posttranslational regulation of Rubisco and NADP-MDH, we were able to uncover two biologically relevant correlations that were not apparent from measurements of maximum enzyme activities.
The lack of connectivity between enzymes and metabolites highlights the complexity of the interactions between different enzymes in pathways in central metabolism and the importance of regulation of enzyme activities by metabolite ligands and posttranslational modification (fine regulation). Further integration of global data on metabolism will require determinations of fluxes, the acquisition and integration of information concerning allosteric regulators and posttranslational regulation, and the analysis of these results in the context of quantitative metabolic models (Pettersson and Ryde-Pettersson, 1988;Poolman et al., 2001;Uys et al., 2007;Zhu et al., 2007).

Biomass Correlates with the Relative Investment of Protein in Enzymes
Our study provides a large matrix of over 100 metabolic traits, which can be analyzed to identify metabolic traits that are linked with increased biomass. Multivariate statistics has already been applied to predict rosette biomass from a large set of metabolites (Meyer et al., 2007) and to show that starch is an integrative biomarker that captures much of the information present in the metabolite profile and has a highly significant negative correlation with biomass . Starch accumulates in the light period and is degraded to support metabolism and growth at night (Smith and Stitt, 2007). When Col-0 is grown in different photoperiods, there is a strong correlation (R > 0.99) between the relative growth rate and the rate of starch degradation . In this study, starch was measured at the end of the light period. It has been shown previously that starch is almost completely degraded by the end of the night in a subset of 20 accessions (Cross et al., 2006). The amount of starch accumulated at the end of the light period therefore provides an indicator for the amount of C available for metabolism and growth at night. The negative correlation between starch and biomass indicates that accessions that produce a large biomass are able to use their C more efficiently for growth. This biomarker therefore presumably reflects the efficiency with which C is allocated for biomass production.
Many individual enzyme activities correlate weakly and positively with growth, especially when they are expressed on a total protein basis. However, multivariate analysis on the 35 enzyme activities did not provide a prediction of biomass. The reason is probably that little additional information can be extracted by a combined analysis because the enzyme activities are already strongly correlated with each other. This prompted us to use the measured enzyme activities to compute the fraction of total rosette protein that is invested in the enzymatic machinery for central metabolism (RPM). We found a significant positive correlation between RPM and rosette biomass. We interpret this parameter as a measure of the efficiency with which protein is invested for the purpose of carbon acquisition. Many of the most strongly correlating enzymes are involved in the Calvin-Benson cycle or related pathways. This relationship did not depend on any single enzyme. Indeed, our results show that information about one or a small number of enzymes suffices to generate a proxy value for this term. The robustness is presumably a consequence of the coordinated nature of the responses of enzyme activities. The importance of optimizing investment in the photosynthetic machinery is highlighted by the fact that half or more of the total protein in leaves is invested in the photosynthetic machinery (Farquhar et al., 2001; Zhu et al., 2007).
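The calculation of RPM can be illustrated with a minimal sketch. It assumes that the protein amount of each enzyme is estimated by dividing its measured maximum activity by the specific activity of the pure enzyme; this conversion, the function name, and all numerical values are hypothetical and serve only to show how the estimates are summed and expressed as a fraction of total protein.

```python
def relative_protein_investment(activities, specific_activities, total_protein):
    """Fraction of total protein estimated to be invested in the measured enzymes.

    activities: enzyme -> maximum activity (nmol min^-1 g^-1 FW)
    specific_activities: enzyme -> activity of pure enzyme (nmol min^-1 mg^-1);
        assumed values, used to convert activity into enzyme protein
    total_protein: total protein (mg g^-1 FW)
    """
    enzyme_protein = sum(
        activities[name] / specific_activities[name] for name in activities
    )
    return enzyme_protein / total_protein


# Hypothetical numbers for illustration only.
activities = {"Rubisco": 12000.0, "PGK": 90000.0, "TK": 9000.0}
specific_activities = {"Rubisco": 3000.0, "PGK": 600000.0, "TK": 30000.0}
rpm = relative_protein_investment(activities, specific_activities, total_protein=15.0)
print(f"RPM = {rpm:.3f}")
```

Because the enzyme activities change in a coordinated manner, dropping an entry from the hypothetical dictionaries would rescale RPM but would be expected to preserve the ranking of accessions.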
The prediction of biomass provided by the relative investment of protein in metabolism (RPM) is independent of the prediction provided by starch, and the summed R² for these two integrative parameters accounts for almost one-third of the total variance in rosette biomass in this large population of Arabidopsis accessions. Formally, this additivity is a consequence of the lack of connectivity between enzyme activities and metabolite levels. In practical terms, this means that a relatively small number of robotized analytic measurements can be used to identify two major metabolic determinants of biomass. Functionally, it indicates that investment in photosynthetic machinery and optimization of carbon use are largely separate processes. While both are likely to be subject to multigenic regulation, our results indicate that these may be largely nonoverlapping networks. This may allow these two parameters to be used in the future to identify complementary genotypes that score highly for one and weakly for the other to investigate how they contribute to productivity in segregating populations.
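The formal basis of this additivity can be written out using the standard expression for the coefficient of determination of a linear model with two predictors. The symbols are introduced here for illustration: r_1 and r_2 are the correlations of RPM and starch with biomass, and r_12 is the correlation between RPM and starch.

```latex
R^2 = \frac{r_{1}^{2} + r_{2}^{2} - 2\,r_{1} r_{2} r_{12}}{1 - r_{12}^{2}}
\qquad\Longrightarrow\qquad
R^2 = r_{1}^{2} + r_{2}^{2} \quad \text{when } r_{12} = 0
```

When the two predictors are uncorrelated, the cross term vanishes and the individual R² values simply add; any correlation between them would make the combined R² deviate from the sum.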
Finally, our approach is based on the idea that the variation for metabolic traits in accessions is generated by large numbers of small perturbations. It is possible that part of the variance is generated by a small number of major polymorphisms that have a large impact on one or many metabolic traits. For example, ERECTA is a pleiotropic regulator of developmental and physiological processes, as well as a modulator of responses to environmental stimuli (van Zanten et al., 2009). In an analysis of a Ler × Cvi RIL population, Keurentjes et al. (2008) noted that the ERECTA locus, which is polymorphic between the parents, colocates with strong QTL for several of the metabolic traits studied and suggested that ERECTA may be responsible for a subtle simultaneous regulation of primary metabolism in parallel with, or as a consequence of, its effects on development. Our accession panel contains at least two accessions (Ler and Van0) with a polymorphism at the ERECTA locus (van Zanten et al., 2009). As more genome resequencing data become available, it will become possible to use candidate gene association mapping to assess the contribution of such major-locus polymorphisms to genetic variation in metabolism at a species level.

METHODS

For the selection of a diverse collection of Arabidopsis accessions, 406 Arabidopsis accessions were analyzed using 115 single nucleotide polymorphism (SNP) markers (Toerjek et al., 2003; Schmid et al., 2005). Heterozygous genotypes of SNPs were excluded from further analysis (<2%). All 406 accessions as well as 115 SNPs were used in the Mstrat software (Gouesnard et al., 2001) to generate a core collection of accessions with maximal allelic richness. One hundred core collections were generated independently, and the core set with the highest Nei index was retained. Selected accessions were genotyped with an additional set of 149 SNP markers, which are a subset of the 289 SNP markers available at http://naturalsystems.uchicago.edu/naturalsystems/lab/index_files/page0007.html. Genotyping was done at Sequenom. Finally, a cutoff of >15 accessions in the minor allele class was used to exclude 81 markers where a highly asymmetric distribution would lead to poor statistical resolution.
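The minor allele cutoff can be sketched as follows. This is a minimal illustration, assuming biallelic markers and genotype calls stored as lists with excluded (for example heterozygous) calls represented by None; the function names and the genotype data are hypothetical.

```python
from collections import Counter


def minor_allele_count(genotypes):
    """Number of accessions carrying the less frequent allele at a SNP.

    Excluded or missing calls are given as None and ignored.
    """
    counts = Counter(g for g in genotypes if g is not None)
    if len(counts) < 2:
        return 0  # monomorphic marker: there is no minor allele class
    return min(counts.values())


def filter_markers(marker_genotypes, min_minor=15):
    """Keep markers with more than `min_minor` accessions in the minor allele class."""
    return {
        marker: calls
        for marker, calls in marker_genotypes.items()
        if minor_allele_count(calls) > min_minor
    }


# Hypothetical genotype calls for 40 accessions at three markers.
markers = {
    "snp_balanced": ["A"] * 22 + ["T"] * 18,
    "snp_skewed": ["C"] * 36 + ["G"] * 4,
    "snp_with_gaps": ["G"] * 20 + ["A"] * 16 + [None] * 4,
}
kept = filter_markers(markers)
print(sorted(kept))
```

Markers with a strongly asymmetric allele distribution, such as the hypothetical snp_skewed, are dropped because too few accessions carry the rarer allele to support a statistical comparison.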

Growth Conditions and Experimental Design
Growth conditions were exactly as previously described by Cross et al. (2006). Briefly, seeds were germinated and grown for the first 7 d with a daylength of 16 h, a temperature of 6°C at night and 20°C during the day, 75% humidity, and a light intensity of 145 µmol m⁻² s⁻¹. After 7 d, seedlings were transferred to a phytotron. Growth was continued in an 8-h-light/16-h-dark regime at temperatures and humidities of 16°C and 75% at night and of 20°C and 60% during the day. Illumination was 145 µmol m⁻² s⁻¹. At the age of 2 weeks, plants of average size were transferred to pots 6 cm in diameter (five plants per pot). After 1 further week in the short-day conditions outlined above, plants were moved to a small controlled growth chamber for 2 more weeks. Daylength was then 8 h, temperature a constant 20°C, and illumination an average of 125 µmol m⁻² s⁻¹. Plants were watered daily. Harvests of 15 plant rosettes (five samples, each containing three plants) were performed at the end of the light period. Each sample typically contained three rosettes, equivalent to ~500 mg FW, depending on the accession. The entire sample was powdered under liquid nitrogen and stored at -80°C until use.
Each accession was grown in at least two independent experiments. In total, eight independent experiments were performed, each including between 20 and 108 accessions. Within each experiment, the position of the pots containing individual accessions was randomized.

GC Analysis
Metabolite extraction for GC-MS was performed on exactly the same samples as those used for the enzymes and metabolites determined by spectrophotometric methods, as described previously. Fifty milligrams of Arabidopsis shoots were homogenized using a ball mill precooled with liquid nitrogen. Derivatization and GC-MS analysis were performed as described previously (Lisec et al., 2006).

ANOVA
Statistical analyses were performed with SAS-JMP (SAS Institute). A two-way factorial ANOVA was used to partition the variance into genetic and experimental variance separately for each trait using the following model:

$$Y_{ijk} = \mu + A_i + E_j + \varepsilon_{ijk}$$

where $Y_{ijk}$ are the experimental responses, $\mu$ is the general mean, $A_i$ is the effect of the ith accession genotype, $E_j$ is the effect of the jth experiment, and $\varepsilon_{ijk}$ is the error of $Y_{ijk}$.
The explained variances of the genotype main effect, $h^2(A)$, and of the experiment effect, $h^2(E)$, were calculated as follows:

$$h^2(A) = \frac{SS_A}{SS_{Total}}, \qquad h^2(E) = \frac{SS_E}{SS_{Total}}$$

where $SS_A$, $SS_E$, and $SS_{Total}$ are the sums of squares of the factor accession, the factor experiment, and the total sum of squares, respectively, which were obtained from the ANOVA procedure.
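The partitioning of variance into an accession and an experiment component, each expressed as its sum of squares divided by the total sum of squares, can be sketched in Python with NumPy. The sketch assumes a balanced design (every accession present in every experiment with the same number of replicates), which is a simplification relative to the unbalanced layout handled by the ANOVA procedure in SAS-JMP; the function name and the simulated data are hypothetical.

```python
import numpy as np


def explained_variances(y):
    """Return (h2_A, h2_E) for a balanced array y[accession, experiment, replicate]."""
    n_acc, n_exp, n_rep = y.shape
    grand_mean = y.mean()
    ss_total = ((y - grand_mean) ** 2).sum()
    # Main-effect sums of squares computed from the marginal means.
    acc_means = y.mean(axis=(1, 2))
    exp_means = y.mean(axis=(0, 2))
    ss_a = n_exp * n_rep * ((acc_means - grand_mean) ** 2).sum()
    ss_e = n_acc * n_rep * ((exp_means - grand_mean) ** 2).sum()
    return ss_a / ss_total, ss_e / ss_total


# Simulated trait (assumed effect sizes, for illustration only).
rng = np.random.default_rng(0)
a = rng.normal(0.0, 2.0, size=(30, 1, 1))      # accession effects A_i
e = rng.normal(0.0, 1.0, size=(1, 4, 1))       # experiment effects E_j
noise = rng.normal(0.0, 0.5, size=(30, 4, 3))  # residual error
y = 10.0 + a + e + noise
h2_a, h2_e = explained_variances(y)
print(f"h2(A) = {h2_a:.2f}, h2(E) = {h2_e:.2f}")
```

In a balanced design the two fractions cannot sum to more than 1, with the remainder attributable to interaction and residual error.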

Partial F-Test for Model Selection and Coefficient of Variation
The F statistic we used is defined by

$$F = \frac{RSS(b_k) - RSS(b_{k+1})}{RSS(b_{k+1})/(n - k - 2)}$$

with RSS denoting the residual sum of squares for the models expressed by the coefficient vectors $b_k$ and $b_{k+1}$, k and k+1 representing the number of variables already selected, and n the number of samples.
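A minimal sketch of this partial F statistic, assuming ordinary least squares models with an intercept (so that the larger model has k + 2 coefficients and n - k - 2 residual degrees of freedom). The function names and the simulated data are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return float(residuals @ residuals)


def partial_f(X_selected, x_candidate, y):
    """F statistic for adding one candidate predictor to k already selected ones."""
    n, k = X_selected.shape
    rss_k = rss(X_selected, y)
    rss_k1 = rss(np.column_stack([X_selected, x_candidate]), y)
    return (rss_k - rss_k1) / (rss_k1 / (n - k - 2))


# Simulated response that depends on x1 and x2 but not on x3.
rng = np.random.default_rng(3)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 * x1 + 1.5 * x2 + rng.normal(0.0, 0.5, size=n)
f_informative = partial_f(x1[:, None], x2, y)          # add x2 to {x1}
f_noise = partial_f(np.column_stack([x1, x2]), x3, y)  # add x3 to {x1, x2}
print(f"F(x2 | x1) = {f_informative:.1f}, F(x3 | x1, x2) = {f_noise:.2f}")
```

In a forward selection, the candidate with the largest F would be added, and selection would stop once no candidate exceeds the chosen critical value.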
The CV was defined as the ratio of the standard deviation to the mean of all mean values obtained for a trait/accession pair.

Support Vector Machines
To evaluate nonlinear relationships between enzyme activities and metabolites, support vector machine regression was applied using the R function "svm" with a radial kernel (Chang and Lin, 2001). To identify the important predictors among p input variables, the cross-validated svm prediction was repeated p times, each time leaving out one predictor. The predictor whose elimination yields the lowest correlation between predicted and actual response in cross-validation is then regarded as the most important one.
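The leave-one-predictor-out ranking can be sketched as follows. The sketch uses scikit-learn's SVR in Python rather than the R function used in the analysis, and the number of folds, the default hyperparameters, and the simulated data are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVR


def cv_correlation(X, y, cv):
    """Correlation between cross-validated SVR predictions and the actual response."""
    predicted = cross_val_predict(SVR(kernel="rbf"), X, y, cv=cv)
    return np.corrcoef(predicted, y)[0, 1]


def rank_predictors(X, y, n_splits=5, seed=0):
    """Leave out each predictor in turn; the lowest remaining correlation
    marks the most important predictor."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {
        j: cv_correlation(np.delete(X, j, axis=1), y, cv)
        for j in range(X.shape[1])
    }
    return sorted(scores, key=scores.get), scores


# Simulated data: the response depends nonlinearly on predictor 0 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = np.sin(1.5 * X[:, 0]) + rng.normal(0.0, 0.1, size=120)
ranking, scores = rank_predictors(X, y)
print("most important predictor:", ranking[0])
```

The ranking is returned in order of decreasing importance, that is, in order of increasing correlation after removal of the predictor.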

PLS Regression and Cross-Validation
PLS identifies the combinations of the original predictor variables with maximum covariance with the response (Wold, 1975; Barker and Rayens, 2003). These orthogonal combinations replace the original data matrix and are used in a multivariate ordinary least squares regression to predict the response. The optimal number of components is determined by the maximum proportion of explained variance obtained in 5-fold cross-validation. The observations are divided into five subsets, a training set (four subsets) is used to build a model that is applied to predict the response of the remaining subset, and this procedure is repeated five times to estimate the response for all observations. Cross-validation was also applied each time to the training set to select the number of components. The estimated vector is correlated with the measured response to obtain a measure of the predictive power of the predictor variables.
The weight of a predictor j in the linear combination resulting in PLS component i is denoted as $w_{ij}$. The variable importance in projection (VIP) of each predictor j gives an estimate of the importance of that predictor for the PLS prediction using the h most important orthogonal components and is calculated as the sum of the $w_{ij}$ (i = 1, …, h), each multiplied by the correlation of PLS component i with the response (Chong and Jun, 2005).

Significance of Coordinated Changes between Enzyme Activities
For each enzyme, the activity in a given accession was normalized to the mean activity of that enzyme in all of the accessions. This allowed activity ratios in the same range to be obtained for each enzyme pair. The activity ratios for each enzyme-enzyme pair belonging to the pathways tested were calculated in each accession. The values obtained for all accessions were then depicted as a frequency plot. To test for significance of the plots obtained, the normalized activities were shuffled among the accessions 1000 times; for each shuffling, the ratios were calculated and combined, and their standard deviation was calculated. The frequency distribution of the standard deviations obtained from the 1000 shufflings was tested against the standard deviation of the real values (see Supplemental Figure 3 online).
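The shuffling procedure can be sketched as follows. The sketch assumes that the activities are held in an accessions × enzymes array, that each enzyme is shuffled independently among accessions, and that significance is summarized as the fraction of shufflings with a standard deviation at least as small as the observed one; this empirical p-value, the function names, and the simulated data are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def ratio_sd(normalized):
    """SD of the pooled activity ratios over all enzyme pairs and accessions."""
    pairs = combinations(range(normalized.shape[1]), 2)
    ratios = np.concatenate([normalized[:, i] / normalized[:, j] for i, j in pairs])
    return ratios.std(ddof=1)


def shuffle_test(activities, n_shuffles=1000, seed=0):
    """Compare the observed ratio SD with SDs obtained after shuffling
    each enzyme's normalized activities among the accessions."""
    rng = np.random.default_rng(seed)
    normalized = activities / activities.mean(axis=0)  # normalize each enzyme to its mean
    observed = ratio_sd(normalized)
    shuffled = np.array([
        ratio_sd(np.column_stack([rng.permutation(col) for col in normalized.T]))
        for _ in range(n_shuffles)
    ])
    # Fraction of shufflings whose ratio SD is as small as the observed one.
    p_value = (np.sum(shuffled <= observed) + 1) / (n_shuffles + 1)
    return observed, shuffled, p_value


# Simulated coordinated enzymes: a shared accession effect plus small noise.
rng = np.random.default_rng(4)
common = rng.lognormal(0.0, 0.3, size=(60, 1))
activities = 100.0 * common * rng.lognormal(0.0, 0.05, size=(60, 4))
observed, shuffled, p_value = shuffle_test(activities)
print(f"observed SD = {observed:.3f}, mean shuffled SD = {shuffled.mean():.3f}")
```

When enzymes change in a coordinated manner, their ratios stay close to 1 in every accession, so the observed standard deviation falls well below the distribution obtained after shuffling.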

Cartographic Representation of the Network
A matrix of correlations between all trait pairs was generated. The significance level was set at $p_{BH}$ < 0.01. The 412 significant pairs (out of 5886) that resulted were treated as a network in which a vertex corresponds to a trait and a link between two vertices corresponds to a significant correlation between these two traits. This network was then subjected to the cartography algorithm. Essentially, this algorithm divides the network into modules, or groups of vertices that are more connected among themselves than to vertices from other modules, yielding a cartographic representation of a complex network. In implementing this algorithm, negative correlations were treated in the same way as positive correlations.
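The construction of the network edges can be sketched as follows, assuming that the BH subscript denotes the Benjamini-Hochberg correction for multiple testing. The sketch covers only the thresholding step that turns pairwise p-values into links; the module detection performed by the cartography algorithm is not reproduced. The function names and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_edges(pair_p_values, alpha=0.01):
    """Trait pairs whose adjusted p-value is below alpha become network links."""
    pairs = list(pair_p_values)
    adjusted = benjamini_hochberg([pair_p_values[p] for p in pairs])
    return [pair for pair, q in zip(pairs, adjusted) if q < alpha]


# Hypothetical raw p-values for correlations between trait pairs.
p_values = {
    ("Rubisco", "PGK"): 0.0001,
    ("Rubisco", "TK"): 0.002,
    ("starch", "biomass"): 0.004,
    ("starch", "PGK"): 0.30,
    ("sucrose", "TK"): 0.75,
}
edges = significant_edges(p_values)
print(edges)
```

Because the p-value does not depend on the sign of the correlation, positive and negative correlations yield links in the same way, as in the network described above.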

Supplemental Data
The following materials are available in the online version of this article.