Arabidopsis Nuclear-Encoded Plastid Transit Peptides Contain Multiple Sequence Subgroups with Distinctive Chloroplast-Targeting Sequence Motifs[W]
The Plant Cell 20:1603-1622 (2008); 10.1105/tpc.108.060541. First published online June 13, 2008. © 2008 American Society of Plant Biologists
Laboratory of Cellular Systems Biology, Division of Molecular and Life Sciences, POSTECH, Pohang 790-784, Korea
Address correspondence to sukim{at}postech.ac.kr or ihhwang{at}postech.ac.kr.
The N-terminal transit peptides of nuclear-encoded plastid proteins are necessary and sufficient for import into plastids, but the information they encode remains elusive because they are highly diverse in sequence and lack consensus sequences or common sequence motifs. Here, we investigated the sequence information contained in transit peptides. Hierarchical clustering of the transit peptides of 208 plastid proteins showed that the transit peptide sequences fall into multiple sequence subgroups. We selected representative proteins from seven of these subgroups and confirmed that their transit peptide sequences are highly dissimilar. Protein import experiments revealed that each of these transit peptides contains its own specific sequence motifs that are critical for import into chloroplasts. Bioinformatics analysis identified sequence motifs that are conserved among members of the identified subgroups, and the motifs identified by these two independent approaches were nearly identical or overlapped significantly. Furthermore, grouping transit peptides into multiple sequence subgroups greatly increased the accuracy of chloroplast protein prediction. Based on these data, we propose that transit peptides are composed of multiple sequence subgroups that contain distinctive sequence motifs for chloroplast targeting.
Nuclear-encoded plastid proteins contain N-terminal transit peptide sequences with all the information necessary for their import into plastids (von Heijne et al., 1989
Another important question is the nature of the information encoded by the transit peptide. This question has been addressed by many different approaches (von Heijne et al., 1989
Another approach to understanding the information carried by transit peptides has been to study their structure biochemically. Transit peptides are largely unstructured in aqueous solution (Wienk et al., 1999
Here, we analyzed transit peptide sequences in detail using two different approaches: in vivo protein import experiments with Ala substitution mutants and a bioinformatics-based prediction approach. Together, these approaches provide evidence that transit peptides can be grouped into multiple subgroups with distinct sequence motifs.
The Transit Peptidome Is Composed of Highly Diverse Sequences
Early attempts to identify transit peptide homology blocks in the transit peptidome failed because of the high degree of sequence variability among transit peptides (Emanuelsson et al., 1999
To test this hypothesis, we performed hierarchical clustering on the transit peptidome of Arabidopsis thaliana, consisting of 208 plastid proteins that have been experimentally confirmed as chloroplast proteins (see Methods). The resulting dendrogram showed that the transit peptide sequences are highly dissimilar (see Supplemental Figure 1 online). From this dendrogram, we chose seven proteins whose transit peptide sequences differ greatly from each other: the small subunit of Rubisco (RbcS), chlorophyll a/b binding protein (Cab), biotin carboxyl carrier protein (BCCP), protochlorophyllide oxidoreductase A (PORA), DnaJ-J8, tocopherol cyclase (TOCC), and ferredoxin-dependent glutamate synthase 2 (GLU2). A direct comparison of these transit peptides showed highly diverse amino acid sequences (Figure 1A). Their low pairwise sequence identity and the insignificant P values of their pairwise global alignment scores (Figure 1B) indicate that these transit peptides are markedly distant from each other at the amino acid sequence level.
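The scoring scheme and linkage method behind Supplemental Figure 1 are not given in this section. As a minimal sketch, assuming a simple match/mismatch score, a linear gap penalty, and average linkage on a 1 − identity distance (all assumptions, not the authors' settings), a dendrogram of transit peptides can be built from pairwise Needleman-Wunsch global alignments as follows. The function names are illustrative, and the toy sequences are invented rather than Arabidopsis transit peptides.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sub(i, j),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(a, b):
    """Fraction of aligned columns that hold identical residues."""
    _, x, y = global_align(a, b)
    return sum(p == q for p, q in zip(x, y)) / len(x)


def average_linkage(seqs):
    """Agglomerative clustering (average linkage) on 1 - identity distances.

    Returns merges as (cluster_a, cluster_b, distance); clusters are index tuples."""
    dist = {}
    for i, j in combinations(range(len(seqs)), 2):
        dist[i, j] = dist[j, i] = 1.0 - identity(seqs[i], seqs[j])
    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            d = sum(dist[i, j] for i in c1 for j in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, _ = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append(best)
    return merges


toy = ["MASSALSSV", "MASSALSSA", "MKTIPRRNE"]  # invented sequences
for c1, c2, d in average_linkage(toy):
    print(c1, c2, round(d, 3))
```

For 208 real sequences one would normally use a substitution matrix such as BLOSUM62 and a library routine (for example, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`) instead of this quadratic pure-Python loop; the sketch only shows how the merge order of a dendrogram follows from pairwise alignment distances.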
Transit Peptides of the Representative Proteins Contain Highly Specific Sequence Motifs
To examine how this sequence diversity relates to protein import into chloroplasts, we generated the reporter proteins Cab:GFP, BCCP:GFP, DnaJ-J8:GFP, PORA:GFP, GLU2:GFP, and TOCC:GFP by fusing the N-terminal regions (67 or 80 amino acids) of Cab, BCCP, DnaJ-J8, PORA, GLU2, and TOCC, respectively, to green fluorescent protein (GFP). Each N-terminal region included the processing site predicted by ChloroP plus 8 to 34 amino acids downstream of it (Figure 1A). Fluorescence microscopy revealed that these reporter proteins were targeted to chloroplasts (Figure 1C), as previously observed for RbcS-nt:GFP (Hand et al., 1989
We next determined the approximate processing site in each transit peptide. We first used the prediction algorithm ChloroP to predict the cleavage sites. The predicted cleavage sites were located between amino acid positions 54 and 55, 35 and 36, 46 and 47, 61 and 62, 53 and 54, 72 and 73, and 47 and 48 in the RbcS, Cab, DnaJ-J8, BCCP, PORA, GLU2, and TOCC transit peptides, respectively (Figure 1A, arrowheads). It was previously shown that processing of the RbcS transit peptide occurs at the cleavage site predicted here (Clark and Lamppa, 1991
To confirm that processing occurred at the predicted cleavage sites, we generated transit peptide deletion mutants lacking the N-terminal region upstream of each cleavage site. Unlike the other reporter proteins, the processed form of the Cab reporter protein migrated faster than predicted; therefore, based on the apparent size of the Cab processed form, we generated a Cab deletion mutant lacking the N-terminal 55 amino acids (Figure 1A, arrow). These six deletion constructs were translated in vitro and separated on SDS-polyacrylamide gels alongside the corresponding processed forms. The deletion mutants of DnaJ-J8, BCCP, PORA, GLU2, and TOCC migrated to the same positions as their processed forms (Figure 1E), indicating that cleavage occurred at or near the predicted positions. By contrast, the processed form of the Cab reporter protein in protoplasts migrated slightly faster than the in vitro–translated Cab deletion mutant, indicating that processing occurred even further downstream than the site used to generate the deletion mutant. Cab is therefore most likely processed at a position within the underlined region (Figure 1A).
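The size comparison above can be made concrete with a back-of-the-envelope calculation. This sketch assumes an average residue mass of about 110 Da (a common rule of thumb, not a value from this study) and uses the ChloroP cleavage positions listed above; `deletion_construct` is a hypothetical helper, and re-adding an initiator Met is an assumption about how such constructs are usually designed.

```python
# Rough rule of thumb (assumption): one amino acid residue weighs ~110 Da.
AVG_RESIDUE_DA = 110.0

# ChloroP-predicted cleavage sites from the text; cleavage "between n and n+1"
# means the first n residues are removed on processing.
CHLOROP_SITES = {"RbcS": 54, "Cab": 35, "DnaJ-J8": 46, "BCCP": 61,
                 "PORA": 53, "GLU2": 72, "TOCC": 47}


def approx_kda(n_residues):
    """Approximate mass of a stretch of residues, in kDa."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


def deletion_construct(precursor, n_removed, add_met=True):
    """Hypothetical helper: drop the first n_removed residues and, by
    assumption, re-add an initiator Met so the construct can be translated."""
    mature = precursor[n_removed:]
    return ("M" + mature) if add_met else mature


for name, site in CHLOROP_SITES.items():
    print(f"{name}: {site} residues removed, ~{approx_kda(site):.1f} kDa")

# Cab: the deletion mutant lacks 55 residues instead of the predicted 35,
# a ~20-residue (~2.2 kDa) difference.
extra = 55 - CHLOROP_SITES["Cab"]
print(extra, round(approx_kda(extra), 1))
```

A shift of roughly 2 kDa on a GFP fusion of about 30 kDa is typically visible as a mobility difference on SDS-polyacrylamide gels, which is why the apparent size of the Cab processed form could guide the choice of a 55-residue deletion.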
We next identified sequence motifs in the individual transit peptides, beginning with the Cab transit peptide. The Cab transit peptide was divided into 10–amino acid segments (T1 to T6), and each segment was replaced in turn with Ala residues to generate single-T mutants (Figure 2A) as described previously (Lee et al., 2006
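The construct design can be sketched in a few lines: single- and double-T Ala substitution mutants, plus the half-segment restoration constructs used in the follow-up experiments. This assumes that T1 starts at residue 1 and that every segment is exactly 10 residues long, as described above; the function names and the toy sequence are hypothetical, and the actual constructs were made by PCR-based mutagenesis, not in silico.

```python
SEG_LEN = 10  # each T segment spans 10 residues: T1 = 1-10, T2 = 11-20, ...


def segment_bounds(t, seg_len=SEG_LEN):
    """0-based [start, end) slice for segment T<t> (t counts from 1)."""
    start = (t - 1) * seg_len
    return start, start + seg_len


def ala_substitute(seq, segments, seg_len=SEG_LEN):
    """Replace every residue of the listed T segments with Ala.

    ala_substitute(wt, [3]) gives T3A; ala_substitute(wt, [3, 5]) gives T35A."""
    residues = list(seq)
    for t in segments:
        start, end = segment_bounds(t, seg_len)
        if end > len(seq):
            raise ValueError(f"T{t} extends past the end of the sequence")
        residues[start:end] = "A" * seg_len
    return "".join(residues)


def restore(mutant, wild_type, start, end):
    """Copy wild-type residues [start, end) back into a mutant, e.g. the
    first half of T3 (a 5-residue motif) into the T3A background."""
    return mutant[:start] + wild_type[start:end] + mutant[end:]


wt = "CDEFGHIKLM" * 6  # invented 60-residue sequence, not a real transit peptide
t3a = ala_substitute(wt, [3])
s, e = segment_bounds(3)
t3a_first_half = restore(t3a, wt, s, s + 5)
print(t3a)
print(t3a_first_half)
```

Keeping the segment arithmetic in one helper makes it easy to generate the double-T mutants (such as T35A or T46A) and the 5-residue restorations from the same wild-type sequence without off-by-one errors.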
Because Cab[T4A]:GFP produced no detectable GFP fluorescence, protoplasts transformed with Cab[T4A]:GFP were immunostained with an anti-GFP antibody (Figure 2B, panel d). Strong signals were detected in a ring pattern, reminiscent of the envelope membrane localization previously observed for some mutants of the RbcS transit peptide (Lee et al., 2006
To delineate the sequence motifs in these regions further, the first or second half of each T segment was restored to the wild-type sequence in the T3A, T4A, or T5A background. Restoring KSKF in T3A completely rescued chloroplast import (Figure 2C, panels a and d), and restoring either PLPN or GNVGR in T4A rescued targeting efficiency almost completely to wild-type levels (Figure 2C, panels b, e, and h). Restoring IRMAA in T5A also restored targeting efficiency to >90% of that of the wild type (Figure 2C, panels c and f). Together, these results strongly suggest that the sequence motifs KSKF (T3), PNPL (T4), GNVGR (T4), and IRMAA (T5) are critical for protein import into chloroplasts and that the two halves of the T4 segment are functionally redundant. These sequence motifs differ from those in the RbcS transit peptide (Lee et al., 2006
A similar approach was used to identify the sequence motifs of the DnaJ-J8 transit peptide. Protoplasts transformed with single-T mutants were observed directly by fluorescence microscopy. Except for DnaJ-J8[T6A]:GFP, which produced a pattern similar to that of the wild type, all single-T mutants displayed strong GFP signals in the cytoplasm, suggesting that critical sequence motifs are dispersed throughout segments T2 to T5 of the DnaJ-J8 transit peptide (Figure 3B). To confirm this, protein extracts from transformed protoplasts were analyzed by protein gel blotting with an anti-GFP antibody. The mutants produced varying amounts of precursor. Consistent with the image analysis, all single-T mutants except T6A were severely impaired in delivering proteins to chloroplasts: the import efficiencies of T2A, T3A, T4A, and T5A ranged from <5 to 10% of that of the wild type (Figure 3B, panel f). The T2A:GFP and T4A:GFP proteolytic products (Figure 3B, asterisks) migrated slightly faster than the mature, processed wild-type form.
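The import efficiencies quoted in this section come from protein gel blots. The exact formula is described under Methods (Protein Gel Blot Analysis and Signal Intensity Quantification) and is not reproduced here; this sketch assumes a common and plausible choice, the processed fraction of the total anti-GFP signal, expressed relative to the wild-type construct. The band intensities are invented for illustration, and ignoring proteolytic bands is likewise an assumption.

```python
def targeting_efficiency(processed, precursor):
    """Fraction of the anti-GFP signal present as the processed (imported) form.

    Assumes background-subtracted band intensities within the linear range of
    detection; proteolytic products are ignored here (an assumption)."""
    total = processed + precursor
    if total <= 0:
        raise ValueError("no detectable signal")
    return processed / total


def percent_of_wild_type(mutant_eff, wt_eff):
    """Express a mutant's efficiency as a percentage of the wild-type efficiency."""
    return 100.0 * mutant_eff / wt_eff


wt = targeting_efficiency(processed=900, precursor=100)   # invented intensities
t3a = targeting_efficiency(processed=45, precursor=855)
print(f"T3A import efficiency: {percent_of_wild_type(t3a, wt):.1f}% of wild type")
# prints: T3A import efficiency: 5.6% of wild type
```

Normalizing to the wild-type construct measured in the same experiment compensates for day-to-day differences in transformation and blotting, which is why mutant values are reported as a percentage of the wild type rather than as raw ratios.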
To test whether the processed forms of T2A:GFP and T4A:GFP were imported into chloroplasts, intact chloroplasts were purified from gently lysed protoplast extracts on a Percoll gradient and analyzed by protein gel blotting with an anti-GFP antibody. The processed T2A:GFP and T4A:GFP forms copurified with chloroplasts, suggesting that they had been imported (see Supplemental Figure 2A online). In addition, the upper bands of T2A:GFP and T4A:GFP migrated slightly faster than the wild-type precursor. To verify that these upper bands represented precursors rather than proteolytic products, the mutants were translated in vitro using wheat germ extract, and their mobilities were compared with those of the proteins produced in protoplasts. The in vitro–translated mutant proteins migrated identically to their protoplast counterparts, indicating that the altered mobility of the mutant precursors was most likely caused by the Ala substitutions themselves (see Supplemental Figure 3A online). These results indicate that T2, T3, T4, and T5 of the DnaJ-J8 transit peptide contain sequence motifs critical for protein import. The DnaJ-J8 sequence motifs were then delineated further into half-T regions (Figure 3A). In T2A, restoring either the first or the second half of T2 improved targeting efficiency to the wild-type level (Figure 3C, panels a and e). Restoring SSSSS in T3A rescued targeting efficiency to >90% of that of the wild type (Figure 3C, panels b and f). Restoring either the first or the second half of T4 or T5 was sufficient to rescue T4A and T5A import efficiencies to >90% of that of the wild type (Figure 3C, panels c, d, g, and h). These results indicate that SSSSS (T3), NSRRK (T4), NTKML (T4), NRSKV (T5), and VCSSS (T5) are critical sequence motifs for protein import and that GFSGL and PGSSF of T2 also play a role in import.
The results also strongly suggest that GFSGL and PGSSF in T2, NSRRK and NTKML in T4, and NRSKV and VCSSS in T5 are functionally redundant with each other.
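The inference used for each half-T restoration series (which half, if either, rescues an Ala-substituted segment) can be written as a small decision rule. This is a sketch of the reasoning, not a procedure from the paper: the 90% thresholds are assumptions chosen to mirror the ">90% of wild type" wording above, and the input percentages are illustrative.

```python
def interpret_segment(mutant_pct, first_half_pct, second_half_pct,
                      defect_below=90.0, rescue_at=90.0):
    """Classify one T segment from its Ala mutant and two half-restoration mutants.

    All inputs are import efficiencies as % of wild type; thresholds are assumptions."""
    if mutant_pct >= defect_below:
        return "no critical motif"
    first = first_half_pct >= rescue_at
    second = second_half_pct >= rescue_at
    if first and second:
        return "redundant halves"      # either half alone is enough
    if first:
        return "motif in first half"
    if second:
        return "motif in second half"
    return "neither half sufficient"   # motif may span both halves


# Illustrative values shaped like the DnaJ-J8 T4 series: T4A is severely defective,
# and restoring either half rescues import.
print(interpret_segment(8.0, 95.0, 93.0))  # redundant halves
```

The "redundant halves" outcome corresponds to the conclusion drawn above for T2, T4, and T5 of DnaJ-J8, whereas a rescue by only one half localizes the critical motif to that half, as for SSSSS in T3.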
It has been shown that the transit peptidome has a strong preference for the hydroxylated amino acids Ser and Thr, as well as for Pro (von Heijne et al., 1989
Sets of single- and double-T mutants were also generated for the BCCP transit peptide. These mutants were transformed into protoplasts, and their GFP patterns were observed directly by fluorescence microscopy. Of the single-T mutants, only T3A:GFP and T4A:GFP displayed weak GFP signals in the cytoplasm, and even these showed strong GFP signals in chloroplasts (Figure 4B, panels c and d), indicating that none of the single-T substitutions severely impaired import. To confirm these findings, protein extracts from transformed protoplasts were analyzed by protein gel blotting with an anti-GFP antibody. Consistent with the image analysis, only T3A and T4A displayed minor defects in protein import (Figure 4B, panels j and k). T3A and T4A import efficiencies were quantified at three different time points; over this period, T3A targeting efficiency increased from 65 to 78%, and T4A targeting efficiency increased from 83 to 90% (Figure 4E, panels a and b). These results strongly suggest a high degree of redundancy among the sequence motifs of the BCCP transit peptide.
A small amount of an intermediate form was detected for T6A, suggesting that T6A may have a minor defect in crossing the envelope membrane (Figure 4B, panel m). Of the double-T mutants (Figure 4C), T35A, T36A, and T46A displayed severe import defects (Figure 4D, panels a to c, h, i, and k): at 8 h after transformation, their targeting efficiencies were reduced to 38, 43, and 27%, respectively (Figure 4E, panels c to e), and they gradually increased to 65 to 75% over time. These results indicate that the T3, T4, T5, and T6 segments of BCCP contain critical sequence motifs and that the motifs in T3 and T5, in T3 and T6, and in T4 and T6 are functionally redundant with each other.
Each T segment in these double-T mutants was divided into two 5–amino acid regions, and each region was restored individually in the double-T mutant background (Figure 5A). Of the four T35A restoration mutants, T35A+FPIQN and T35A+AKPKL displayed targeting efficiencies similar to those of T5A and T3A, respectively (Figure 5B, panels a, d, g, and h), indicating that FPIQN (T3) and AKPKL (T5) are critical for protein import. Of the four T36A restoration mutants, T36A+FPIQN and T36A+PSRSS had import efficiencies similar to those of T6A and T3A, respectively, and T36A+YPVVK had an import efficiency similar to that of T3A (Figure 5B, panels b, e, i, and j). These results indicate that FPIQN (T3), PSRSS (T6), and YPVVK (T6) are critical for protein import. Of the four T46A restoration mutants, T46A+RSRRV and T46A+SFRLS import efficiencies were
Sequence motifs in the PORA transit peptide were defined using a similar approach. Of the single-T mutants, T5A:GFP produced strong GFP signals in the cytoplasm with almost no signal in chloroplasts (Figure 6B, panel d). GFP signals in T2A:GFP, T3A:GFP, and T6A:GFP were primarily detected in the chloroplasts, with weak signals in the cytoplasm (Figure 6B, panels a, b, and e). Targeting efficiency was determined by protein gel blot analysis using an anti-GFP antibody. The targeting efficiency of T5A was <5% of that of the wild-type transit peptide, whereas T2A, T3A, and T6A retained 40 to 60% of the wild-type targeting efficiency (Figure 6B, panel f). The proteolytic products (Figure 6B, asterisks) of T2A, T4A, and T6A migrated slightly faster than the processed form of the wild type.
We next examined whether these protein species were also imported into chloroplasts. Chloroplasts were purified from gently lysed protoplast extracts and analyzed by protein gel blotting with an anti-GFP antibody. The proteolytic products copurified with chloroplasts, suggesting that they had been imported (see Supplemental Figure 2B online). In addition, as observed with DnaJ-J8, the upper bands of the T2A, T4A, and T6A mutants migrated faster than the wild-type precursor (Figure 6B, panel f). To verify that this was also due to the Ala substitutions, these mutants were translated in vitro using wheat germ extract, and their mobilities were compared with those of the proteins expressed in protoplasts. The in vitro–translated proteins migrated identically to those expressed in protoplasts, confirming that the upper bands were mutant precursors (see Supplemental Figure 3B online). The first or second half of each T segment was then restored in the single-T mutants (Figure 6A). T2A+SAFSV, T3A+SSSFK, and T6A+SLRCK displayed targeting efficiencies similar to that of the wild type (Figure 6C, panels a, b, d to f, and h). Restoring either the first or the second half of T5 in T5A was equally effective, returning targeting efficiency to about the wild-type level (Figure 6C, panels c and g). Together, these results suggest that SAFSV (T2), SSSFK (T3), EQSKA (T5), DFVSS (T5), and SLRCK (T6) are critical sequence motifs of the PORA transit peptide for protein import and that EQSKA and DFVSS of T5 may be functionally redundant.
To define sequence motifs in the GLU2 transit peptide, we generated single-T Ala substitution mutants (Figure 7A). Of the single-T mutants, T3A:GFP and T5A:GFP produced strong GFP signals in the cytoplasm in addition to the chloroplasts (Figure 7B, panels b and d). GFP signals in T2A:GFP, T4A:GFP, and T6A:GFP were primarily detected in the chloroplasts (Figure 7B, panels a, c, and e). Targeting efficiency was determined by protein gel blot analysis using an anti-GFP antibody. The targeting efficiency of T3A and T5A was
In the T3A, T4A, and T5A mutants, the first or second half of the corresponding T segment was restored (Figure 7A). In each case, restoring either half was equally effective, returning targeting efficiency to approximately wild-type levels (Figure 7C, panels a to g). Together, these results suggest that SAKLS (T3), STKTI (T3), ISKGT (T5), and KRRNE (T5) of GLU2 are critical sequence motifs for protein import and that FSVDF (T4) and VRSYC (T4) also play a role in import. Finally, the T26A mutant confirmed that T2 and T6 lack sequence motifs required for import (Figure 7C, panels d and h).
Finally, to define sequence motifs in the TOCC transit peptide, we first used single-T Ala substitution mutants in protein import experiments (Figure 8A). Of the single-T mutants, T3A:GFP, T4A:GFP, and T5A:GFP produced strong GFP signals in the cytoplasm in addition to the chloroplasts (Figure 8B, panels b to d), whereas GFP signals from T2A:GFP and T6A:GFP were detected primarily in the chloroplasts (Figure 8B, panels a and e). Targeting efficiency was determined by protein gel blot analysis using an anti-GFP antibody. The targeting efficiency of T3A, T4A, and T5A was
In the T3A, T4A, and T5A mutants, the first or second half of the corresponding T segment was restored (Figure 8A). In T3A, restoring either half returned targeting efficiency to >90% of the wild-type level (Figure 8C, panel d). In T4A and T5A, restoring the first half returned targeting efficiency to >95% of that of the wild type (Figure 8C, panels e and f). Together, these results suggest that RPVSP (T3), LTRSL (T3), VPFRS (T4), and RSISR (T5) are critical sequence motifs of TOCC for protein import.
The Hydrophobic Domain of Some T1 Regions Plays a Role in the Efficient Targeting of Proteins to Chloroplasts
Sequence Motif Prediction of Plastid Transit Peptides
We also asked whether the sequence motifs of these seven highly dissimilar transit peptides could be predicted from sequence information alone. We developed a motif discovery algorithm to identify a set of motif segments that discriminate between plastid and nonplastid proteins (see Supplemental Methods online). Because we hypothesized that plastid transit sequences comprise multiple subgroups with distinct sequence motifs, the algorithm searches for sequence motifs that occur more frequently in a subgroup of plastid transit sequences than in the N termini of nonplastid proteins. We applied this algorithm to 208 Arabidopsis proteins that have been experimentally shown to localize to the plastid; the list of these proteins and their predicted sequence motifs is available in Supplemental Data Set 1 online. We then compared the motifs predicted by the algorithm for the seven highly dissimilar transit peptides described above with those identified in our import experiments. The sequence motifs identified by the two independent approaches were nearly identical or overlapped significantly (Figure 10A). We calculated P values to evaluate the overlap between the experimentally determined and predicted motifs and found that all predictions except those for RbcS and GLU2 were statistically significant (P < 0.1), suggesting that our algorithm adequately captured the sequence features of plastid transit peptides (Figure 10B). These results also raised the possibility that the sequence motifs identified by functional analysis of transit peptides are conserved among members of the subgroups characterized by these motifs.
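The statistic used to score the overlap between experimental and predicted motifs is defined in the Supplemental Methods, not in this section. One standard choice, assumed for this sketch, is a one-sided hypergeometric test on residue positions: for a region of N residues containing K experimentally critical residues and n predicted motif residues, it gives the probability of an overlap at least as large as the observed one if the predicted residues were placed at random. The positions in the usage example are invented.

```python
from math import comb


def positions(spans):
    """Convert 1-based inclusive (start, end) spans into a set of residue positions."""
    return {p for start, end in spans for p in range(start, end + 1)}


def overlap_p_value(length, experimental, predicted):
    """One-sided hypergeometric P value, P(overlap >= observed)."""
    K, n = len(experimental), len(predicted)
    observed = len(experimental & predicted)
    # Sum the hypergeometric probabilities of every overlap at least as large.
    tail = sum(comb(K, i) * comb(length - K, n - i)
               for i in range(observed, min(K, n) + 1))
    return tail / comb(length, n)


# Invented example: an 80-residue region, two experimental 5-residue motifs,
# and two predicted 5-residue motifs that partly overlap them.
exp_pos = positions([(21, 25), (41, 45)])
pred_pos = positions([(23, 27), (41, 45)])
print(overlap_p_value(80, exp_pos, pred_pos))
```

Because `math.comb(n, k)` returns 0 when k exceeds n, impossible overlaps contribute nothing to the sum, so the function needs no special cases for short regions or large motif sets.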
In this analysis, we initially used the first 80 amino acid residues of each plastid protein in our motif discovery algorithm, so some of the analyzed sequences probably included residues from the mature proteins. To exclude this possibility, we repeated the motif prediction using only the residues from the N terminus to the ChloroP-predicted cleavage site. This significantly degraded the performance of the prediction algorithm, resulting in poor agreement (insignificant P values) between the experimentally determined and predicted motifs (see Supplemental Figure 4 online).
One possible explanation for this failure is that the analyzed sequences may not have been long enough to yield conserved sequence motifs reliably. ChloroP and TargetP, for example, use the N-terminal 100 amino acid residues for reliable transit peptide prediction (Emanuelsson et al., 1999
To test this possibility further, we generated new reporter constructs by fusing GFP to the regions upstream of the predicted cleavage sites of Cab, BCCP, PORA, and DnaJ-J8 (see Supplemental Figure 5A online). These reporter constructs, Cab-cs:GFP, BCCP-cs:GFP, DnaJ-J8-cs:GFP, and PORA-cs:GFP, were introduced into protoplasts, and their targeting to chloroplasts was examined by imaging and protein gel blot analysis. BCCP-cs:GFP produced primarily processed forms with a minor portion of precursors, and its ratio of processed forms to precursors was nearly identical to that of BCCP-nt:GFP (see Supplemental Figure 5C online), indicating that the BCCP region upstream of the cleavage site is sufficient for targeting. By contrast, the other constructs produced much more precursor than the corresponding full-length transit peptide constructs, with precursor levels equal to or greater than those of the processed forms; PORA-cs:GFP in particular produced mainly precursors and only a small amount of processed forms. These results demonstrate that, for three of the four transit peptides analyzed here (Cab, DnaJ-J8, and PORA), the region upstream of the cleavage site is not sufficient for efficient protein import into chloroplasts. This is consistent with the observation that the region downstream of the cleavage site also contains important sequence motifs (Comai et al., 1988
To confirm that the processed forms were imported into chloroplasts, we purified chloroplasts from gently lysed protoplast extracts on a Percoll gradient and analyzed the chloroplast fractions by protein gel blotting with an anti-GFP antibody. The processed forms, but not the precursors, were detected in the chloroplast fractions (see Supplemental Figure 5D online), strongly suggesting that for most of the transit peptides examined here, the region downstream of the cleavage site is necessary for efficient protein import into chloroplasts.
We next examined the 208 transit peptides to determine whether the experimentally identified sequence motifs were also present in other transit peptides. A transit peptide was assumed to have a particular sequence motif and was assigned to the corresponding sequence group if the P value of our proposed alignment score with dual gap penalties between the transit peptide and sequence motifs was lower than a specified cutoff value (
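The dual-gap-penalty alignment score is defined under Methods (Global Alignment with Dual Gap Penalties), and the cutoff value is not given in this section. The sketch below therefore swaps in a much simpler score, the best ungapped match of each motif anywhere in the peptide, and estimates its P value empirically by shuffling the peptide while preserving its amino acid composition. All function names are hypothetical, the query sequence is invented, and the cutoff must be supplied by the caller.

```python
import random


def best_ungapped_match(motif, seq):
    """Most identical residues when `motif` is slid along `seq` without gaps
    (a simple stand-in for the paper's dual-gap-penalty alignment score)."""
    w = len(motif)
    if len(seq) < w:
        raise ValueError("sequence shorter than motif")
    return max(sum(a == b for a, b in zip(motif, seq[i:i + w]))
               for i in range(len(seq) - w + 1))


def motif_score(motifs, seq):
    """Total best-match score of a motif set against one sequence."""
    return sum(best_ungapped_match(m, seq) for m in motifs)


def empirical_p_value(motifs, seq, n_shuffles=1000, seed=0):
    """Share of composition-preserving shuffles that score at least as well as
    `seq`, with a +1 correction so the estimate is never exactly zero."""
    rng = random.Random(seed)
    observed = motif_score(motifs, seq)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if motif_score(motifs, "".join(residues)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


def has_motif_group(motifs, seq, cutoff, n_shuffles=1000):
    """Assign `seq` to the motif group if its empirical P value is below `cutoff`."""
    return empirical_p_value(motifs, seq, n_shuffles) < cutoff


cab_motifs = ["KSKF", "GNVGR", "IRMAA"]   # Cab motifs from the import experiments
query = "MAAATSKFLPNGVGRAAIRMAAGSSW"       # invented query sequence
print(empirical_p_value(cab_motifs, query, n_shuffles=500))
```

With n shuffles the smallest attainable P value is 1/(n + 1), so a stringent cutoff such as the P < 0.001 used in the clustering step would require well over 1000 shuffles; analytical score distributions avoid this cost, which is one reason alignment-based methods often prefer them.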
Sequence Motif–Based Approach Increases the Accuracy of Chloroplast Transit Peptide Prediction
The prediction power of our method (MultiP) was compared with four other methods, including ChloroP (Emanuelsson et al., 1999
Here, we demonstrated that the transit peptides of nuclear-encoded plastid proteins can be clustered into multiple subgroups that contain subgroup-specific distinctive sequence motifs. These subgroup-specific sequence motifs were identified by two independent approaches. The first approach was based on the results of protoplast protein import experiments, in which we used Ala substitution mutants (Lee et al., 2006
These results demonstrate not only that transit peptides contain highly conserved sequence motifs but also that the conserved sequence motifs play critical roles in protein import into chloroplasts. This is in contrast with the current notion that transit peptides do not contain any conserved sequence motifs (Emanuelsson et al., 1999
Detailed analysis of these representative transit peptides revealed that they display distinct characteristics. RbcS transit peptides contain a large number of sequence motifs throughout the entire transit peptide region. In addition, these sequence motifs display a high degree of functional redundancy, despite a lack of amino acid sequence similarity (Lee et al., 2006
In addition, analysis of the amino acid composition of the critical sequence motifs yielded a different picture from that obtained from total transit peptide sequences. It is generally accepted that hydroxylated amino acid residues, such as Ser and Thr, are well represented in transit peptides (Bruce, 2000
The exact role of these sequence motifs was not directly addressed in this study. Depending on the individual sequence motif, Ala substitution produced characteristic features in the extent of precursor processing and/or production of intermediate forms, similar to the RbcS transit peptide (Lee et al., 2006
The presence of multiple subgroups in the transit peptidome raises the question of how transit peptides belonging to different subgroups are recognized by import receptors. One possibility is that there exist multiple types of import receptors. This hypothesis is supported by results showing that the import receptors Toc159 and Toc33 are responsible for the import of photosynthetic proteins, whereas other receptors, such as Toc132 and Toc34, are needed for the import of nonphotosynthetic ones (Bédard and Jarvis, 2005
Growth of Plants
Arabidopsis thaliana (Columbia ecotype) was grown in soil at 20 to 25°C in a greenhouse with a 16-h-light/8-h-dark cycle or on Murashige and Skoog plates in a growth chamber at 20°C. Leaf tissues were harvested from 2-week-old plants and used immediately for protoplast isolation.
PCR-Based Mutagenesis and Plasmid Construction
Transient Expression in Protoplasts
Images were obtained using a cooled CCD camera and a Zeiss Axioplan fluorescence microscope. The filter sets used were XF116 (exciter, 474AF20; dichroic, 500DRLP; emitter, 510AF23) and XF33/E (exciter, 535DF35; dichroic, 570DRLP; emitter, 605DF50) (Omega) for GFP/FITC and RFP/TRITC, respectively. The data were processed using Adobe Photoshop software, and the images were rendered in pseudocolor.
Protein Gel Blot Analysis and Signal Intensity Quantification
In Vitro Translation in Wheat Germ Extracts
Chloroplast Purification
Global Alignment with Dual Gap Penalties
Prediction of Sequence Motifs in Plastid Transit Peptides
In the forward motif selection step, the model selected motif segments five amino acids long by scanning the query sequence and converted each motif segment into a three-dimensional feature vector. The first two components of the feature vector were the scores of the global and local alignments between the query sequence and the sequence of the feature vector (Needleman and Wunsch, 1970
Next, the backward motif elimination step removed unlikely motif segments selected during forward motif selection. Specifically, at each iteration the motif segment whose removal produced the smallest error bound value was eliminated, and iteration stopped when the smallest error bound value was larger than the previous one.
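The forward and backward steps can be outlined as follows, under stated assumptions: candidate motifs are every contiguous 5-residue window of the query, and `error_bound` stands in for the model's error bound, which is not defined in this section and is supplied here as a hypothetical caller-provided function. The toy error bound and the 5-residue strings in the usage example are invented purely to exercise the stopping rule.

```python
def windows(query, width=5):
    """Forward step (simplified): every contiguous 5-residue segment of the query."""
    return [query[i:i + width] for i in range(len(query) - width + 1)]


def backward_eliminate(motifs, error_bound):
    """Backward step: repeatedly drop the motif whose removal gives the smallest
    error bound; stop as soon as that smallest bound exceeds the previous one.

    `error_bound` is a caller-supplied (hypothetical) function mapping a list of
    motifs to a number."""
    current = list(motifs)
    previous = error_bound(current)
    while len(current) > 1:
        # Score every one-motif-smaller subset; ties go to the lower index.
        trials = [(error_bound(current[:i] + current[i + 1:]), i)
                  for i in range(len(current))]
        best_bound, best_i = min(trials)
        if best_bound > previous:
            break
        current.pop(best_i)
        previous = best_bound
    return current


GOOD = {"KSKFL", "IRMAA"}  # invented "informative" motifs


def toy_error_bound(motifs):
    # Invented: each unhelpful motif costs 1.0; losing a helpful one costs 0.3.
    bad = sum(1.0 for m in motifs if m not in GOOD)
    lost = 0.3 * len(GOOD - set(motifs))
    return bad + lost


print(backward_eliminate(["KSKFL", "WWWWW", "QQQQQ", "IRMAA"], toy_error_bound))
# ['KSKFL', 'IRMAA']
```

Stopping at the first increase makes the search greedy: it can end in a local minimum of the error bound, but each iteration needs only one pass over the current motif set.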
Prediction of Plastid Transit Peptides
In the clustering step, we started with a randomly selected plastid transit peptide and predicted its sequence motifs using a model for discriminative motif discovery. We used motif alignment scores to group sequences sharing similar motifs: sequences were assigned to the same group when the P value of their motif alignment score was below a predefined cutoff (P < 0.001). From the positive set remaining after the grouped sequences had been removed, we then selected the plastid transit peptide with the maximum P value, and we repeated these steps until the positive set was empty. The output of the clustering module was a set of representative sequences with predicted sequence motifs.
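The clustering loop can be summarized in a few lines. In this sketch, `p_value(rep_seq, seq)` is a hypothetical caller-supplied function standing in for the motif prediction and motif alignment P value described above, and "the plastid transit peptide with the maximum P value" is interpreted, as an assumption, as the remaining sequence least similar to the most recent representative. The toy P value function and sequences are invented.

```python
import random


def greedy_motif_clustering(sequences, p_value, cutoff=0.001, seed=0):
    """Group sequences around representatives until none remain.

    sequences: dict mapping name -> sequence.
    p_value(rep_seq, seq): caller-supplied motif alignment P value (hypothetical).
    Returns a list of (representative_name, member_names)."""
    rng = random.Random(seed)
    remaining = dict(sequences)
    groups = []
    rep = rng.choice(sorted(remaining))  # random starting sequence
    while remaining:
        rep_seq = remaining[rep]
        members = [name for name, seq in remaining.items()
                   if name == rep or p_value(rep_seq, seq) < cutoff]
        groups.append((rep, members))
        for name in members:
            del remaining[name]
        if remaining:
            # Next representative: the leftover sequence with the largest P value
            # against the current representative (our reading of the text).
            rep = max(sorted(remaining),
                      key=lambda name: p_value(rep_seq, remaining[name]))
    return groups


def toy_p_value(a, b):
    # Invented similarity: "significant" only when the first residues match.
    return 0.0 if a[0] == b[0] else 1.0


toy = {"s1": "KSKF", "s2": "KLPN", "s3": "RPVS", "s4": "RSIS"}
for rep, members in greedy_motif_clustering(toy, toy_p_value):
    print(rep, members)
```

Removing each group before choosing the next seed guarantees termination, because every pass deletes at least the representative itself from the positive set.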
The feature extraction step converted each sequence into a corresponding feature vector with the empirical kernel map technique using alignment scores (Kim et al., 2006b
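An empirical kernel map turns each sequence into a vector of alignment scores against a fixed set of reference sequences (here, the representatives from the clustering step), so that a standard vector classifier can use it. This sketch assumes Smith-Waterman local alignment with a simple match/mismatch scheme and a linear gap penalty; the exact scores and any normalization used by Kim et al. (2006b) are not specified in this section, and the representative sequences below are invented.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (linear gap penalty), O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero so alignments can restart.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def empirical_kernel_map(seq, representatives, score=local_score):
    """Feature vector: one alignment score per representative sequence."""
    return [score(seq, rep) for rep in representatives]


reps = ["MASSKSKFLPNG", "MAVRSISRPVSP"]  # invented representative sequences
print(empirical_kernel_map("MKSKFAAW", reps))
```

These fixed-length vectors are what a downstream SVM classifier would consume; any SVM implementation that accepts numeric feature vectors (for example, scikit-learn's `SVC`) could fill that role in a reimplementation.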
In the last step, an SVM classifier learned to predict the binary class label of an input sequence (Cristianini and Shawe-Taylor, 2000
Preparation of 986 Arabidopsis Protein Sequences
Accession Numbers
Supplemental Data
This work was supported by grants from the Systems Bio-Dynamics Center (R15-2004-033-05002-0), the Ministry of Education, Science, and Technology (I.H.), the Agricultural Research and Planning Center of the Ministry of Agriculture, Fishery, and Food (I.H.), BioGreen 21 Program (20070401-034-026-008-03-00), Rural Development Administration (I.H.), the POSTECH Core Research Program (I.H.), the Korea Science and Engineering Foundation (R15-2004-033-07002-0) (S.K.), and the Korea Research Foundation (KRF-2006-311-C00503) (S.K.) and by a Microsoft Research Asia fellowship (J.K.K.).
1 These authors contributed equally to this work. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Inhwan Hwang (ihhwang{at}postech.ac.kr).
[W] Online version contains Web-only data. www.plantcell.org/cgi/doi/10.1105/tpc.108.060541
Received May 5, 2008; revision received May 5, 2008; accepted May 23, 2008.
Arimura, S., Takusagawa, S., Hatano, S., Nakazono, M., Hirai, A., and Tsutsumi, N. (1999). A novel plant nuclear gene encoding chloroplast ribosomal protein S9 has a transit peptide related to that of rice chloroplast ribosomal protein L12. FEBS Lett. 450: 231–234.
Becker, T., Jelic, M., Vojta, A., Radunz, A., Soll, J., and Schleiff, E. (2004). Preprotein recognition by the Toc complex. EMBO J. 23: 520–530.
Bédard, J., and Jarvis, P. (2005). Recognition and envelope translocation of chloroplast preproteins. J. Exp. Bot. 56: 2287–2320.
Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. (2007). GenBank. Nucleic Acids Res. 35: D21–D25.
Bhushan, S., Kuhn, C., Berglund, A.K., Roth, C., and Glaser, E. (2006). The role of the N-terminal domain of chloroplast targeting peptides in organellar protein import and miss-sorting. FEBS Lett. 580: 3966–3972.
Bruce, B.D. (2000). Chloroplast transit peptides: Structure, function and evolution. Trends Cell Biol. 10: 440–447.
Clark, S.E., and Lamppa, G.K. (1991). Determinants for cleavage of the chlorophyll a/b binding protein precursor: A requirement for a basic residue that is not universal for chloroplast imported proteins. J. Cell Biol. 114: 681–688.
Comai, L., Larson-Kelly, N., Kiser, J., Mau, C.J., Pokalsky, A.R., Shewmaker, C.K., McBride, K., Jones, A., and Stalker, D.M. (1988). Chloroplast transport of a ribulose bisphosphate carboxylase small subunit-5-enolpyruvyl 3-phosphoshikimate synthase chimeric protein requires part of the mature small subunit in addition to the transit peptide. J. Biol. Chem. 263: 15104–15109.
Cristianini, N., and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. (Cambridge, UK: Cambridge University Press).
Duda, R.O., Hart, P.E., and Stork, D.G. (2001). Pattern Classification. (New York: John Wiley & Sons).
Emanuelsson, O., Nielsen, H., Brunak, S., and von Heijne, G. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300: 1005–1016.
Emanuelsson, O., Nielsen, H., and von Heijne, G. (1999). ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8: 978–984.
Emanuelsson, O., and von Heijne, G. (2001). Prediction of organellar targeting signals. Biochim. Biophys. Acta 1541: 114–119.
Gantt, J.S., Baldauf, S.L., Calie, P.J., Weeden, N.F., and Palmer, J.D. (1991). Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J. 10: 3073–3078.
Gutensohn, M., Schulz, B., Nicolay, P., and Flugge, U.I. (2000). Functional analysis of the two Arabidopsis homologues of Toc34, a component of the chloroplast protein import apparatus. Plant J. 23: 771–783.
Hand, J.M., Szabo, L.J., Vasconcelos, A.C., and Cashmore, A.R. (1989). The transit peptide of a chloroplast thylakoid membrane protein is functionally equivalent to a stromal-targeting sequence. EMBO J. 8: 3195–3206.
Heazlewood, J.L., Tonti-Filippini, J., Verboom, R.E., and Millar, A.H. (2005). Combining experimental and predicted datasets for determination of the subcellular location of proteins in Arabidopsis. Plant Physiol. 139: 598–609.
Hinnah, S.C., Wagner, R., Sveshnikova, N., Harrer, R., and Soll, J. (2002). The chloroplast protein import channel Toc75: Pore properties and interaction with transit peptides. Biophys. J. 83: 899–911.
Jarvis, P., and Soll, J. (2002). Toc, tic, and chloroplast protein import. Biochim. Biophys. Acta 1590: 177–189.
Jin, J.B., Kim, Y.A., Kim, S.J., Lee, S.H., Kim, D.H., Cheong, G.W., and Hwang, I. (2001). A new dynamin-like protein, ADL6, is involved in trafficking from the trans-Golgi network to the central vacuole in Arabidopsis. Plant Cell 13: 1511–1526.
Kessler, F., and Schnell, D.J. (2004). Chloroplast protein import: Solve the GTPase riddle for entry. Trends Cell Biol. 14: 334–338.
Kim, D.H., Eu, Y.J., Yoo, C.M., Kim, Y.W., Pih, K.T., Jin, J.B., Kim, S.J., Stenmark, H., and Hwang, I. (2001). Trafficking of phosphatidylinositol 3-phosphate from the trans-Golgi network to the lumen of the central vacuole in plant cells. Plant Cell 13: 287–301.
Kim, J.K., Bang, S.Y., and Choi, S. (2006b). Sequence driven features for prediction of subcellular localization of proteins. Pattern Recognit. 39: 2301–2311.
Kim, J.K., Raghava, G.P.S., Bang, S.Y., and Choi, S. (2006a). Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine. Pattern Recognit. Lett. 27: 996–1001.
Kohler, R.H., Cao, J., Zipfel, W.R., Webb, W.W., and Hanson, M.R. (1997). Exchange of protein molecules through connections between higher plant plastids. Science 276: 2039–2042.
Lee, D.W., Lee, S., Lee, G.J., Lee, K.H., Kim, S., Cheong, G.W., and Hwang, I. (2006). Functional characterization of sequence motifs in the transit peptide of Arabidopsis small subunit of rubisco. Plant Physiol. 140: 466–483.
McFadden, G.I. (1999). Endosymbiosis and evolution of the plant cell. Curr. Opin. Plant Biol. 2: 513–519.
Millar, A.H., Whelan, J., and Small, I. (2006). Recent surprises in protein targeting to mitochondria and plastids. Curr. Opin. Plant Biol. 9: 610–615.
Miller, E.A., Beilharz, T.H., Malkus, P.N., Lee, M.C., Hamamoto, S., Orci, L., and Schekman, R. (2003). Multiple cargo binding sites on the COPII subunit Sec24p ensure capture of diverse membrane proteins into transport vesicles. Cell 114: 497–509.
Needleman, S.B., and Wunsch, C.D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48: 443–453.
Reinbothe, S., Runge, S., Reinbothe, C., van Cleve, B., and Apel, K. (1995). Substrate-dependent transport of the NADPH:protochlorophyllide oxidoreductase into isolated plastids. Plant Cell 7: 161–172.
Rensink, W.A., Pilon, M., and Weisbeek, P. (1998). Domains of a transit sequence required for in vivo import in Arabidopsis chloroplasts. Plant Physiol. 118: 691–699.
Reumann, S., Davila-Aponte, J., and Keegstra, K. (1999). The evolutionary origin of the protein-translocating channel of chloroplastic envelope membranes: Identification of a cyanobacterial homolog. Proc. Natl. Acad. Sci. USA 96: 784–789.
Rial, D.V., Arakaki, A.K., and Ceccarelli, E.A. (2000). Interaction of the targeting sequence of chloroplast precursors with Hsp70 molecular chaperones. Eur. J. Biochem. 267: 6239–6248.
Schein, A.I., Kissinger, J.C., and Ungar, L.H. (2001). Chloroplast transit peptide prediction: A peek inside the black box. Nucleic Acids Res. 29: E82.
Schleiff, E., and Soll, J. (2005). Membrane protein insertion: Mixing eukaryotic and prokaryotic concepts. EMBO Rep. 6: 1023–1027.
Small, I., Peeters, N., Legeai, F., and Lurin, C. (2004). Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences. Proteomics 4: 1581–1590.
Smith, M.D., Rounds, C.M., Wang, F., Chen, K., Afitlhile, M., and Schnell, D.J. (2004). atToc159 is a selective transit peptide receptor for the import of nucleus-encoded chloroplast proteins. J. Cell Biol. 165: 323–334.
Smith, T.F., and Waterman, M.S. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147: 195–197.
UniProt Consortium (2007). The universal protein resource (UniProt). Nucleic Acids Res. 35: D193–D197.
von Heijne, G., Steppuhn, J., and Herrmann, R.G. (1989). Domain structure of mitochondrial and chloroplast targeting peptides. Eur. J. Biochem. 180: 535–545.
Wienk, H.L., Czisch, M., and de Kruijff, B. (1999). The structural flexibility of the preferredoxin transit peptide. FEBS Lett. 453: 318–326.
Wienk, H.L., Wechselberger, R.W., Czisch, M., and de Kruijff, B. (2000). Structure, dynamics, and insertion of a chloroplast targeting peptide in mixed micelles. Biochemistry 39: 8219–8282.
Zhang, X.P., and Glaser, E. (2002). Interaction of plant mitochondrial and chloroplast signal peptides with the Hsp70 molecular chaperone. Trends Plant Sci. 7: 14–21.