The Plant Cell, Vol. 15, 2497-2502, November 2003,
www.plantcell.org ©2003, American Society of Plant Biologists
Update on the Basic Helix-Loop-Helix Transcription Factor Gene Family in Arabidopsis thaliana
Paul C. Bailey, Cathie Martin, Gabriela Toledo-Ortiz, Peter H. Quail, Enamul Huq, Marc A. Heim, Marc Jakoby, Martin Werber, and Bernd Weisshaar
a John Innes Centre, Colney Lane, Norwich NR4 7UH, UK
b Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, and United States Department of Agriculture, Agricultural Research Service, Plant Gene Expression Center, Albany, CA 94710
c Section of Molecular Cell and Developmental Biology, University of Texas, 1 University Station, A6700, Austin, TX 78712
d Max-Planck-Institute for Plant Breeding Research, 50829 Köln, Germany
e Institute for Genome Research, Bielefeld University, 33594 Bielefeld, Germany
Basic helix-loop-helix (bHLH) transcription factors represent a family of proteins that contain a bHLH domain, a motif involved in binding DNA. Recently, two groups independently analyzed the BHLH gene family of Arabidopsis thaliana (Heim et al., 2003; Toledo-Ortiz et al., 2003). These analyses revealed that this family is one of the largest transcription factor gene families in Arabidopsis thaliana. Although both analyses intended to give complete overviews of AtBHLH genes, some discrepancies were detected when the data sets were compared. After careful re-examination, we have resolved these discrepancies. In Table 1, we provide a uniform nomenclature for all of the genes that are mentioned in our two articles, and we encourage the use of this nomenclature in future reports concerning bHLH domain transcription factors (e.g., AtBHLH042/TT8).
Cross-referencing between the two data sets and further analysis have extended the total number of detected AtBHLH genes to 162 (Table 1). We assume that this count is very close to the final number of AtBHLH genes present in the Arabidopsis thaliana genome, but clearly, corrections or additions to the complete Arabidopsis thaliana genome sequence in the future still may cause this number to change. During examination and comparison of the data sets, we observed some common problems that contributed to the discrepancies. These problems arise commonly during the handling of large data sets and are discussed here to aid future attempts at gene family annotation. The main reasons for discrepancies were as follows.
(1) Differences between TIGR (www.tigr.org) or TAIR (www.arabidopsis.org) and MIPS (MAtDB; mips.gsf.de/projects/plants). Such differences are not easy to avoid, despite the best efforts of the database providers. Most problematic are differences in Arabidopsis Genome Initiative (AGI) codes for the same gene between the different databases.
(2) Positions on pseudochromosomes that are not stable as a result of corrections in single BAC sequences that affect the entire area downstream of the corrected locus.
(3) BAC identifiers and BAC sequence coordinates that differ for the same gene when either the upper or the lower strand is considered. One option is to keep the gene orientation according to the direction of transcription; the other is to keep the original BAC sequence in its 5' to 3' arrangement. Whichever convention is chosen, it must be applied consistently.
(4) Genes located at BAC borders that can result in either double entries of the same gene or failure to detect the gene as a result of the destruction of a continuous signature pattern.
(5) Sequence errors in the genome sequence that destroy open reading frames.
(6) Differences in the detailed definition of what constitutes a bHLH domain.
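Points (2) and (3) are essentially coordinate bookkeeping. The following is a minimal sketch, not a procedure used in either study, assuming 1-based inclusive coordinates. The function names `flip_strand` and `shift_downstream` are hypothetical, as are the example numbers.

```python
from typing import Tuple


def flip_strand(start: int, end: int, seq_len: int) -> Tuple[int, int]:
    """Re-express a 1-based inclusive interval in opposite-strand coordinates."""
    if not (1 <= start <= end <= seq_len):
        raise ValueError("expected 1 <= start <= end <= seq_len")
    return seq_len - end + 1, seq_len - start + 1


def shift_downstream(pos: int, edit_pos: int, delta: int) -> int:
    """Update a position after a sequence correction.

    The correction adds (delta > 0) or removes (delta < 0) bases
    immediately after edit_pos; only positions downstream of it move.
    """
    if delta < 0 and edit_pos < pos <= edit_pos - delta:
        raise ValueError("position lies inside the removed bases")
    return pos + delta if pos > edit_pos else pos


# Illustrative numbers only: a feature at 101..250 on a 1,000-bp sequence.
print(flip_strand(101, 250, 1000))      # (751, 900)
# A 3-bp insertion after position 1200 moves everything downstream of it.
print(shift_downstream(5000, 1200, 3))  # 5003
print(shift_downstream(800, 1200, 3))   # 800
```

Flipping twice returns the original interval, which gives a quick check that one convention is applied consistently across a data set.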
Both studies started with a subset of known bHLH domain transcription factors and used a consensus sequence described by Atchley et al. (1999) as a reference. However, whereas one analysis was based on bHLH proteins similar to Zea mays Sn (e.g., ZmR) that are involved in secondary metabolism and cell identity pathways (Heim et al., 2003), the other used a subset based on PHYTOCHROME-INTERACTING FACTOR3 (PIF3) as a starting point (Toledo-Ortiz et al., 2003). In addition, the set of databases used was not completely overlapping. Consequently, some genes were identified as encoding true bHLHs by one group but not by the other, and vice versa. These differences have been removed; there are now only two BHLH genes listed in Table 1 (AtBHLH136/At5g39860 and AtBHLH160/At1g71200) that fit the criteria of Heim et al. (2003) but not those of Toledo-Ortiz et al. (2003). A third article analyzing plant bHLH domain proteins appeared recently (Buck and Atchley, 2003) reporting 118 AtBHLH genes. Of these, 116 correspond to those listed in Table 1. The remaining two (At1g49830 and At5g33210) do not fit the criteria used for Table 1.
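Cross-referencing gene lists of this kind reduces to set operations on AGI codes once the codes are written uniformly. The sketch below is an illustration under stated assumptions, not the procedure used by either group: the function names are hypothetical, the locus codes in the example are made-up placeholders rather than real bHLH genes, and the pattern assumes the usual AGI locus format (AT, a chromosome designator 1 to 5, C, or M, the letter G, and five digits).

```python
import re
from typing import Dict, Iterable, List

# Assumed AGI locus format, e.g. AT1G71200 (compared case-insensitively).
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$")


def normalize_agi(code: str) -> str:
    """Trim and upper-case an AGI locus code, rejecting malformed input."""
    cleaned = code.strip().upper()
    if not AGI_PATTERN.match(cleaned):
        raise ValueError(f"not an AGI locus code: {code!r}")
    return cleaned


def compare_gene_sets(list_a: Iterable[str], list_b: Iterable[str]) -> Dict[str, List[str]]:
    """Report the codes shared by both lists and those unique to each."""
    a = {normalize_agi(c) for c in list_a}
    b = {normalize_agi(c) for c in list_b}
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Placeholder codes for illustration only.
study_a = ["At1g00010", "at2g00020 ", "At3g00030"]
study_b = ["AT2G00020", "At3g00030", "At5g00050"]
print(compare_gene_sets(study_a, study_b))
```

Normalizing before comparing keeps differences in case or stray whitespace from being reported as discrepancies. It cannot catch the harder case noted in point (1), in which two databases assign different codes to the same gene.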
Search engines have been greatly improved in the last few years, but they still often are not exact enough to identify certain motifs. This is not necessarily the result of deficiencies in the search algorithms but may result from the structure of matrices that describe known motifs (e.g., AtBHLH125 spanned two separate BAC ends, and two separate predictions had to be fused). Even the continuous optimization of our bHLH domain matrix never resulted in the identification of all 162 AtBHLH genes in one search. Additionally, gene prediction tools are sometimes not flexible enough to respond to variable intron lengths and exon distribution (e.g., the prediction NM_105789 for AtBHLH160 contains an intron that causes an overestimate of the length of the loop structure). It sounds obvious, but it is worth emphasizing that cDNA sequences, even from reverse transcriptase-mediated PCR experiments, should be deposited in GenBank (http://www.ncbi.nlm.nih.gov/) or EMBL (http://www.ebi.ac.uk/Databases/) even if the genomic sequence is already in the database, and the metadata of the database entry should be written with care. The most unambiguous identifier of any given gene (unless a sequence-identical duplication exists) is its DNA sequence, and only this information allows designations and identifier assignments to be checked and rechecked.
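One way to put the sequence-as-identifier principle into practice is to derive a fingerprint from the sequence itself, so that two database entries can be tested for identity whatever they are called. The following is a minimal sketch under our own assumptions: the use of SHA-256, the choice to treat a sequence and its reverse complement as the same entry, and the function names are illustrative and are not part of either published analysis.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def sequence_fingerprint(seq: str) -> str:
    """Strand-independent SHA-256 fingerprint of a DNA sequence.

    Whitespace and case are ignored; the lexicographically smaller of the
    sequence and its reverse complement is hashed, so either strand of the
    same molecule yields the same fingerprint.
    """
    cleaned = "".join(seq.split()).upper()
    if set(cleaned) - set("ACGTN"):
        raise ValueError("sequence contains characters other than A, C, G, T, N")
    canonical = min(cleaned, reverse_complement(cleaned))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


# Toy sequences: the same 8 bp written on opposite strands, with line breaks.
entry_1 = "atgc\nGGTA"
entry_2 = reverse_complement("ATGCGGTA")
print(sequence_fingerprint(entry_1) == sequence_fingerprint(entry_2))  # True
```

As the text notes, a sequence-identical duplication would defeat such a check, because both copies produce the same fingerprint.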
It is an interesting and critical point that even with a combination of all available BLAST (Basic Local Alignment Search Tool) tools, both groups were unable to obtain a full set of Arabidopsis bHLH domain transcription factors in their initial analyses. Both studies relied on BLAST search capabilities (TBLASTN and BLASTP) and subsequent evaluation of the hits for the respective bHLH consensus sequences. In addition, position-specific iterated BLAST was used by one of the two groups to identify remaining unidentified bHLH domain-encoding sequences. Nevertheless, several true BHLH genes were not detected. Some of these initial false negatives were found by searching for the term helix-loop-helix in the annotation databases (e.g., AtBHLH134 and AtBHLH136). However, this search also resulted in many false positives that had to be excluded as a result of misannotations based on weak homology or of inherited misannotation, in which a single wrong annotation text had been used as a reference during annotation. In essence, we were unable to detect slightly divergent or mispredicted BHLH genes. The only solution to this problem may involve systematic annotation by expert annotators, comprehensive EST data production from normalized libraries, and the generation of full-length cDNA at least for protein-coding gene sequences. A significant part of the improvement of the data set presented in Table 1 is based on the reannotation of the Arabidopsis genome by the TIGR group, which followed this approach.
We were able to improve gene annotation further by comparing closely related BHLH genes for their exon/intron structures. This powerful similarity-based approach (used here within a single species) led to the correction of some gene annotations and, consequently, to a further increase in the total number of AtBHLH genes detected. Several of the genes that escaped the initial screens by both groups contain short introns in the region that encodes the loop of the HLH region. These comparably short introns, and also short exons that are part of the bHLH open reading frame, resulted in mispredictions that were a significant cause of false negatives in our initial analyses. One example is AtBHLH160, for which we found a formerly unpredicted intron after comparison with the most closely related genes AtBHLH038/ORG2, AtBHLH039/ORG3, AtBHLH100, and AtBHLH101.
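The idea of checking a gene model against its closest relatives can be sketched in a few lines. The example below is a simplified illustration, not the method used to produce Table 1: it compares only the number of introns, the gene names and exon coordinates are invented, and the function names are hypothetical. A flagged model is only a candidate for manual inspection.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Exons as 1-based inclusive (start, end) intervals, sorted by start.
Exons = List[Tuple[int, int]]


def intron_lengths(exons: Exons) -> List[int]:
    """Lengths of the gaps between consecutive exons."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(exons, exons[1:])]


def flag_structure_outliers(models: Dict[str, Exons]) -> List[str]:
    """Genes whose intron count differs from the most common count in the group."""
    counts = {gene: len(intron_lengths(exons)) for gene, exons in models.items()}
    typical, _ = Counter(counts.values()).most_common(1)[0]
    return sorted(gene for gene, n in counts.items() if n != typical)


# Invented models for four closely related genes; geneD lacks the first intron.
models = {
    "geneA": [(1, 120), (200, 400), (480, 900)],
    "geneB": [(1, 110), (190, 410), (500, 880)],
    "geneC": [(1, 130), (215, 395), (470, 910)],
    "geneD": [(1, 400), (480, 900)],
}
print(intron_lengths(models["geneA"]))   # [79, 79]
print(flag_structure_outliers(models))   # ['geneD']
```

Intron count is the crudest possible comparison; intron positions relative to the encoded domain, as described above for the loop region, would be a more informative criterion.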
The combined effort of our two groups and the lessons we have learned from the comparison of the two data sets have resulted in an (almost) complete view of the AtBHLH transcription factor gene family, now provided with unambiguous generic names and reference to synonyms. We hope that this work will serve as a solid foundation for further investigations into the functions of the different members of this interesting gene family in plants.
REFERENCES
Abe, H., Urao, T., Ito, T., Seki, M., Shinozaki, K., and Yamaguchi-Shinozaki, K. (2003). Arabidopsis AtMYC2 (bHLH) and AtMYB2 (MYB) function as transcriptional activators in abscisic acid signaling. Plant Cell 15, 63-78.
Atchley, W.R., Terhalle, W., and Dress, A. (1999). Positional dependence, cliques, and predictive motifs in the bHLH protein domain. J. Mol. Evol. 48, 501-516.
Buck, M.J., and Atchley, W.R. (2003). Phylogenetic analysis of plant basic helix-loop-helix proteins. J. Mol. Evol. 56, 742-750.
Chinnusamy, V., Ohta, M., Kanrar, S., Lee, B.H., Hong, X., Agarwal, M., and Zhu, J.K. (2003). ICE1: A regulator of cold-induced transcriptome and freezing tolerance in Arabidopsis. Genes Dev. 17, 1043-1054.
Fairchild, C.D., Schumaker, M.A., and Quail, P.H. (2000). HFR1 encodes an atypical bHLH protein that acts in phytochrome A signal transduction. Genes Dev. 14, 2377-2391.
Friedrichsen, D.M., Nemhauser, J., Muramitsu, T., Maloof, J.N., Alonso, J., Ecker, J.R., Furuya, M., and Chory, J. (2002). Three redundant brassinosteroid early response genes encode putative bHLH transcription factors required for normal growth. Genetics 162, 1445-1456.
Heim, M.A., Jakoby, M., Werber, M., Martin, C., Weisshaar, B., and Bailey, P.C. (2003). The basic helix-loop-helix transcription factor family in plants: A genome-wide study of protein structure and functional diversity. Mol. Biol. Evol. 20, 735-747.
Heisler, M.G., Atkinson, A., Bylstra, Y.H., Walsh, R., and Smyth, D.R. (2001). SPATULA, a gene that controls development of carpel margin tissues in Arabidopsis, encodes a bHLH protein. Development 128, 1089-1098.
Huq, E., and Quail, P.H. (2002). PIF4, a phytochrome-interacting bHLH factor, functions as a negative regulator of phytochrome B signaling in Arabidopsis. EMBO J. 21, 2441-2450.
Kang, H.G., Foley, R.C., Onate-Sanchez, L., Lin, C., and Singh, K.B. (2003). Target genes for OBP3, a Dof transcription factor, include novel basic helix-loop-helix domain proteins inducible by salicylic acid. Plant J. 35, 362-372.
Nesi, N., Debeaujon, I., Jond, C., Pelletier, G., Caboche, M., and Lepiniec, L. (2000). The TT8 gene encodes a basic helix-loop-helix domain protein required for expression of DFR and BAN genes in Arabidopsis siliques. Plant Cell 12, 1863-1878.
Ni, M., Tepperman, J.M., and Quail, P.H. (1998). PIF3, a phytochrome-interacting factor necessary for normal photoinduced signal transduction, is a novel basic helix-loop-helix protein. Cell 95, 657-667.
Payne, C.T., Zhang, F., and Lloyd, A.M. (2000). GL3 encodes a bHLH protein that regulates trichome development in Arabidopsis through interaction with GL1 and TTG1. Genetics 156, 1349-1362.
Rajani, S., and Sundaresan, V. (2001). The Arabidopsis myc/bHLH gene ALCATRAZ enables cell separation in fruit dehiscence. Curr. Biol. 11, 1914-1922.
Smolen, G.A., Pawlowski, L., Wilensky, S.E., and Bender, J. (2002). Dominant alleles of the basic helix-loop-helix transcription factor ATR2 activate stress-responsive genes in Arabidopsis. Genetics 161, 1235-1246.
Sorensen, A.M., Krober, S., Unte, U.S., Huijser, P., Dekker, K., and Saedler, H. (2003). The Arabidopsis ABORTED MICROSPORES (AMS) gene encodes a MYC class transcription factor. Plant J. 33, 413-423.
Toledo-Ortiz, G., Huq, E., and Quail, P.H. (2003). The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 15, 1749-1770.
Urao, T., Yamaguchi-Shinozaki, K., Mitsukawa, N., Shibata, D., and Shinozaki, K. (1996). Molecular cloning and characterization of a gene that encodes a MYC-related protein in Arabidopsis. Plant Mol. Biol. 32, 571-576.
Yamashino, T., Matsushika, A., Fujimori, T., Sato, S., Kato, T., Tabata, S., and Mizuno, T. (2003). A link between circadian-controlled bHLH factors and the APRR1/TOC1 quintet in Arabidopsis thaliana. Plant Cell Physiol. 44, 619-629.
Zhang, F., Gonzalez, A., Zhao, M., Payne, C.T., and Lloyd, A.M. (2003). A network of redundant bHLH proteins functions in all TTG1-dependent pathways of Arabidopsis. Development 130, 4859-4869.