- © 2020 American Society of Plant Biologists. All rights reserved.
Conceived as a successor to The Arabidopsis Information Resource (TAIR), whose retirement was then anticipated, the Araport project was funded by the U.S. National Science Foundation (NSF) in 2013 to develop a new, extensible framework for Arabidopsis (Arabidopsis thaliana) bioinformatics that would facilitate data integration through the federation of distributed informatics resources. Although the project was recommended as a 5-year award, funds were initially provided for 2 years, with a subsequent 6-month extension to allow for the development of a continuation plan acceptable to NSF. When funds were exhausted in 2016 with the continuation award still awaiting a decision, the Araport site continued in a maintenance mode with minimal input from legacy personnel at each institution and with no data updates. The renewal request was finally declined in December 2018, leaving Araport with an uncertain future. In light of the critical importance of these services to the scientific community, a group of interested researchers (see Appendix) met in March 2019 to discuss options and propose a solution. A working group evolved from those in attendance at that meeting and has since met monthly to solidify and coordinate the execution of these nascent plans. The results have been encouraging and are described here to inform and inspire the larger plant science community.
Given the complete absence of external funding, it was agreed that, rather than try to perpetuate the entire Araport ecosystem, efforts should be directed toward maintaining the most attractive and most used features, namely ThaleMine and JBrowse, by transferring them to new ownership. Accordingly, it was agreed that an updated version of ThaleMine would be established at the Bio-Analytic Resource (BAR) for Plant Biology at the University of Toronto under the leadership of Nicholas Provart, and that an updated version of JBrowse would be established as part of TAIR under the Phoenix Bioinformatics umbrella, overseen by Tanya Berardini and Eva Huala.
JBROWSE
The JBrowse functionality provided by Araport has been successfully moved to TAIR. Araport had been running version 1.11.6 of the software with a set of tracks that included community submissions. Some of the tracks at the legacy location stopped functioning after the underlying software (ADAMA) connecting them to outside resources lost support. TAIR installed the latest JBrowse version (1.16.6), replicated the tracks that were functional at Araport, and restored access to the nonfunctional tracks. In addition, two sets of newly integrated community-submitted tracks are now visible in this genome browser. One is a set of 41 tracks from Lee and Bailey-Serres (2019) representing a multipronged gene expression experiment that tracks the response to various abiotic stresses. The other is a set of 4 tracks from Thieffry et al. (2019) based on Cap Analysis of Gene Expression (CAGE) experiments performed to determine promoter bidirectionality. The CAGE data are visualized using the Stranded View plugin (Hofmeister and Schmitz, 2018), which allows separation of the display of expression values into plus and minus strands in a single track. New community tracks continue to be added, and existing track information is updated as new data become available.
THALEMINE
ThaleMine at the BAR was completely rebuilt using the latest InterMine software. The legacy ThaleMine version had not been updated since 2016 and was running InterMine version 1.8.5, which was not forward-compatible with the latest InterMine version (4.2.0). At the time of writing, the most recent versions of publicly available data have been loaded, as listed in Table 1.
Table 1. Data Sources for a New Instance of ThaleMine
As with any instance of InterMine, the BAR’s version of ThaleMine at https://bar.utoronto.ca/thalemine/ continues to support application programming interface functionalities, in addition to the extensive web-based query options. It is also compatible with the InterMine BlueGenes interface.
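As a rough illustration of what programmatic access might look like, the sketch below builds (but does not send) a request URL for an InterMine-style query that looks up a single gene. The `/service` suffix, the `query/results` endpoint, and the `Gene.primaryIdentifier` and `Gene.symbol` field names follow common InterMine conventions and are assumptions here; they have not been checked against the BAR instance, and the locus identifier is only a placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from xml.sax.saxutils import quoteattr

# Service root derived from the ThaleMine address given in the text; the
# "/service" suffix follows the usual InterMine layout and is an assumption.
THALEMINE_SERVICE = "https://bar.utoronto.ca/thalemine/service"


def build_gene_query_url(gene_id,
                         views=("Gene.primaryIdentifier", "Gene.symbol"),
                         service_root=THALEMINE_SERVICE,
                         fmt="json"):
    """Return a URL encoding a PathQuery that selects one gene by identifier.

    The field names in `views` are assumed, not confirmed for this instance.
    """
    # quoteattr() escapes the value and wraps it in quotes for use as an
    # XML attribute, so user-supplied identifiers cannot break the markup.
    query_xml = (
        '<query model="genomic" view=%s>'
        '<constraint path="Gene.primaryIdentifier" op="=" value=%s/>'
        '</query>' % (quoteattr(" ".join(views)), quoteattr(gene_id))
    )
    params = urlencode({"query": query_xml, "format": fmt})
    return "%s/query/results?%s" % (service_root, params)


# Placeholder Arabidopsis locus identifier, used only for illustration.
url = build_gene_query_url("AT1G01010")
print(url)
```

The resulting URL could be passed to any HTTP client. The InterMine project also distributes client libraries that wrap such requests, which may be more convenient for larger queries.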
GENOME CONTEXT VIEWER
As part of the unsuccessful renewal proposal, some aspects of the Araport Comparative Genomics functionalities were planned to be addressed with an instance of the Genome Context Viewer (GCV) software developed at the National Center for Genome Resources (NCGR) by Andrew Farmer and Alan Cleary (Cleary and Farmer, 2018). This viewer was originally developed as part of an NSF-funded initiative for federating disparate legume-focused information resources. It provides services to enable the dynamic comparison of multiple genomes on the basis of their shared functional elements (e.g., genes) and provides an intuitive and powerful user interface for exploring similarities and differences among a set of genomic segments with respect to element content and arrangement. A version of the GCV has been installed and is now running from NCGR (https://gcv-arabidopsis.ncgr.org) as the third component of the “second-generation” Araport (see figure). This version of the GCV provides integration of the Arabidopsis Columbia reference genome (TAIR10/Araport11) with genomes from several other data sources, including two sets of newly assembled Arabidopsis genomes of various accessions (colloquially often called ecotypes) from Jiao and Schneeberger (2020) and from the 1001 Genomes project from Detlef Weigel and colleagues (Felix Bemm, Christian Kubica, and Detlef Weigel, personal communication), as well as a number of Brassicaceae genomes from Phytozome and the Brassicaceae Map Alignment Project initiative. The viewer provides convenient links to related resources for genes and genomic regions, thereby facilitating traversal into the other components of the reconfigured Araport project as well as other relevant tools. The gene family classifications utilized by the current instance are based on PANTHER 14.1 (Mi et al., 2013), and links are provided to the trees developed for these families by the PhyloGenes project (phylogenes.org).
Figure. Screenshot of the New Arabidopsis GCV Showing a Region with Two Clusters of Germin-Like Proteins (PTHR31238 Gene Family, Shown Here in Purple).
The central cluster shows extensive copy number variation among annotations from 14 Arabidopsis genomes and the closely related Arabidopsis lyrata genome (labeled araly.scaffold_7), as highlighted by the asterisks along the bottom. Other apparent copy number variations and presence/absence events can easily be observed.
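The idea of comparing genomic segments by their gene family content can be sketched in a few lines. The example below is a toy illustration only and is not the GCV's actual algorithm: each segment is reduced to an ordered list of gene family identifiers, from which the per-segment copy number of one family and the set of families shared by all segments are computed. The accession names and the `FAM_X`/`FAM_Y` identifiers are hypothetical; only PTHR31238 comes from the caption above, and the counts are invented.

```python
from collections import Counter

# Toy data: each genomic segment is an ordered list of gene family IDs.
# Track names, FAM_X/FAM_Y, and all counts are hypothetical.
tracks = {
    "accession_A": ["FAM_X", "PTHR31238", "PTHR31238", "PTHR31238", "FAM_Y"],
    "accession_B": ["FAM_X", "PTHR31238", "FAM_Y"],
    "accession_C": ["FAM_X", "FAM_Y"],
}


def family_copy_numbers(tracks, family):
    """Count how many genes of `family` occur in each segment."""
    # Counter returns 0 for a missing key, which represents absence.
    return {name: Counter(fams)[family] for name, fams in tracks.items()}


def shared_families(tracks):
    """Return the families present in every segment."""
    family_sets = [set(fams) for fams in tracks.values()]
    return set.intersection(*family_sets) if family_sets else set()


copies = family_copy_numbers(tracks, "PTHR31238")
common = shared_families(tracks)
print(copies)
print(sorted(common))
```

In this toy data set the family varies in copy number between segments and is absent from one of them, which is the kind of pattern the figure highlights across real annotations.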
LONG LIVE ARAPORT!
To establish continuity between the original Araport and these new functionalities, http://araport.org/ is now hosted at the BAR, where visitors are presented with links to the new and maintained versions of ThaleMine, JBrowse, and the GCV. With these new sites operational, the original Araport site hosted at the Texas Advanced Computing Center has been shut down because of security issues related to the legacy versions of the packages it used. We expect that the new Araport in its various component parts will continue to be widely used not just by Arabidopsis researchers but by the wider plant community.
In summary, a grassroots effort by committed community members has built upon the resources developed by the Araport project to provide continuity of Araport’s most used and useful features. It is gratifying to see that the vision of the 2012 white paper (International Arabidopsis Informatics Consortium, 2012) suggesting a future for Arabidopsis informatics as a community effort accomplished by a federation of independent community members has, in a modest way, come to pass. March 2020 saw 10,376 views of the ThaleMine landing page, indicating wide uptake by the community. That said, this rescue effort is not a sustainable solution. Data curation and database maintenance are of vital importance and, notwithstanding TAIR’s successful subscription model, are worthy of support by national funding agencies for the continued success of plant research in the United States and worldwide.
Acknowledgments
We are especially grateful to the scientists at the Texas Advanced Computing Center, in particular Erik Ferlanti, John Fonner, and Matt Vaughn, for continuing to host and maintain Araport long after its “use-by” date. We thank Vivek Krishnakumar for providing insights and advice on the inner workings of Araport during the transition period. Araport was supported by grants from the National Science Foundation (grant DBI-1262414) and the Biotechnology and Biological Sciences Research Council (grant BB/L027151/1). Development of the GCV was supported by USDA-ARS project funding for the Legume Information System and the National Science Foundation (grant IOS-1444806 to A.F.). The J. Craig Venter Institute workshop that launched the Araport recovery effort was supported by the U.S. National Science Foundation (MCB Award 1062348, made to the U.S. members of the International Arabidopsis Informatics Consortium Steering Committee). The BAR is supported through a grant to N.P. from the Natural Sciences and Engineering Research Council of Canada and from Genome Canada/Ontario Genomics. TAIR is managed by the nonprofit Phoenix Bioinformatics Corporation and is supported through institutional, lab, and personal subscriptions.
APPENDIX
List of participants, “Future of Araport” meeting held at the J. Craig Venter Institute (JCVI) in Rockville, Maryland, March 25 and 26, 2019.
Tanya Berardini, Phoenix Bioinformatics
Agnes Chan, JCVI
Yongwook Choi, JCVI
Andrew Farmer, NCGR
Erik Ferlanti, Texas Advanced Computing Center
Eva Huala, Phoenix Bioinformatics
Vivek Krishnakumar, formerly JCVI
Sean May, University of Nottingham
Asher Pasha, University of Toronto
Nicholas Provart, University of Toronto
David Somers, Ohio State University
Chris Town, JCVI
Eve Wurtele, Iowa State University
Remotely via BlueJeans:
Sam Hokin, NCGR
Eric Lyons, University of Arizona
Todd Michael, JCVI
Footnotes
- Received May 11, 2020.
- Revised June 25, 2020.
- Accepted July 17, 2020.
- Published July 22, 2020.