Many computer-based inferences in contemporary biology are comparative and would benefit from the rigorous and flexible approach of evolutionary comparative analysis (ECA), in which similarities and differences between evolved things are treated explicitly as outcomes of an evolutionary process. A practical example would be inferring regulatory sites in the human genome as slow-evolving non-coding regions in a multi-species genome comparison. Broader application of such approaches is hampered by lack of an interoperability infrastructure. We are working with domain experts to develop formal ontologies and other artifacts, such as file formats, data standards, and workflow environments, and to apply them to improving interoperability in this domain.
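As a rough illustration of the regulatory-site example above, the sketch below scores each column of a multi-species alignment by how many sequences share the most common residue, then flags windows of high average conservation. It is only a crude proxy: it ignores the phylogeny relating the species, which a true ECA method would model explicitly. The function names, window size, threshold, and toy alignment are all assumptions made for illustration.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue, per column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        # Gaps ("-") are not counted as residues, but they do lower the score
        # because the denominator is the total number of sequences.
        counts = Counter(base for base in column if base != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


def conserved_windows(scores, window=5, threshold=0.9):
    """Start indices of windows whose mean conservation meets the threshold."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append(start)
    return hits


# Toy alignment (hypothetical data): one row per species.
toy_alignment = [
    "ACGTACGTAA",
    "ACGTACGTCC",
    "ACGTACCTGG",
    "ACGTAGGTTT",
]
toy_scores = column_conservation(toy_alignment)
candidate_starts = conserved_windows(toy_scores, window=5, threshold=0.92)
```

In this toy case the first five columns are identical across all four sequences, so only the windows overlapping that stretch are flagged as candidate slow-evolving regions.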
Nearly all scientists who regularly use online resources on genes, proteins and genomes make use of comparative data to advance biomedical research. For instance, researchers often make useful inferences by comparing human genes (as well as proteins, reactions, interactions, pathways, behaviors and so on) to those of better-studied, experimentally tractable "model organisms" such as the mouse. In principle, robust and flexible ECA methods can be applied to a huge range of such comparative problems, but in practice, this is difficult. Our technologies will lower the barrier to applying ECA methods. Use of these technologies by service providers and scientific end-users will increase the value of comparative data by expanding the depth and breadth of inferences made from these data.
Our goals are to facilitate data interchange both for end-users and for data providers, to increase scientific re-use of available (pre-computed) comparative data sets, to increase representation and re-use of ECA protocols, and to facilitate validation, automation and scale-up of ECA approaches.
Research activities and technical approach
Efforts began in 2006 with organizing a group of domain experts and developing a proposal for what was to become the NESCent Evolutionary Informatics ("evoinfo") working group. NESCent is an NSF-supported center focusing on synthetic and infrastructure-building activities. The evoinfo working group has met three times since 2007. After establishing priorities, Dr. Vos and others began to focus on an XML-based data file format, nexml. Others, led by Dr. Stoltzfus, focused on developing CDAO (Comparative Data Analysis Ontology). The group and its members also developed a Concept Glossary, a list of use-cases, a guide to supporting the existing NEXUS file format standard, and a White Paper on a MIAPA (Minimal Information for a Phylogenetic Analysis) standard.
The fourth meeting of the group, in March 2009, will be a "Data Resources Interoperability Hackathon" in which we work with data providers (from the scientific community) on using CDAO and nexml to exchange data.
In pursuit of the other project goals, a team led by Dr. Stoltzfus has begun a project to develop and apply a workflow system built on advanced computer science technologies, including logical denotations for semantic transformation, a Domain-Specific Language (DSL), and automated planning technology for discovering and configuring workflows.
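To make the idea of automated workflow discovery concrete, the sketch below treats each analysis step as a transformation from one data type to another and uses breadth-first search to find the shortest chain of steps from an available input to a desired output. It is a simplified stand-in, not the project's system: it does not model logical denotations or a DSL, and every step name and data type in the catalogue is hypothetical.

```python
from collections import deque

# Hypothetical catalogue of steps: (step name, input type, output type).
STEPS = [
    ("parse_nexus", "nexus_file", "character_matrix"),
    ("align_sequences", "character_matrix", "alignment"),
    ("infer_tree", "alignment", "tree"),
    ("reconstruct_ancestral_states", "tree", "ancestral_states"),
    ("export_nexml", "tree", "nexml_file"),
]


def plan_workflow(steps, start, goal):
    """Breadth-first search for the shortest chain of steps from start to goal.

    Returns a list of step names, an empty list if start already equals goal,
    or None if no chain of steps produces the goal type.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for name, source, target in steps:
            if source == current and target not in seen:
                seen.add(target)
                queue.append((target, path + [name]))
    return None


plan = plan_workflow(STEPS, "nexus_file", "nexml_file")
```

Here `plan` lists the steps needed to turn a NEXUS file into a nexml file under the assumed catalogue. A realistic planner would also track preconditions, parameters, and multiple inputs per step, which this type-to-type search leaves out.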
Organized and co-led NESCent Evolutionary Informatics working group (2007 on)
Recruited experts from domain of phylogenetic analysis
Assessed interoperability needs and developed proposal
Met with domain experts to develop interoperability priorities
Spawned effort to develop ontology (www.evolutionaryontology.org/)
Spawned effort to develop data format (www.nexml.org)
Maintained extensive wiki documentation (www.nescent.org/wg_evoinfo)
Co-led team implementing CDAO ontology (Prosdocimi et al., in review)
Participated in, and helped to organize, 2006 NESCent "Phyloinformatics Hackathon" (Lapp et al., 2007)
Participated in organizing the upcoming 2009 NESCent "Data Resource Interoperability Hackathon"
Comparative Data Analysis Ontology (CDAO), an OWL-DL ontology for evolutionary comparative analysis (www.evolutionaryontology.org/) (Prosdocimi, F., B. Chisham, E. Pontelli, J.D. Thompson, and A. Stoltzfus, 2008).
nexml: an XML data exchange format for phylogenetic analysis (www.nexml.org) (Vos, R. 2007).
Supporting NEXUS. A guide for developers to implement and to improve support for the NEXUS file format (https://www.nescent.org/wg_phyloinformatics/Supporting_NEXUS) (Stoltzfus, A., R. Vos, M. Holder, H. Lapp, S. Kosakovsky Pond, and C. Zmasek. 2007).
MIAPA: Developing a minimal reporting standard for phylogenetics (Stoltzfus, 2008), a white paper written for the National Evolutionary Synthesis Center, available at https://www.nescent.org/wg_evoinfo/MIAPA_WhitePaper.
Prosdocimi, F., B. Chisham, E. Pontelli, J.D. Thompson, and A. Stoltzfus, Initial Implementation of a Comparative Data Analysis Ontology. BMC Evol Biol, in review.
Lapp, H., S. Bala, J.P. Balhoff, A. Bouck, N. Goto, M. Holder, R. Hollan, A. Holloway, T. Katayama, P.O. Lewis, A. Mackey, B.I. Osborne, W.H. Piel, S.L. Kosakovsky Pond, A. Poon, W.G. Qiu, J.E. Stajich, A. Stoltzfus, T. Thierer, A.J. Vilella, R. Vos, C.M. Zmasek, D. Zwickl, and T.J. Vision, The 2006 NESCent Phyloinformatics Hackathon: A field report. Evolutionary Bioinformatics, 2007. 3: p. 357-366.
Hladish, T., V. Gopalan, C. Liang, W. Qiu, P. Yang, and A. Stoltzfus, Bio::NEXUS: a Perl API for the NEXUS format for comparative biological data. BMC Bioinformatics, 2007. 8: p. 191.