Ontologies and Interoperability in Evolutionary Comparative Analysis

Summary

Many computer-based inferences in contemporary biology are comparative and would benefit from the rigorous and flexible approach of evolutionary comparative analysis (ECA), in which similarities and differences between evolved things are treated explicitly as outcomes of an evolutionary process. A practical example would be inferring regulatory sites in the human genome as slow-evolving non-coding regions in a multi-species genome comparison. Broader application of such approaches is hampered by lack of an interoperability infrastructure. We are working with domain experts to develop formal ontologies and other artifacts, such as file formats, data standards, and workflow environments, and to apply them to improving interoperability in this domain.
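To make the regulatory-site example more concrete, the Python sketch below scores each column of a toy multi-species alignment by its identity to a reference sequence. It then flags windows of consecutive high-identity columns as candidate slow-evolving regions. This is a deliberate simplification and not an ECA method in the sense used here. Plain percent identity ignores the phylogeny, branch lengths, and substitution model that an evolutionary comparative analysis would treat explicitly. The function names, window size, threshold, and sequences are all hypothetical.

```python
# Toy sketch (hypothetical names and data): flag slow-evolving windows in a
# multi-species alignment. Uses plain column identity to a reference sequence,
# NOT a phylogenetic model, so tree structure and branch lengths are ignored.


def column_identity(column: str) -> float:
    """Fraction of non-gap characters after the first that match the first (reference) one."""
    ref = column[0]
    if ref == "-":
        return 0.0  # treat a gap in the reference as unconserved
    others = [c for c in column[1:] if c != "-"]
    if not others:
        return 0.0  # no comparable characters in the other species
    return sum(c == ref for c in others) / len(others)


def conserved_windows(alignment: dict, reference: str,
                      window: int = 5, threshold: float = 0.9) -> list:
    """Return (start, end) windows, 0-based and end-exclusive, whose mean
    column identity to `reference` is at least `threshold`."""
    ref_seq = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    scores = [
        column_identity(ref_seq[i] + "".join(seq[i] for seq in others))
        for i in range(len(ref_seq))
    ]
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, start + window))
    return hits


if __name__ == "__main__":
    toy_alignment = {
        "human": "ACGTAGGCTA",
        "mouse": "ACGTAGTCAA",
        "dog":   "ACGTACGATA",
    }
    print(conserved_windows(toy_alignment, "human", window=5, threshold=0.95))
    # prints [(0, 5)]
```

In this toy alignment, only the first five columns agree across all three species, so they form the only window that passes the threshold. A realistic analysis would replace the identity score with a model-based rate estimate computed on the species tree.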

Description

Intended impact

Nearly all scientists who regularly use online resources on genes, proteins and genomes make use of comparative data to advance biomedical research. For instance, researchers often make useful inferences by comparing human genes (as well as proteins, reactions, interactions, pathways, behaviors and so on) to those of better-studied, experimentally tractable "model organisms" such as the mouse. In principle, robust and flexible ECA methods can be applied to a huge range of such comparative problems, but in practice this is difficult. Our technologies will lower the barrier to applying ECA methods. Use of these technologies by service providers and scientific end-users will increase the value of comparative data by expanding the depth and breadth of inferences made from these data.

Objective

Facilitate data interchange both for end-users and for data providers, increase scientific re-use of available (pre-computed) comparative data sets, increase representation and re-use of ECA protocols, and facilitate validation, automation and scale-up of ECA approaches.

Goals

Develop a formal ontology for comparative analysis, and a data exchange file format
Evaluate and refine the ontology as a tool to support interoperability of file formats and data resources
Develop a minimal standard for representing protocols (MIAPA, Minimal Information for a Phylogenetic Analysis)
Develop a language for representing analysis workflows
Evaluate and refine the workflow language using MIAPA reports and captured knowledge of workflows
Develop a semantics-based system for representing, designing, validating, re-using, and executing workflow plans

Research activities and technical approach

Efforts began in 2006 with organizing a group of domain experts and developing a proposal for what became the NESCent Evolutionary Informatics ("evoinfo") working group. NESCent is an NSF-supported center focusing on synthesis and infrastructure-building activities. The evoinfo working group has met three times since 2007. After establishing priorities, Dr. Vos and others began to focus on an XML-based data file format, nexml, while another subgroup led by Dr. Stoltzfus focused on developing CDAO (Comparative Data Analysis Ontology). The group and its members also developed a Concept Glossary, a list of use cases, a guide to supporting the existing NEXUS file format standard, and a white paper on a MIAPA (Minimal Information for a Phylogenetic Analysis) standard.

The fourth meeting of the group, in March 2009, will be a "Data Resources Interoperability Hackathon" at which we will work with data providers from the scientific community on using CDAO and nexml to exchange data.
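As an illustration of the data-exchange side, the following Python sketch uses only the standard library to read a small tree from a nexml-style document and compute root-to-tip path lengths. The element and attribute names (`otus`, `otu`, `trees`, `tree`, `node`, `edge`, `source`, `target`, `length`), the namespace URI, and the `version` value are assumptions based on later published nexml releases. The format was still evolving when this page was written, so check the schema at www.nexml.org before relying on them. The taxa and branch lengths are made up, and the functions are not part of any nexml toolkit.

```python
import xml.etree.ElementTree as ET

# Assumed nexml namespace and structure (check the schema at www.nexml.org).
NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
NS = {"nex": NEX}

# A small, made-up three-taxon tree in nexml-style markup.
DOC = f"""<nex:nexml version="0.9" xmlns:nex="{NEX}" xmlns="{NEX}" xmlns:xsi="{XSI}">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Mus musculus"/>
    <otu id="t3" label="Canis familiaris"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2"/>
      <node id="n3" otu="t1"/>
      <node id="n4" otu="t2"/>
      <node id="n5" otu="t3"/>
      <edge id="e1" source="n1" target="n2" length="0.1"/>
      <edge id="e2" source="n2" target="n3" length="0.2"/>
      <edge id="e3" source="n2" target="n4" length="0.3"/>
      <edge id="e4" source="n1" target="n5" length="0.4"/>
    </tree>
  </trees>
</nex:nexml>"""


def read_tree(xml_text):
    """Return (node id -> taxon label or None, list of (source, target, length))
    for the first tree in the document."""
    root = ET.fromstring(xml_text)
    # Map OTU ids to human-readable taxon labels.
    labels = {otu.get("id"): otu.get("label")
              for otu in root.iterfind("nex:otus/nex:otu", NS)}
    tree = root.find("nex:trees/nex:tree", NS)
    # Internal nodes carry no otu attribute, so their label is None.
    node_taxon = {node.get("id"): labels.get(node.get("otu"))
                  for node in tree.iterfind("nex:node", NS)}
    edges = [(edge.get("source"), edge.get("target"), float(edge.get("length")))
             for edge in tree.iterfind("nex:edge", NS)]
    return node_taxon, edges


def root_to_tip(node_taxon, edges):
    """Sum branch lengths from each labelled (tip) node back to the root."""
    parent = {target: (source, length) for source, target, length in edges}
    distances = {}
    for node, taxon in node_taxon.items():
        if taxon is None:
            continue  # skip internal nodes
        total, current = 0.0, node
        while current in parent:
            current, length = parent[current]
            total += length
        distances[taxon] = total
    return distances


if __name__ == "__main__":
    nodes, edges = read_tree(DOC)
    for taxon, dist in sorted(root_to_tip(nodes, edges).items()):
        print(f"{taxon}: {dist:.2f}")
```

The tree is stored as explicit node and edge elements rather than a nested Newick string. This lets a consumer reach each branch and its length with ordinary XML tooling, which is part of the appeal of an XML exchange format.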

In pursuit of the other project goals, a team led by Dr. Stoltzfus has begun developing and applying a workflow system that draws on advanced computer science technologies, including logical denotations for semantic transformation, a Domain-Specific Language (DSL), and automated planning technology for discovering and configuring workflows.
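The project's workflow system combines logical denotations, a DSL, and automated planning, and none of those components is reproduced here. As a rough illustration of what discovering and configuring workflows by planning means, the Python sketch below treats each analysis service as an operator with typed inputs and outputs. It then runs a breadth-first forward search for the shortest chain of services that yields a requested data type. The service catalogue and type names are hypothetical, and breadth-first search stands in for the more capable planners the project describes.

```python
from collections import deque

# Hypothetical service catalogue: name -> (required input types, produced output types).
SERVICES = {
    "fetch_sequences": ({"gene_ids"}, {"unaligned_sequences"}),
    "align": ({"unaligned_sequences"}, {"alignment"}),
    "select_model": ({"alignment"}, {"substitution_model"}),
    "infer_tree": ({"alignment", "substitution_model"}, {"tree"}),
    "reconcile": ({"tree", "species_tree"}, {"gene_species_reconciliation"}),
}


def plan(available, goal, services=SERVICES):
    """Breadth-first forward search over sets of available data types.

    Returns the shortest list of service names whose execution yields `goal`,
    or None if no sequence of services can produce it."""
    start = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal in state:
            return steps
        for name in sorted(services):  # sorted for deterministic output
            inputs, outputs = services[name]
            # Applicable if all inputs are available and it adds something new.
            if inputs <= state and not outputs <= state:
                successor = state | outputs
                if successor not in seen:
                    seen.add(successor)
                    queue.append((successor, steps + [name]))
    return None


if __name__ == "__main__":
    print(plan({"gene_ids"}, "tree"))
    # prints ['fetch_sequences', 'align', 'select_model', 'infer_tree']
    print(plan({"gene_ids"}, "gene_species_reconciliation"))
    # prints None (no service supplies a species tree)
```

Breadth-first search guarantees the fewest steps but scales poorly as the catalogue grows. That is one reason practical planners add heuristics and richer semantic descriptions of each service.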

Major Accomplishments

Organized and co-led NESCent Evolutionary Informatics working group (2007 onward)
Recruited experts from domain of phylogenetic analysis
Assessed interoperability needs and developed proposal
Met with domain experts to develop interoperability priorities
Spawned effort to develop ontology (www.evolutionaryontology.org)
Spawned effort to develop data format (www.nexml.org)
Maintained extensive wiki documentation (www.nescent.org/wg_evoinfo)
Co-led team implementing CDAO (Prosdocimi, et al., in review)
Participated in, and helped to organize, the 2006 NESCent "Phyloinformatics Hackathon" (Lapp, et al., 2007)
Participated in organizing the upcoming 2009 NESCent "Data Resource Interoperability Hackathon"

Associated Product(s)

Comparative Data Analysis Ontology (CDAO), an OWL-DL ontology for evolutionary comparative analysis (www.evolutionaryontology.org/). (Prosdocimi, F., B. Chisham, E. Pontelli, J.D. Thompson, and A. Stoltzfus, 2008).

nexml: an XML data exchange format for phylogenetic analysis (www.nexml.org) (Vos, R. 2007).

Supporting NEXUS. A guide for developers to implement and to improve support for the NEXUS file format (https://www.nescent.org/wg_phyloinformatics/Supporting_NEXUS) (Stoltzfus, A., R. Vos, M. Holder, H. Lapp, S. Kosakovsky Pond, and C. Zmasek. 2007).

MIAPA: Developing a minimal reporting standard for phylogenetics (Stoltzfus, 2008), a white paper written for the National Evolutionary Synthesis Center, available at https://www.nescent.org/wg_evoinfo/MIAPA_WhitePaper.

Associated publications/reports

Prosdocimi, F., B. Chisham, E. Pontelli, J.D. Thompson, and A. Stoltzfus, Initial Implementation of a Comparative Data Analysis Ontology. BMC Evol Biol, 2009.

Lapp, H., S. Bala, J.P. Balhoff, A. Bouck, N. Goto, M. Holder, R. Hollan, A. Holloway, T. Katayama, P.O. Lewis, A. Mackey, B.I. Osborne, W.H. Piel, S.L. Kosakovsky Pond, A. Poon, W.G. Qiu, J.E. Stajich, A. Stoltzfus, T. Thierer, A.J. Vilella, R. Vos, C.M. Zmasek, D. Zwickl, and T.J. Vision, The 2006 NESCent Phyloinformatics Hackathon: A field report. Evolutionary Bioinformatics, 2007. 3: p. 357-366.

Hladish, T., V. Gopalan, C. Liang, W. Qiu, P. Yang, and A. Stoltzfus, Bio::NEXUS: a Perl API for the NEXUS format for comparative biological data. BMC Bioinformatics, 2007. 8: p. 191.

Bioscience, Bioinformatics, Information technology and Data and informatics

Created December 29, 2008, Updated March 26, 2025

Project Status

Completed

Funding

NESCent (www.nescent.org), an NSF-funded center, provides meeting support for the Evolutionary Informatics Working Group

CUSTOMERS/CONTRIBUTORS/COLLABORATORS

Customers of products:

Because our interoperability tools are still in development, we do not yet have end-user customers.
Some end-user tools have incorporated support for nexml (see www.nexml.org for a list).

Contributors or collaborators:

NSF National Evolutionary Synthesis Center (Durham, NC)
CDAO development team (New Mexico State Univ., Las Cruces; Univ. Texas at Dallas; Univ. Strasbourg, France)
Participants of the Evolutionary Informatics Working Group (UC Davis Genome Center; U. Washington, Seattle; Univ. Kansas; Antiviral Research Center, UCSD; Center for Evolutionary Functional Genomics, ASU; GlaxoSmithKline; Univ. Arizona; Univ. British Columbia; Hunter College; Univ. Edinburgh; Florida State Univ.; Univ. Ottawa; Burnham Institute for Medical Research; New Mexico State Univ., Las Cruces; Peabody Museum of Natural History, Yale Univ.; Univ. Texas at Dallas; Univ. Strasbourg, France)
