Nearly all scientists who regularly use online resources on genes, proteins and genomes make use of comparative data to advance biomedical research. For instance, researchers often make useful inferences by comparing human genes (as well as proteins, reactions, interactions, pathways, behaviors and so on) to those of better-studied, experimentally tractable "model organisms" such as the mouse. In principle, robust and flexible ECA methods can be applied to a huge range of such comparative problems, but in practice they are difficult to apply. Our technologies will lower the barrier to applying ECA methods. Use of these technologies by service providers and scientific end-users will increase the value of comparative data by expanding the depth and breadth of the inferences made from these data.
The goals of the project are to facilitate data interchange for both end-users and data providers, increase scientific re-use of available (pre-computed) comparative data sets, increase representation and re-use of ECA protocols, and facilitate validation, automation and scale-up of ECA approaches. Toward these goals, we will:
- Develop a formal ontology for comparative analysis, and a data exchange file format
- Evaluate and refine the ontology as a tool to support interoperability of file formats and data resources
- Develop a minimal standard for representing protocols (MIAPA, Minimal Information for a Phylogenetic Analysis)
- Develop a language for representing analysis workflows
- Evaluate and refine the workflow language using MIAPA reports and captured knowledge of workflows
- Develop a semantics-based system for representing, designing, validating, re-using, and executing workflow plans
Research activities and technical approach
Efforts began in 2006 with the organization of a group of domain experts and the development of a proposal for what was to become the NESCent Evolutionary Informatics ("evoinfo") working group. NESCent is an NSF-supported center focusing on synthetic and infrastructure-building activities. The evoinfo working group has met three times since 2007. After establishing priorities, Dr. Vos and others began to focus on an XML-based data file format, nexml. Others, led by Dr. Stoltzfus, focused on developing CDAO (the Comparative Data Analysis Ontology). The group and its members also developed a Concept Glossary, a list of use-cases, a guide to supporting the existing NEXUS file format standard, and a White Paper on a MIAPA (Minimal Information for a Phylogenetic Analysis) standard.
The fourth meeting of the group, in March 2009, will be a "Data Resources Interoperability Hackathon" in which we will work with data providers from the scientific community on using CDAO and nexml to exchange data.
In pursuit of the other project goals, a team led by Dr. Stoltzfus has begun to develop and apply a workflow system built on advanced computer science technologies, including logical denotations for semantic transformation, a Domain-Specific Language (DSL), and automated planning technology for discovering and configuring workflows.
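To make the idea of automated planning for workflow discovery concrete, the following is a minimal sketch and not the project's actual system. It assumes that each analysis step can be described by the kinds of data it needs and the kinds it produces, and it uses a plain breadth-first search to find the shortest chain of steps that yields a requested result. The step names, data type labels, and the `plan_workflow` function are all hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import NamedTuple, Optional


class Step(NamedTuple):
    """A hypothetical analysis step, described by its input and output data types."""
    name: str
    needs: frozenset
    makes: frozenset


# Illustrative step catalogue (assumed, not taken from the project).
STEPS = [
    Step("align_sequences", frozenset({"sequences"}), frozenset({"alignment"})),
    Step("infer_tree", frozenset({"alignment"}), frozenset({"tree"})),
    Step(
        "reconstruct_ancestral_states",
        frozenset({"tree", "character_matrix"}),
        frozenset({"ancestral_states"}),
    ),
]


def plan_workflow(available, goal, steps=STEPS) -> Optional[list]:
    """Return the shortest list of step names that produces `goal`, or None."""
    start = frozenset(available)
    queue = deque([(start, [])])  # (data types in hand, steps taken so far)
    seen = {start}
    while queue:
        have, plan = queue.popleft()
        if goal in have:
            return plan
        for step in steps:
            # A step applies if its inputs are present and it adds something new.
            if step.needs <= have and not step.makes <= have:
                nxt = have | step.makes
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [step.name]))
    return None  # no chain of known steps reaches the goal


if __name__ == "__main__":
    print(plan_workflow({"sequences", "character_matrix"}, "ancestral_states"))
```

Given sequences and a character matrix, the search chains alignment, tree inference, and ancestral state reconstruction; given only a tree, it reports that no workflow can produce an alignment. A semantics-based system of the kind described above would replace the simple type labels with ontology terms and richer constraints.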