State-of-the-art biomolecular analysis is no longer limited to model organisms and is becoming routine in non-model organisms. Major drivers of this emerging bioanalytical capacity include the increasing accessibility and quality of sequenced genomes and the availability of high-resolution, fast-duty-cycle mass spectrometers for proteomic analysis. In recent years, sequencing costs have decreased dramatically while the number of published eukaryotic genomes has increased. Moreover, there are ongoing projects to sequence over 9,000 species (the G10K and Earth BioGenome Projects). Despite this, many organisms currently lack genome annotations. NIST will assist or lead the development of high-quality genome assemblies and gene annotations with partners, industry, and other agencies (as in our efforts related to the Atlantic bottlenose dolphin).
Using comparative proteomics to evaluate a large, diverse group of non-model organisms creates unique and exciting research questions, opportunities, and downstream products. Developing high-quality proteomic data for each species requires quality samples, genomic databases, data acquisition on cutting-edge mass spectrometers, and curation of the data into an easily accessible and usable product. To make these results broadly applicable, blood will be used initially. Blood is typically collected during regular health monitoring, making it a readily available and rich resource. Blood also has the advantage of being proximate to most tissues while remaining relatively stable in many of its major constituents. Further, blood protein constituents cannot be predicted from mRNA transcript abundance. Using modern proteomic analysis of non-depleted serum or plasma, two hours of instrument time is enough to identify and provide relative quantification of 100 to 500 proteins. To take advantage of emerging proteomics techniques (such as data-independent acquisition), which may not be suitable for non-model organisms, NIST will work alongside software and algorithm developers to ensure that these platforms can be used beyond human data sets. To compile and compare data across species, data tools will be developed that enable comparisons of homologous proteins across species. These data tools and data sets will be made publicly available on ProteomeXchange and MassIVE, as well as through a web portal that aids retrieval and species-species or protein-protein comparisons. Together, these resources will allow researchers and comparative medicine departments to determine the suitability of a comparative model beyond the presence or absence of a specific gene, and to take phenotypic background into account when choosing a research model.
Phase 1 Goals/Milestones:
- Generate serum proteomic data from 25 different mammalian species that currently have genome annotations. One species (Atlantic bottlenose dolphin; Tursiops truncatus) will be selected to evaluate age and sex variability by analyzing 20 sera.
- Develop a data tool to humanize identifications between species. NCBI RefSeq annotations already rely on homology for gene annotations, and this tool will determine approved HGNC gene names and symbols where appropriate.
- Make data publicly available on ProteomeXchange and MassIVE.
- Develop a web portal that allows easy species-species comparisons, directed toward end users who are not experts in proteomic technology.
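As a rough illustration of what "humanizing" identifications could involve, the Python sketch below maps gene symbols reported for each species to an approved human symbol and then records which species share each protein. It is only a sketch under stated assumptions, not the planned NIST tool: the lookup rows, the alias lists, the example identifications, and the function names (build_symbol_index, humanize, compare_species) are all hypothetical, and a real tool would load the full HGNC table and use homology evidence rather than simple symbol matching.

```python
from collections import defaultdict

# Hypothetical excerpt of an HGNC-style lookup (approved symbol plus alias or
# previous symbols). Illustrative rows only, not a real HGNC export.
HGNC_RECORDS = [
    {"approved": "ALB", "aliases": []},
    {"approved": "SERPINA1", "aliases": ["AAT", "PI"]},
    {"approved": "TF", "aliases": []},
]


def build_symbol_index(records):
    """Map upper-cased approved symbols and aliases to the approved symbol."""
    index = {}
    # Approved symbols are registered first so an alias can never shadow them.
    for rec in records:
        index[rec["approved"].upper()] = rec["approved"]
    for rec in records:
        for alias in rec["aliases"]:
            index.setdefault(alias.upper(), rec["approved"])
    return index


def humanize(symbol, index):
    """Return the approved symbol for a reported symbol, or None if unknown."""
    return index.get(symbol.strip().upper())


def compare_species(identifications, index):
    """Group species by shared approved symbol and collect unmapped symbols."""
    shared = defaultdict(set)
    unmapped = defaultdict(list)
    for species, symbols in identifications.items():
        for symbol in symbols:
            approved = humanize(symbol, index)
            if approved is None:
                unmapped[species].append(symbol)
            else:
                shared[approved].add(species)
    return dict(shared), dict(unmapped)


# Made-up identifications; the LOC-style identifier is invented for the example.
example_ids = {
    "Tursiops truncatus": ["ALB", "serpina1", "LOC000000001"],
    "Canis lupus familiaris": ["ALB", "TF", "AAT"],
}

symbol_index = build_symbol_index(HGNC_RECORDS)
shared_proteins, unmapped_symbols = compare_species(example_ids, symbol_index)

for gene in sorted(shared_proteins):
    print(gene, sorted(shared_proteins[gene]))
print("unmapped:", unmapped_symbols)
```

Symbols that cannot be mapped are kept in a separate list rather than dropped, so uncharacterized loci stay visible for manual review instead of silently disappearing from the comparison.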
Phase 2 Goals/Milestones:
- Generate proteomic data of serum from 50 additional mammalian species.
- Within the original 25 species, add larger sample sets from at least 10 individuals across age and sex.
- Begin generating plasma proteomes of the species in Phase 1.
- Continue development of the web portal.