We aim to use artificial intelligence methods to identify chemometric fingerprints that reflect the temporal and geographic history of seabirds, and by extension their environment, to better understand trends in ocean and human health.
NIST has been archiving biological and environmental specimens, including marine animal and environmental samples, for over 40 years. Currently, these biospecimens are stored at cryogenic temperatures in the NIST Biorepository located at the Hollings Marine Laboratory in Charleston, SC. Various chemical analyses of samples have also been performed and are cataloged in a corresponding database maintained by the Chemical Science Division’s (CSD) Biospecimen Science Group. This project broadly seeks to use modern advances in artificial intelligence (AI) and machine learning (ML) commonly employed by the CSD’s Chemical Informatics Group to identify scientifically relevant patterns in this chemometric data as it pertains to the health and wellbeing of both the environment and humans who interact with it.
For example, the multi-stakeholder Seabird Tissue Archival and Monitoring Project (STAMP), a part of the NIST Biorepository, has collected eggs for more than twenty years to create a geospatial and temporal record of conditions throughout areas of the northern Pacific Ocean. The egg contents have been processed and archived, and selected aliquots have been analyzed to monitor ubiquitous contaminants and other analytes, because these species consume food similar to that eaten by humans. In some areas, eggs are part of a subsistence diet and contribute to the nutrition of indigenous peoples. Contaminant profiles in eggs differ across species. However, eggs are often not easily identifiable at the species level unless the bird is observed sitting on the nest, which represents a large point of uncertainty for wildlife managers and researchers alike. To address this issue, we have employed machine learning techniques to develop a chemometric classification scheme for seabirds represented in the STAMP collection. To date, these samples are described by more than 50,000 individual data points representing seven contaminant classes, collated into a curated chemometric database linked to data describing sample origins. Specifically, we developed models to identify a bird’s genus, species, and geographic origin using only chemometric data. Our current results suggest that chemometric data, commonly generated as part of environmental monitoring efforts, likely provides sufficient information to enable identification of the genus, species, and geographic origin of tissue samples when manual identification is not possible. Future research directions include genetic verification of eggs collected as unidentified species to determine whether the model correctly predicted the species.
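As a rough illustration of what classifying a sample from chemometric data alone can look like, the sketch below trains a nearest-centroid classifier on synthetic values. It is not the project's method: the description above does not name the models used, and the species labels (`species_A`, `species_B`), the three-feature contaminant profiles, and all helper names here are hypothetical placeholders rather than STAMP data.

```python
import math
import random
from collections import defaultdict

# Hypothetical mean contaminant profiles (arbitrary units); NOT STAMP data.
PROFILES = {"species_A": [1.0, 5.0, 2.0], "species_B": [4.0, 1.0, 3.0]}
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def make_samples(n):
    """Draw n synthetic samples per label around each hypothetical profile."""
    rows, labels = [], []
    for label, profile in PROFILES.items():
        for _ in range(n):
            rows.append([rng.gauss(mu, 0.5) for mu in profile])
            labels.append(label)
    return rows, labels


def column_stats(rows):
    """Per-feature mean and standard deviation, used for standardization."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return means, stds


def fit_centroids(rows, labels):
    """Average the feature vectors of each class into one centroid."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {y: [sum(c) / len(c) for c in zip(*g)] for y, g in groups.items()}


def predict(centroids, row):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], row))


train_rows, train_labels = make_samples(30)
test_rows, test_labels = make_samples(10)

# Standardize with training statistics only, then reuse them for new samples.
means, stds = column_stats(train_rows)


def scale(row):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


centroids = fit_centroids([scale(r) for r in train_rows], train_labels)
predictions = [predict(centroids, scale(r)) for r in test_rows]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The same pattern extends to genus or geographic origin by changing the label column. In practice, contaminant classes differ widely in scale and are often missing for some samples, so a real workflow would likely need imputation, cross-validation, and a more flexible model than this one.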