Classification of Journal Articles In a Search For New Experimental Thermophysical Property Data: A Case Study

Adele P. Peskin; Alden A. Dima



Published

June 7, 2017

Author(s)

Adele P. Peskin, Alden A. Dima

Abstract

We present a case study in which we use natural language processing and machine learning techniques to automatically select, from thousands of articles published in five relevant journals, the candidate articles that may contain new experimental thermophysical property data. The National Institute of Standards and Technology (NIST) Thermodynamics Research Center (TRC) maintains a large database of thermophysical property data extracted from articles that are selected manually for their content. Over time, the number of articles requiring manual inspection has grown, and assistance from machine-based methods is now needed. Previous work used topic modeling together with classification techniques to sort these journal articles into those that contain data for the TRC database and those that do not. These techniques classified articles with an accuracy of 85% to 90%. However, the TRC does not want to lose the data in relevant articles that are misclassified. In this study, we start from these topic modeling and classification techniques and then enhance the model with information relevant to the TRC's selection process. Our goal is to minimize the number of articles that require manual selection without missing any articles of importance. Through a series of selection methods, we eliminate the articles for which we can establish a rejection criterion. We reduce the number of articles that are not of interest by 70.8% while retaining 98.7% of the articles of interest. We also find that topic model classification improves when the corpus of words is drawn from specific sections of the articles rather than from the entire articles, and we further improve classification by combining topic models built from different sections. Our best classification used only the Experimental and Literature Cited sections.
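The abstract describes a screening cascade: articles are dropped only when some rejection criterion applies, and success is measured by two numbers, the share of uninteresting articles removed and the share of articles of interest retained. The Python sketch below illustrates how such a cascade and those two metrics could be computed. It is an illustration under stated assumptions, not the authors' method: the `Article` record, the rules `no_experimental_section` and `no_measurement_terms`, the `MEASUREMENT_TERMS` keyword list, and the toy corpus are all hypothetical placeholders rather than the criteria actually used by the TRC.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Article:
    """Hypothetical article record: section name -> text, plus a manual label."""
    doc_id: str
    sections: dict[str, str] = field(default_factory=dict)
    relevant: bool = False  # manual TRC-style label, used only for evaluation


# A rejection rule returns True when an article can be dropped from review.
RejectionRule = Callable[[Article], bool]

# Hypothetical keyword list; the real selection criteria are not given here.
MEASUREMENT_TERMS = ("density", "viscosity", "vapor pressure", "heat capacity")


def no_experimental_section(a: Article) -> bool:
    # Hypothetical rule: without an Experimental section, new measured data is unlikely.
    return "Experimental" not in a.sections


def no_measurement_terms(a: Article) -> bool:
    # Hypothetical rule: the Experimental section never names a measured property.
    text = a.sections.get("Experimental", "").lower()
    return not any(term in text for term in MEASUREMENT_TERMS)


def screen(articles: Iterable[Article], rules: list[RejectionRule]) -> list[Article]:
    """Keep only the articles that no rejection rule eliminates."""
    return [a for a in articles if not any(rule(a) for rule in rules)]


def screening_metrics(all_articles: list[Article], kept: list[Article]) -> tuple[float, float]:
    """Return (fraction of non-relevant articles removed, fraction of relevant articles retained)."""
    kept_ids = {a.doc_id for a in kept}
    irrelevant = [a for a in all_articles if not a.relevant]
    relevant = [a for a in all_articles if a.relevant]
    if not irrelevant or not relevant:
        raise ValueError("evaluation needs both relevant and non-relevant articles")
    removed = sum(a.doc_id not in kept_ids for a in irrelevant)
    retained = sum(a.doc_id in kept_ids for a in relevant)
    return removed / len(irrelevant), retained / len(relevant)


# Toy corpus, invented purely for illustration.
corpus = [
    Article("a1", {"Experimental": "The density of each sample was measured."}, relevant=True),
    Article("a2", {"Introduction": "A review of prior correlations."}, relevant=False),
    Article("a3", {"Experimental": "DFT calculations were performed."}, relevant=False),
    Article("a4", {"Experimental": "Viscosity of binary mixtures was measured."}, relevant=True),
    Article("a5", {"Experimental": "Heat capacity values were taken from the literature."}, relevant=False),
]

kept = screen(corpus, [no_experimental_section, no_measurement_terms])
removed, retained = screening_metrics(corpus, kept)
print(f"{removed:.1%} of non-relevant removed, {retained:.1%} of relevant retained")
# prints "66.7% of non-relevant removed, 100.0% of relevant retained"
```

Each rule only ever removes articles, so adding rules can raise the first metric but can never raise the second; this mirrors the trade-off the abstract reports (70.8% of uninteresting articles removed against 98.7% of articles of interest retained).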

Journal

Integrating Materials and Manufacturing Innovation

Pub Type

Journals

Keywords

Topic models, association rules, natural language processing

Research Areas

Data and informatics

Citation

Peskin, A. and Dima, A. (2017), Classification of Journal Articles In a Search For New Experimental Thermophysical Property Data: A Case Study, Integrating Materials and Manufacturing Innovation (Accessed October 10, 2025)


Created June 7, 2017, Updated September 24, 2018
