The Technical Language Processing Community of Interest (COI) is seeking to develop best-practices guidelines on how to tailor Natural Language Processing (NLP) tools to engineering text-based data: technical language processing (TLP). The guidelines will be technology and vendor agnostic and will address industry's current need for independent guidelines grounded in user requirements and measurement science research.
The TLP COI is seeking members from government, industry, and academia to create better synergy between end users, the research community, and solution providers, reducing the complexity, cost, and delay of adopting TLP solutions. If you are interested in joining the COI, please email tlp-coi [at] nist.gov with your full name, e-mail address, company, and title, along with a short description of what you hope to get out of working with the group.
The TLP COI uses Slack for asynchronous communication with the group. If you would like to join the Slack workspace, please email tlp-coi [at] nist.gov to be invited to the workspace.
The TLP COI maintains a community-built list of useful TLP resources on GitHub. The community welcomes new submissions to the list; please follow the Contribution Guidelines when submitting.
Meetings will be a mixture of working meetings and information sessions, held quarterly in a combination of in-person and virtual formats. Information on previous events is located on the Events page.
A workshop is being held on November 29th, 2021 at the 13th Annual Conference of the Prognostics and Health Management Society.
This Technical Language Processing (TLP) workshop will guide you through the analysis of text data and show how maintenance decisions can be improved with this information. Each presenter will demo their methodology step by step to give an in-depth look into the world of TLP! All times are EST.
10:15 – 10:30AM: Opening Remarks, Michael Brundage
10:30 – 11:15AM: A technical language processing-based solution to automatically calculating lubrication-related costs from maintenance work orders
Presenters: Michael Stewart, Melinda Hodkiewicz; University of Western Australia
Lubrication plays a critical role in the reliable functioning of rotating assets such as pumps, motors, gearboxes, compressors, fans, and wheels. Manufacturing sites can use over 50 lubricants on hundreds of pieces of equipment. Lapses in quality control of lubricants and lubrication systems, such as filters, breathers, and pumps, can lead to catastrophic failure of critical equipment. However, identifying how much is being spent on maintenance of lubrication systems and the direct costs of failure requires significant manual input from subject matter experts, due in part to the myriad ways various lubricants (grease, oil, lube, etc.) are identified in historic work order data. In this presentation we demonstrate a technical language processing-based solution to this challenge. MWO2KG is an end-to-end pipeline for constructing knowledge graphs from maintenance work orders. The deep learning model behind the MWO2KG pipeline is trained on annotated data created through collaborative annotation using Redcoat, our open-source web-based annotation tool, and domain experts can interact with the resulting graphs using Echidna, our open-source knowledge graph visualization platform. We show how we have used MWO2KG to automate calculation of the direct cost of lubrication-related work orders. Direct cost is the sum of the cost of executing lubrication work identified in maintenance strategies and the cost of unplanned failures based on the time and materials involved. The MWO2KG pipeline is being used on real industry data, although anonymized examples and simulated cost data are used in this example.
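The direct-cost idea in this abstract can be sketched in a few lines. Note the hedges: the real MWO2KG pipeline uses a trained deep learning model to tag work orders, whereas this sketch substitutes naive keyword matching, and all field names, terms, and cost figures below are simulated and hypothetical, not taken from the presentation.

```python
# Hedged sketch of the direct-cost calculation: tag lubrication-related work
# orders and sum planned vs. unplanned costs. Keyword matching stands in for
# the MWO2KG deep learning tagger; data and field names are simulated.

LUBE_TERMS = {"lube", "lubricant", "grease", "oil", "lubrication"}

def is_lubrication_related(text: str) -> bool:
    """Crude stand-in for a trained tagger: keyword match on the order text."""
    tokens = text.lower().replace("/", " ").split()
    return any(tok.strip(".,;-") in LUBE_TERMS for tok in tokens)

def direct_cost(work_orders):
    """Direct cost = planned lubrication work + unplanned lubrication failures."""
    planned = unplanned = 0.0
    for wo in work_orders:
        if not is_lubrication_related(wo["text"]):
            continue
        if wo["planned"]:
            planned += wo["cost"]
        else:
            unplanned += wo["cost"]
    return {"planned": planned, "unplanned": unplanned, "total": planned + unplanned}

# Simulated work orders (cf. the anonymized examples in the presentation)
orders = [
    {"text": "replace gearbox oil filter", "planned": True,  "cost": 120.0},
    {"text": "pump seized - no lube",      "planned": False, "cost": 900.0},
    {"text": "inspect conveyor belt",      "planned": True,  "cost": 60.0},
]
print(direct_cost(orders))  # {'planned': 120.0, 'unplanned': 900.0, 'total': 1020.0}
```

The hard part in practice, as the abstract notes, is the tagging step itself, given the many ways lubricants appear in historic work order text; the cost arithmetic is straightforward once orders are reliably labeled.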
11:15 – 12:00PM: From data collection to decision making: a step-by-step tutorial using maintenance work order data
Presenters: Anna Conte, Lynn Phan, Coline Bolland, Thurston Sexton; National Institute of Standards and Technology (NIST)
Historical data analysis provides a core foundation for optimized decision-making in maintenance management. To contextualize natural language text as a data source within this process, Technical Language Processing (TLP) provides a framework for gleaning additional knowledge from this often-overlooked type of historical data. In this workshop, we focus on how the TLP paradigm informs the analysis of maintenance work orders (MWOs), which are a widely available source of data for industrial organizations to better inform their decision making. This step-by-step tutorial illustrates several applications of TLP within different stages of the data analysis process. We present tools, schemas, and data cleaning strategies for the data collection and preprocessing stages, along with a selection of Exploratory Data Analysis (EDA) and feature-selection methods relevant to text-based MWO data. More generally, we cover how to identify and mitigate common pitfalls analysts face when using MWOs for decision support.
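One preprocessing challenge the tutorial addresses is the terse, abbreviation-heavy text typical of MWOs. The sketch below shows the general shape of such a normalization step; the alias table and example strings are illustrative assumptions, not the tutorial's actual schema or tooling.

```python
# Minimal sketch of MWO text normalization: lowercase, strip punctuation,
# and expand known shorthand. The ALIASES table is hypothetical.
import re

ALIASES = {  # hypothetical shorthand -> canonical term
    "hyd": "hydraulic",
    "press": "pressure",
    "rplc": "replace",
    "brg": "bearing",
}

def normalize_mwo(text: str) -> str:
    """Lowercase, remove punctuation, collapse whitespace, expand shorthand."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [ALIASES.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

print(normalize_mwo("RPLC hyd press sensor!!"))  # replace hydraulic pressure sensor
```

Normalizing short text like this before EDA or feature selection keeps equivalent terms from fragmenting across dozens of spellings, which is exactly the kind of pitfall the tutorial warns about.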
12:00 – 12:45PM: Lunch Break
12:45 – 1:30PM: RedShred: Extract, Enrich, and Reshape
Presenter: Jim Kukla, RedShred
The demise of paper and the rise of the paperless office has been predicted since before the invention of the fax machine. Paper’s limitations are well known to those accustomed to the tools of today’s digital world. Unfortunately, paper analogs such as PDF inherit most of paper’s weaknesses due to a dizzying variety of internal representations for equivalent visual output. Even today, paper remains the universal “lowest common denominator” format for technical reference material that is critical in maintenance operations. In this demo we introduce RedShred, a platform that enables teams to liberate document-hosted knowledge more effectively by combining computer vision and natural language processing. The platform is built on three principles: extract, enrich, and reshape. Using RedShred, teams can collaborate on reshaping valuable content that was previously trapped in paper and paper analogs. In this demonstration we will show how users can load technical documentation and configure the platform to extract, enrich, and reformat its content for a smaller-than-printed-page interface such as the ubiquitous mobile or tablet devices carried by field service personnel. We will also show how RedShred enrichments include useful artifacts for downstream use, such as fine-tuning language models on specific kinds of content from the ingested documents. Finally, we discuss the underlying principles and mental model we use to unify these capabilities into a coherent platform.
1:30 – 2:15PM: Topic Modeling in R
Presenter: Maria Seale; US Army Engineer Research and Development Center (ERDC)
Natural language processing techniques are often applied to labeled text data to produce numeric vectors that can inform classification models. However, a wealth of information can reside in text data that is not labeled. In these cases, statistical techniques can be used to determine groups of documents that are semantically similar, effectively “labeling” the documents and providing important information on composition and relevance. This presentation will provide a background on topic modeling and examine a use case implemented in the R programming language.
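The presentation implements topic modeling in R; as a language-agnostic illustration of the underlying idea (grouping unlabeled documents by shared vocabulary, effectively "labeling" them), here is a toy sketch in Python. It uses simple term-frequency cosine similarity with greedy grouping rather than a proper topic model such as LDA, and the documents and threshold are illustrative assumptions.

```python
# Toy sketch: group unlabeled documents by lexical similarity. This is a
# stand-in for statistical topic modeling (e.g., LDA), not an implementation
# of it; the 0.3 threshold and the sample documents are arbitrary.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def group_documents(docs, threshold=0.3):
    """Greedy grouping: attach each document to the first group it resembles."""
    vecs = [Counter(d.lower().split()) for d in docs]
    groups = []  # each group is a list of document indices
    for i, v in enumerate(vecs):
        for g in groups:
            if cosine(v, vecs[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

docs = [
    "pump motor bearing failure",
    "motor bearing replaced on pump",
    "network switch firmware upgrade",
]
print(group_documents(docs))  # [[0, 1], [2]]
```

A real topic model additionally yields per-topic word distributions that make the discovered groups interpretable, which is what makes it useful for the "composition and relevance" questions the presentation describes.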
2:15 – 3:00PM: Utilize CMMS data in practical ways despite data quality issues with Asset Answers
Presenter: Manjish Naik; GE Digital
Asset Answers is a cloud diagnostic application that addresses poor data quality using benchmarked standards and provides continuous data improvement recommendations. The included asset performance analytics, dashboards, and reporting tools deliver accurate metrics to qualify the asset strategy, drive better reliability, and support data-driven maintenance decisions. The Data Quality Module encourages accurate data entry and accountability by pinpointing the correlation between data improvement and metric impact. Asset Answers provides an accurate asset performance assessment by analyzing Computerized Maintenance Management System (CMMS) data for completeness, accuracy, and adherence to standards. Data quality analysis produces a list of data inconsistencies and prioritized actions to effectively resolve challenges, backed by GE Digital industry leadership, equipment expertise, and performance metrics.
The TLP COI will bring together interested participants to discuss ongoing and future directions for text analysis of technical data. The output from this group will influence guidelines and roadmap documents to improve adoption of TLP solutions.
The TLP COI seeks to advance research and development initiatives that apply TLP to smart manufacturing and other industrial applications. The following list defines the scope of the TLP COI's focus: