Retrieving health records based on their content is a fundamental requirement for electronic health record management systems. Today's systems provide access through structured fields, that is, data elements in the record that have been coded to allow effective access. Yet most of a record's content often sits in care providers' notes and other free-text fields that are not structured in this way. Standard text processing techniques do not work well for these fields:
- the fields seldom contain well-formed, grammatical sentences.
- the vocabulary is highly specialized with many non-word terms including abbreviations, measurements, symbols, and the like.
- the notations are frequently brief, and therefore highly elliptical, implicitly referring to various other parts of the record.
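The vocabulary point above can be made concrete with a minimal sketch. The note text is invented for illustration and is not drawn from any real record, and both tokenizers are hypothetical stand-ins, not methods used by this project. The sketch only shows how a generic alphabetic tokenizer fragments abbreviations and drops measurements, while a slightly note-aware pattern keeps them whole.

```python
import re

# Invented example note (assumption): not taken from any real health record.
NOTE = "pt c/o SOB x3d, BP 140/90, hx CHF"


def naive_tokens(text):
    """Alphabetic-only tokenization, as a generic text pipeline might do."""
    return re.findall(r"[A-Za-z]+", text)


# Hypothetical pattern: keep alphanumeric terms and slash-joined terms whole.
NOTE_TOKEN = re.compile(r"[A-Za-z0-9]+(?:/[A-Za-z0-9]+)*")


def note_tokens(text):
    """Tokenization that preserves terms such as 'c/o', 'x3d', and '140/90'."""
    return NOTE_TOKEN.findall(text)


if __name__ == "__main__":
    print(naive_tokens(NOTE))  # abbreviations split apart, measurement lost
    print(note_tokens(NOTE))   # abbreviations and measurement kept intact
```

Even with better tokenization, the terms still have to be interpreted ("SOB" as shortness of breath, for example), which is the harder semantic problem this project targets.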
Health records will continue to have free-text fields because free text is the most natural way for users to enter many kinds of information. This project aims to enable the development of technology that can later retrieve records based on the semantic content of their free-text fields.
Through the NIST Information Technology Laboratory's Text Retrieval Conference (TREC) project, NIST is working with the research community to develop test data sets, evaluation methods, and other infrastructure to foster the development of new text processing algorithms designed specifically for electronic health records.
The ability to find electronic health records by matching semantic content in free-text fields will enhance clinical care and greatly facilitate the use of medical records in applications such as medical trials and epidemiological studies.