Much of the mathematical literature, old and new, is online and mostly in natural-language form. Therefore, math content processing presents some of the same challenges faced in natural language processing (NLP), such as math disambiguation and math semantics determination. These challenges must be surmounted to enable more effective math knowledge management, math knowledge discovery, automated presentation-to-computation (P2C) conversion, and automated math reasoning. To meet this goal, considerable math language processing (MLP) technology is needed. This project aims to advance MLP by developing (1) a sophisticated part-of-math (POM) tagger, (2) math-sense disambiguation techniques along with supporting machine-learning (ML) based MLP algorithms, and (3) semantics extraction from, and enrichment of, math expressions. Specifically, the project first created an evolving tagset for math terms and expressions, and is developing a general-purpose POM tagger. The tagger works in several scans and interacts with other MLP algorithms that will be developed in this project. In the first scan of an input math document, each math term and some sub-expressions are tagged with two kinds of tags. The first kind consists of definite tags (such as operation, relation, numerator, etc.) that the tagger is certain of. The second kind consists of alternative, tentative features (including alternative roles and meanings) drawn from a knowledge base that has been developed for this project. The second and third scans will, in conjunction with some NLP/ML-based algorithms, select the right features from among those alternative features, disambiguate the terms, group subsequences of terms into unambiguous sub-expressions and tag them, and thus derive definite, unambiguous semantics of math terms and expressions. The NLP/ML-based algorithms needed for this work include topic modeling, context modeling, document classification, and definition-harvesting algorithms.
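The multi-scan idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, the tag names, the `TaggedTerm` structure, and the `context_hints` argument are hypothetical and do not reflect the project's actual tagset, knowledge base, or disambiguation algorithms. A first scan attaches definite tags and tentative alternatives; a later scan promotes one alternative to a definite tag when some outside signal supports it.

```python
from dataclasses import dataclass, field

# Hypothetical toy knowledge base; entries are illustrative only,
# not the project's real tagset or knowledge base.
KNOWLEDGE_BASE = {
    "+": {"definite": ["operation"], "alternatives": []},
    "=": {"definite": ["relation"], "alternatives": []},
    "i": {"definite": [], "alternatives": ["imaginary-unit", "index", "variable"]},
    "e": {"definite": [], "alternatives": ["euler-number", "variable"]},
}


@dataclass
class TaggedTerm:
    term: str
    definite: list = field(default_factory=list)      # tags the tagger is certain of
    alternatives: list = field(default_factory=list)  # tentative roles/meanings


def first_scan(terms):
    """Attach definite tags and tentative alternatives to each term.

    Terms missing from the toy knowledge base get no tags at all.
    """
    tagged = []
    for term in terms:
        entry = KNOWLEDGE_BASE.get(term, {"definite": [], "alternatives": []})
        tagged.append(
            TaggedTerm(term, list(entry["definite"]), list(entry["alternatives"]))
        )
    return tagged


def later_scan(tagged, context_hints):
    """Promote one alternative to a definite tag when a hint supports it.

    `context_hints` stands in for the output of the NLP/ML-based algorithms
    (an assumption of this sketch); terms without a matching hint stay ambiguous.
    """
    for item in tagged:
        hint = context_hints.get(item.term)
        if hint in item.alternatives:
            item.definite.append(hint)
            item.alternatives = []
    return tagged


tagged = first_scan(["e", "+", "i"])
resolved = later_scan(tagged, {"i": "imaginary-unit"})
for item in resolved:
    print(item.term, item.definite, item.alternatives)
```

In this sketch the hint dictionary is supplied by hand; in the approach described above, that role would be played by algorithms such as topic modeling, context modeling, document classification, and definition harvesting.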
10th Conference on Intelligent Computer Mathematics