Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context



Howard S. Cohl, Moritz Schubotz, Andre Greiner Petter, Norman Meuschke, Bela Gipp, Philipp Scharpf


Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content or detection of mathematical plagiarism.
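The distinction the abstract draws between presentation and content can be illustrated with a small sketch. The formula f(a+b) and the helper `local_tags` below are hypothetical choices made for illustration and are not taken from the paper or its benchmark dataset. The presentation markup only records layout, so the same markup fits two different content encodings: applying a function f, or multiplying a variable f by a sum. Resolving that ambiguity is the kind of decision for which the surrounding text may help.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Presentation MathML: layout only. Nothing here says whether "f" is a
# function applied to (a+b) or a variable multiplied by (a+b).
PRESENTATION = f"""
<math xmlns="{MATHML_NS}">
  <mrow>
    <mi>f</mi><mo>(</mo><mi>a</mi><mo>+</mo><mi>b</mi><mo>)</mo>
  </mrow>
</math>
"""

# Content MathML, reading 1: f is a function applied to the sum a+b.
CONTENT_APPLICATION = f"""
<math xmlns="{MATHML_NS}">
  <apply>
    <ci>f</ci>
    <apply><plus/><ci>a</ci><ci>b</ci></apply>
  </apply>
</math>
"""

# Content MathML, reading 2: f is a variable multiplied by the sum a+b.
CONTENT_PRODUCT = f"""
<math xmlns="{MATHML_NS}">
  <apply>
    <times/>
    <ci>f</ci>
    <apply><plus/><ci>a</ci><ci>b</ci></apply>
  </apply>
</math>
"""


def local_tags(xml_text):
    """Return element names in document order, without the namespace prefix."""
    root = ET.fromstring(xml_text.strip())
    return [element.tag.split("}", 1)[-1] for element in root.iter()]


if __name__ == "__main__":
    for label, markup in [
        ("presentation", PRESENTATION),
        ("content (application)", CONTENT_APPLICATION),
        ("content (product)", CONTENT_PRODUCT),
    ]:
        print(label, local_tags(markup))
```

Only the product reading contains a `times` element, so the two content encodings differ even though both would typically render as the same presentation markup.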
Proceedings Title
Joint Conference on Digital Libraries
Conference Dates
June 3-7, 2018
Conference Location
Fort Worth, TX


MathML, gold standard, dataset, computer algebra systems


Cohl, H., Schubotz, M., Greiner, A., Meuschke, N., Gipp, B. and Scharpf, P. (2018), Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context, Joint Conference on Digital Libraries, Fort Worth, TX, [online], (Accessed May 19, 2024)


Created April 30, 2018, Updated May 4, 2021