Towards Formula Concept Discovery and Recognition

Author(s)

Howard S. Cohl, Philipp Scharpf, Moritz Schubotz, Bela Gipp

Abstract

Citation-based Information Retrieval (IR) methods for scientific documents have proven effective in academic disciplines that use many references. In science, technology, engineering, and mathematics (STEM), researchers cite less often but instead employ mathematical concepts to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply the generalized method to both classical references and mathematical concepts. In this paper, we restrict ourselves to mathematical formulae and define a Formula Concept Retrieval challenge with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). FCD aims at defining and exploring a Formula Concept, which names a bundle of equivalent representations of a formula; FCR is designed to match a given formula to a previously assigned concept ID. Moreover, we present the first machine-learning-based approaches to the FCD and FCR tasks, which we apply to a standardized test collection (the NTCIR arXiv dataset). Our FCD approach yields a recall of 68% for retrieving equivalent representations of frequent formulae and 72% for extracting the formula name from the surrounding text. FCD and FCR will enable citing formulae within mathematical documents and facilitate semantic search, as well as similarity computations for plagiarism detection or document recommender systems.
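The FCR task definition can be illustrated with a toy sketch. This is not the authors' machine-learning approach. It assumes a hypothetical store `FORMULA_CONCEPTS` that maps made-up concept IDs to bundles of equivalent LaTeX representations, and it replaces learned equivalence with a crude, hypothetical string normalization (`normalize`). `recognize` matches a formula to its concept ID by exact lookup, and `recall` computes the metric the abstract reports (the share of relevant items that were retrieved).

```python
from typing import Dict, Optional, Set

# Hypothetical Formula Concept store: concept ID -> equivalent LaTeX representations.
# IDs and formulae are illustrative assumptions, not data from the paper.
FORMULA_CONCEPTS: Dict[str, Set[str]] = {
    "mass_energy_equivalence": {r"E = mc^2", r"E=m c^{2}", r"E = m \cdot c^2"},
    "pythagorean_theorem": {r"a^2 + b^2 = c^2", r"a^{2}+b^{2}=c^{2}"},
}


def normalize(latex: str) -> str:
    """Crude canonical form (an assumption): drop \\cdot, braces, and spaces."""
    for token in (r"\cdot", "{", "}", " "):
        latex = latex.replace(token, "")
    return latex


def build_index(concepts: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map every normalized representation to the concept ID that bundles it."""
    index: Dict[str, str] = {}
    for concept_id, representations in concepts.items():
        for rep in representations:
            index[normalize(rep)] = concept_id
    return index


def recognize(formula: str, index: Dict[str, str]) -> Optional[str]:
    """Toy FCR: exact lookup of the normalized formula; None if no concept matches."""
    return index.get(normalize(formula))


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Recall = |retrieved & relevant| / |relevant|."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


if __name__ == "__main__":
    index = build_index(FORMULA_CONCEPTS)
    print(recognize(r"E=m\cdot c^{2}", index))  # mass_energy_equivalence
    print(recognize(r"x + y", index))  # None
    found = {r"E = mc^2", r"E=m c^{2}"}
    print(round(recall(found, FORMULA_CONCEPTS["mass_energy_equivalence"]), 3))  # 0.667
```

String normalization of this kind only unifies notational variants such as spacing and braces. Recognizing formulae that are equivalent but written differently would need a stronger method, such as the machine-learning approaches the abstract mentions.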
Proceedings Title
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)
Volume
2414
Conference Dates
July 21-25, 2019
Conference Location
Paris, France
Conference Title
42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Keywords

Natural Language Processing, Mathematical Language Processing, Mathematical Information Retrieval, Feature Analysis, Machine Learning
Created July 24, 2019, Updated October 8, 2020