Evaluating Reasoning Systems

Published: May 01, 2006


Conrad E. Bock, Michael Gruninger, Donald E. Libes, Joshua Lubell, Eswaran Subrahmanian


A review of the literature on evaluating reasoning systems reveals a very broad area with wide variation in the depth and breadth of research on metrics and tests. Consolidation is hampered by nonstandard terminology, differing methodologies, scattered application domains, unpublished algorithmic details, and the effects of domain content and context on the choice of metrics and tests. The field of information metrology, which applies to reasoning as a kind of information processing, is still emerging from ad hoc experience in evaluating narrow kinds of information systems. This report begins to bring order to the area by categorizing reasoning systems according to their capabilities. The characteristics of each category can be used as a basis for evaluating and testing reasoning systems claiming to be in that category. Capabilities are analyzed along several dimensions, including representation languages, inference, and user and software interfaces. The report groups representation languages by their relation to first-order logic and by model-theoretic properties such as soundness and completeness. Inference procedures are divided into deduction, induction, abduction, and analogical reasoning. Capabilities of user and software interfaces are described as they apply to reasoning systems. The report introduces information metrology, model theory, and inference to facilitate understanding of the reasoning categories presented. It concludes with recommendations for future work.
Citation: NIST Interagency/Internal Report (NISTIR) - 7310
Report Number:
Pub Type: NIST Pubs


Keywords: reasoning categories, reasoning systems, software metrics
Created May 01, 2006, Updated November 10, 2018