The third PASCAL Recognizing Textual Entailment Challenge (RTE-3) contained an optional task that extended the main entailment task by requiring a system to make three-way entailment decisions (entails, contradicts, neither) and to justify its response. Contradiction was rare in the RTE-3 test set, occurring in only about 10% of the cases, and systems found it difficult to detect accurately. Subsequent analysis of the results shows that a test set must contain many more entailment pairs for the three-way task than for the traditional two-way task to support system comparisons with equal confidence. Each of six human judges, representing eventual end users, rated the quality of a justification by assigning "understandability" and "correctness" scores. Ratings of the same justification differed significantly across judges, signaling the need for a better characterization of the justification task.
Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL 2008)
June 15-20, 2008
natural language processing, textual entailment