K. H. Lee, Y. C. Choy, S. B. Cho, Xiao Tang, V. R. McCrary
With the widespread use of XML documents on the Web, there is growing interest in transforming paper-based documents into XML representations. In this paper, we present a syntactic method for analyzing the logical structure of multi-page documents with hierarchical structure. Previous methods take text lines as their basic units; to generate a logical structure more accurately and quickly, the proposed method instead takes hierarchically structured text regions as input. Furthermore, we define a document model that efficiently represents explicit knowledge about the geometric characteristics and logical structure of documents. Experimental results on 372 images scanned from technical journal documents show that the method performs logical structure analysis successfully. In particular, the method outputs XML documents as the result of structural analysis, which enhances the reusability of the documents.
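As a rough illustration of the last step described in the abstract (emitting XML from a hierarchy of labeled text regions), the following minimal sketch serializes a small region tree with Python's standard library. It is not the authors' method: the `Region` class, the `region_to_xml` helper, and the element names (`article`, `section`, `heading`, `paragraph`) are hypothetical and chosen only for illustration, and the syntactic analysis that would assign the labels is assumed to have already happened.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """Hypothetical text region carrying a logical label and child regions."""
    label: str                      # assumed logical label, e.g. "section"
    text: str = ""                  # recognized text, if any
    children: List["Region"] = field(default_factory=list)


def region_to_xml(region: Region) -> ET.Element:
    """Recursively map a labeled region tree onto XML elements."""
    elem = ET.Element(region.label)
    if region.text:
        elem.text = region.text
    for child in region.children:
        elem.append(region_to_xml(child))
    return elem


# Illustrative input: a tiny, hand-labeled region hierarchy.
doc = Region("article", children=[
    Region("title", "Document Reverse Engineering: From Paper to XML"),
    Region("section", children=[
        Region("heading", "1 Introduction"),
        Region("paragraph", "First paragraph text."),
    ]),
])

xml_string = ET.tostring(region_to_xml(doc), encoding="unicode")
print(xml_string)
```

Because the nesting of regions mirrors the nesting of elements, the hierarchy carries over to XML directly; a real system would additionally need the document model and the analysis step that produce the labels.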
Document Analysis System V, Proceedings Lecture Notes in Computer Science
document image analysis, document imaging, geometric structure analysis, XML
Lee, K., Choy, Y., Cho, S., Tang, X. and McCrary, V., Document Reverse Engineering: From Paper to XML, Document Analysis System V, Proceedings Lecture Notes in Computer Science (Accessed December 3, 2023)