Document Image Coding for Processing and Retrieval
O. E. Kia, D. Doermann
Document images belong to a unique class of images in which the information content lies in the language represented by a series of symbols on the page rather than in the visual objects themselves. This property motivates a new image coding strategy designed to address both compression and retrieval. In this paper we describe a coding methodology that not only exploits component-level redundancy to reduce code length but also enables efficient data access. The method uses an image pattern representation that captures image redundancy while providing a natural information index. It identifies patterns that appear repeatedly, represents similar patterns with a single prototype, stores the locations of the pattern instances, and codes the residuals between the prototypes and the pattern instances. Compression results are somewhat competitive, but compressed-domain access is clearly superior to that of competing methods. Applications to network-related problems have also been considered and show favorable results.
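The four steps named in the abstract (find repeated patterns, keep one prototype per group, store instance locations, code the residuals) can be illustrated with a minimal sketch. This is not the authors' implementation. The bitmap representation, the Hamming-distance matching rule, the `max_mismatch` threshold, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]           # binary component image, rows of 0/1
Position = Tuple[int, int]                     # (x, y) location on the page
Instance = Tuple[int, Position, Bitmap]        # (prototype index, position, residual)


def same_shape(a: Bitmap, b: Bitmap) -> bool:
    """True if both bitmaps have identical row and column counts."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels between two same-shaped bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def xor(a: Bitmap, b: Bitmap) -> Bitmap:
    """Pixel-wise XOR; used both to form a residual and to undo it."""
    return tuple(tuple(pa ^ pb for pa, pb in zip(ra, rb)) for ra, rb in zip(a, b))


def encode(components: List[Tuple[Position, Bitmap]], max_mismatch: int = 2):
    """Group similar components under prototypes and record residuals.

    Assumption: two components are 'similar' when they have the same shape
    and differ in at most max_mismatch pixels. The first member of each
    group serves as its prototype.
    """
    prototypes: List[Bitmap] = []
    instances: List[Instance] = []
    for position, bitmap in components:
        for idx, proto in enumerate(prototypes):
            if same_shape(proto, bitmap) and hamming(proto, bitmap) <= max_mismatch:
                break                          # reuse an existing prototype
        else:
            prototypes.append(bitmap)          # no match: start a new prototype
            idx = len(prototypes) - 1
        instances.append((idx, position, xor(prototypes[idx], bitmap)))
    return prototypes, instances


def decode(prototypes: List[Bitmap], instances: List[Instance]):
    """Rebuild every component exactly from its prototype and residual."""
    return [(pos, xor(prototypes[idx], res)) for idx, pos, res in instances]


def find_instances(instances: List[Instance], prototype_index: int) -> List[Position]:
    """Compressed-domain lookup: locations of one prototype, no pixel decoding."""
    return [pos for idx, pos, _ in instances if idx == prototype_index]
```

In this sketch the instance list doubles as an index: `find_instances` answers "where does this pattern occur?" by scanning prototype indices only, which is one way to read the abstract's claim that the representation provides a natural information index. A real coder would also entropy-code the prototypes, positions, and residuals, which is omitted here.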
Kia, O. E. and Doermann, D.
Document Image Coding for Processing and Retrieval, Journal of VLSI Signal Processing, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=150716
(Accessed February 21, 2024)