Font group identification using reconstructed fonts

Michael P. Cutter, Joost van Beusekom, Faisal Shafait, Thomas Breuel
SPIE Document Recognition and Retrieval XVIII, San Francisco, CA, USA, SPIE, 1/2011


An accessible digital document should be searchable, compressed, highly readable, and faithful to the original. These requirements can be met by digitally recreating the document with embedded fonts; however, it is not always known which fonts were used to author the original document. It is therefore desirable to reconstruct fonts whose vector glyphs approximate the shapes of the characters in the original font. In this work we address the assignment of every character within the document to a font cluster, a step that is necessary to represent a scanned document image with a reconstructed font. This paper extends previous work in font reconstruction by proposing and evaluating an algorithm that assigns a font to every character within a document. Under our evaluation method, the algorithm's font cluster assignment accuracy is measured to be 96% on multi-font documents.




