Background Variability Modeling for Statistical Layout Analysis

Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas Breuel
Proceedings of the 19th International Conference on Pattern Recognition, Tampa, Florida, USA, IEEE, 2008

Abstract:

Geometric layout analysis plays an important role in document image understanding. Many algorithms known in literature work well on standard document images, achieving high text line segmentation accuracy on the UW-III dataset. These algorithms rely on certain assumptions about document layouts, and fail when their underlying assumptions are not met. Also, they do not provide confidence scores for their output. These two problems limit the usefulness of general purpose layout analysis methods in large scale applications. In this contribution, we propose a statistically motivated model-based trainable layout analysis system that allows assumption-free adaptation to different layout types and produces likelihood estimates of the correctness of the computed page segmentation. The performance of our approach is tested on a subset of the Google 1000 books dataset where it achieved a text line segmentation accuracy of 98.4% on layouts where other general-purpose algorithms failed to do a correct segmentation.

Files:

  abs_all.jsp
  2008-IUPR-07Aug_0818.pdf

BibTeX:

@inproceedings{ SHAF2008,
	Title = {Background Variability Modeling for Statistical Layout Analysis},
	Author = {Faisal Shafait and Joost van Beusekom and Daniel Keysers and Thomas Breuel},
	BookTitle = {Proceedings of the 19th International Conference on Pattern Recognition},
	Year = {2008},
	Publisher = {IEEE}
}

     
Last modified: 30.08.2016