Structural Mixtures for Statistical Layout Analysis

Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas Breuel
Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, Nara, Japan, IEEE, 2008

Abstract:

A key limitation of current layout analysis methods is that they rely on many hard-coded assumptions about document layouts and cannot adapt to new layouts for which the underlying assumptions are not satisfied. Another major drawback of these approaches is that they do not return confidence scores for their outputs. These problems pose major challenges in large-scale digitization efforts, where a large number of different layouts need to be handled and manual inspection of the results on each individual page is not feasible. This paper presents a novel statistical approach to layout analysis that aims at solving the above-mentioned problems for Manhattan layouts. The presented approach models known page layouts as a structural mixture model. A probabilistic matching algorithm is presented that gives multiple interpretations of the input layout with associated probabilities. First experiments on documents from the publicly available MARG dataset achieved an error rate below 5% for geometric layout analysis.

Files:

  2008-IUPR-07Aug_0828.pdf

BibTeX:

@inproceedings{ SHAF2008,
	Title = {Structural Mixtures for Statistical Layout Analysis},
	Author = {Faisal Shafait and Joost van Beusekom and Daniel Keysers and Thomas Breuel},
	BookTitle = {Proceedings of the 8th IAPR International Workshop on Document Analysis Systems},
	Year = {2008},
	Publisher = {IEEE}
}

     
Last modified: 30.08.2016