Structural Mixtures for Statistical Layout Analysis

Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas Breuel
Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, Nara, Japan, IEEE, 2008


A key limitation of current layout analysis methods is that they rely on many hard-coded assumptions about document layouts and cannot adapt to new layouts for which the underlying assumptions are not satisfied. Another major drawback of these approaches is that they do not return confidence scores for their outputs. These problems pose major challenges in large-scale digitization efforts, where a large number of different layouts need to be handled and manual inspection of the results on each individual page is not feasible. This paper presents a novel statistical approach to layout analysis that aims at solving the above-mentioned problems for Manhattan layouts. The presented approach models known page layouts as a structural mixture model. A probabilistic matching algorithm is presented that gives multiple interpretations of the input layout with associated probabilities. First experiments on documents from the publicly available MARG dataset achieved an error rate below 5% for geometric layout analysis.




@inproceedings{SHAF2008,
	Title = {Structural Mixtures for Statistical Layout Analysis},
	Author = {Faisal Shafait and Joost van Beusekom and Daniel Keysers and Thomas Breuel},
	BookTitle = {Proceedings of the 8th IAPR International Workshop on Document Analysis Systems},
	Year = {2008},
	Publisher = {IEEE}
}

Last modified: 30.08.2016