Coupled Snakelets for Curled Text-Line Segmentation from Warped Document Images

Syed Saqib Bukhari, Faisal Shafait, Thomas Breuel
International Journal on Document Analysis and Recognition, Volume 15, pages 1-16, Springer, 2012

Abstract:

Camera-captured, warped document images usually contain curled text-lines because of distortions caused by camera perspective and page curl. Monocular dewarping techniques can transform warped document images into planar ones, improving optical character recognition (OCR) accuracy and human readability. Curled text-line segmentation is a crucial initial step for most monocular dewarping techniques. Existing curled text-line segmentation approaches are sensitive to geometric and perspective distortions. In this paper, we introduce a novel curled text-line segmentation algorithm that adapts active contours (snakes). Our algorithm performs text-line segmentation by estimating pairs of x-lines and baselines. It estimates a local x-line/baseline pair on each connected component by jointly tracing the top and bottom points of neighboring connected components; finally, each group of overlapping pairs is taken as one segmented text-line. Our algorithm achieves a curled text-line segmentation accuracy above 95% on the DFKI-I (CBDAR 2007 dewarping contest) dataset, which is significantly better than previously reported results on this dataset.
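
The last step described in the abstract, estimating a local x-line/baseline pair per connected component and then merging overlapping pairs into text-lines, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the coupled-snakelet deformation is replaced by a hypothetical straight-line least-squares fit through the top and bottom points of horizontally nearby, vertically overlapping components, and the names `Component`, `local_pair`, `band_at`, `segment_lines` and the `window` parameter are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Bounding box of one connected component (image coordinates, y grows downward)."""
    x0: float
    x1: float
    top: float
    bottom: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2


def fit_line(points):
    """Least-squares fit of y = a + b*x; returns a horizontal line if all x are equal."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b * mx, b


def local_pair(i, comps, window):
    """Straight local (x-line, baseline) for component i, fitted through the top and
    bottom points of components within `window` px horizontally that overlap it
    vertically. A hypothetical stand-in for the paper's snakelet tracing."""
    c = comps[i]
    nbrs = [d for d in comps
            if abs(d.cx - c.cx) <= window and d.top <= c.bottom and d.bottom >= c.top]
    xline = fit_line([(d.cx, d.top) for d in nbrs])
    baseline = fit_line([(d.cx, d.bottom) for d in nbrs])
    return xline, baseline


def band_at(pair, x):
    """Evaluate an (x-line, baseline) pair at column x, giving the band (y_top, y_bottom)."""
    (a_top, b_top), (a_bot, b_bot) = pair
    return a_top + b_top * x, a_bot + b_bot * x


def segment_lines(comps, window=50.0):
    """Union components whose local pairs overlap; each resulting group is one text-line."""
    pairs = [local_pair(i, comps, window) for i in range(len(comps))]
    parent = list(range(len(comps)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for i in range(len(comps)):
        for j in range(i + 1, len(comps)):
            if abs(comps[i].cx - comps[j].cx) > 2 * window:
                continue  # nominal extents (cx +/- window) of the two pairs do not intersect
            x = (comps[i].cx + comps[j].cx) / 2  # compare the bands midway between centres
            ti, bi = band_at(pairs[i], x)
            tj, bj = band_at(pairs[j], x)
            if ti <= bj and tj <= bi:  # vertical intervals intersect
                parent[find(i)] = find(j)

    groups = {}
    for k in range(len(comps)):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())
```

For example, two rows of four slightly slanted boxes, one near y = 10..20 and one near y = 50..60, come back as two groups of component indices. Union-find is used because overlap between pairs is only checked pairwise, while a text-line is the transitive closure of those overlaps, which matches "each group of overlapping pairs" in the abstract.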

Files:

  Bukhari-Coupled-Snakelets-IJDAR12.pdf

BibTex:

@article{BUKH2012,
	Title = {Coupled Snakelets for Curled Text-Line Segmentation from Warped Document Images},
	Author = {Syed Saqib Bukhari and Faisal Shafait and Thomas Breuel},
	Year = {2012},
	Publisher = {Springer},
	Volume = {15},
	Pages = {1-16},
	Journal = {International Journal on Document Analysis and Recognition}
}

     
Last modified: 30.08.2016