OCR-Free Table of Contents Detection in Urdu Books

Adnan Ul Hasan, Syed Saqib Bukhari, Faisal Shafait, Thomas Breuel
IAPR International Workshop on Document Analysis Systems, Gold Coast, Queensland, Australia, IEEE, 3/2012

Abstract:

A Table of Contents (ToC) is an integral part of multi-page documents such as books and magazines. Most existing techniques use textual similarity to detect ToC pages automatically. However, such techniques cannot be applied where OCR technology is not available, which is the case for historical documents and many modern Nabataean (Arabic) and Indic scripts. It is therefore necessary to develop tools to navigate through such documents without the use of OCR. This paper reports a preliminary effort to address this challenge. The proposed algorithm has been applied to find ToC pages in Urdu books, achieving an overall initial accuracy of 88%.

Files:

  Adnan-Urdu-Book-ToC-DAS12.pdf

BibTeX:

@inproceedings{ HASA2012,
	Title = {OCR-Free Table of Contents Detection in Urdu Books},
	Author = {Adnan Ul Hasan and Syed Saqib Bukhari and Faisal Shafait and Thomas Breuel},
	BookTitle = {IAPR International Workshop on Document Analysis Systems},
	Month = {3},
	Year = {2012},
	Publisher = {IEEE}
}

Last modified: 30.08.2016