Layout Analysis using OCRopus

Here we provide an on-line demo of the layout analysis algorithm used in the OCRopus open-source OCR system. For detailed information about the algorithm, please refer to:

T. M. Breuel: High Performance Document Layout Analysis, Symposium on Document Image Understanding Technology, Greenbelt, Maryland, 2003.

Notes:

You can submit an image either through the form interface or programmatically over HTTP. You can also submit a PDF document, in which case the first page is rendered at 200 dpi and used as the input image.

Form Interface

File (max. 5MB):

If you do not have an image at hand or want to try some of our images, try one of these (note that results are cached, so this is faster than using a new image):

Programmatic Interface

To submit your image programmatically, send an HTTP POST request to this URL with the image as a multipart/form-data field named "imagefile".

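From the command line, you can do this with curl. The following command writes the response headers to header.out and the returned result to output.png: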

curl -D header.out -F 'imagefile=@input.jpg;type=image/jpeg' http://demo-madm.dfki.uni-kl.de/layout/ > output.png

You can also do this easily using the HTTP implementation in your favorite programming language (C#, Python, Java, Perl, etc.).
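As an illustration, here is a minimal Python sketch of the same request using only the standard library. The URL and the "imagefile" field name are taken from the curl example above; the helper names build_multipart and submit_image are hypothetical, and the sketch assumes the service answers with the result image in the response body, as the curl command suggests.

```python
import mimetypes
import os
import urllib.request
import uuid

# URL copied from the curl example; the service may not be reachable.
LAYOUT_URL = "http://demo-madm.dfki.uni-kl.de/layout/"


def build_multipart(field_name, filename, data, content_type=None):
    """Encode a single file as a multipart/form-data body.

    Returns a tuple (body_bytes, content_type_header_value).
    """
    if content_type is None:
        # Guess the MIME type from the file extension, with a generic fallback.
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def submit_image(path, out_path="output.png", url=LAYOUT_URL, timeout=60):
    """POST the file at `path` as form field "imagefile" and save the response body."""
    with open(path, "rb") as f:
        data = f.read()
    body, header = build_multipart("imagefile", os.path.basename(path), data)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    # Assumption: the response body is the result image, as in the curl example.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = response.read()
    with open(out_path, "wb") as out:
        out.write(result)
    return out_path
```

Calling submit_image("input.jpg") should then correspond to the curl command above, except that the response headers are not saved.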