Demos

OCRopus - Open Source Layout Analysis and OCR
Screen OCRopus
Camera-based document capture (iDesk)
StereoBook
Document Browser
Visual document similarity search (FireWatch)
Document Image Dewarping
One-step capture and restoration (OSCAR)
Geometric Layout Analysis
Document Images to HTML
Comparison of Layout Analysis Techniques
Performance Evaluation of Layout Analysis Techniques
Geometric Matching for Arc and Line Finding
Patch-Based Object Recognition Using Geometric Matching
Document Image Retrieval based on Layout Similarity
Page Frame Detection
Document image viewing and retrieval based on special OCR (DIVER)
Image Browser
Bibliographic meta-data extraction using PFST
Accessibility Proxy
Image-Based HTML Layout Verifier

OCRopus - Open Source Layout Analysis and OCR

To make the contents of books and other documents searchable and accessible, they need to be transformed into machine-readable text, and the layout and markup of the pages need to be analyzed. Collectively, these two operations are carried out by Optical Character Recognition (OCR) systems. OCR appears to be a mature field, with many decades of research by numerous research groups invested in it. However, current commercial OCR systems still have a number of limitations for practical applications.
With OCRopus, the Image Understanding and Pattern Recognition Group (IUPR) at the German Research Center for Artificial Intelligence (DFKI) is developing an adaptive and adaptable open source OCR system for both desktop use and high-volume conversion efforts. The system incorporates state-of-the-art pattern recognition, statistical natural language processing, and image processing methods.
The goals of the project are to advance the state of the art in optical character recognition and related technologies, and to deliver a high-quality OCR system suitable for document conversion, electronic libraries, visually impaired users, historical document analysis, and general desktop use. Unlike previous systems, OCRopus is structured so that other researchers in the field can easily reuse it. We are releasing OCRopus under the Apache 2 license; the initial release supports English only and combines the Tesseract character recognizer with DFKI layout analysis. The technology preview release can be downloaded from http://www.ocropus.org/
In addition to the efforts at DFKI to advance OCRopus, we are hoping for contributions from the open source community, for example adaptations of the system to additional languages.

The Open Source Layout Analysis and OCR demo lets you upload a scanned document image; the system then returns the recognized text in editable form.

Screenshot of OCRopus

IPeT - Image-based Personal Computing Technologies
