Doctor Bills Data Set

This data set contains document images of doctor bills, both genuine and forged. The forged bills were created by students on the basis of a genuine bill: each student's task was to reproduce the genuine bill they received as closely as possible, using software of their choice.

It also contains copies of genuine documents made on different copiers.

Two different bill layouts were chosen, called “type01” and “type02”. For “type01”, the genuine documents had to be split into two subsets (“type01a” and “type01b”) because the setup varied during the generation of these files.

The data set contains the original PDFs, the scans of printouts of the PDFs, the scans of the forged documents and the scans of the copied documents.

Overview of the number of document images per type:

Type      Genuine  Forged  Copies
type01a   30       9       0
type01b   20               0
type02    40       12      40
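As a quick cross-check of the overview table, the per-column totals can be tallied with a few lines of code. This is a minimal Python sketch: the dictionary layout and the `column_totals` helper are illustrative only and not part of the data set download, and the blank “Forged” cell for type01b is treated as unreported (None) rather than as zero.

```python
# Per-type image counts transcribed from the overview table.
# None marks the type01b "Forged" cell, which the table leaves blank.
COUNTS = {
    "type01a": {"genuine": 30, "forged": 9, "copies": 0},
    "type01b": {"genuine": 20, "forged": None, "copies": 0},
    "type02": {"genuine": 40, "forged": 12, "copies": 40},
}


def column_totals(counts):
    """Sum each column over all types, skipping unreported (None) cells."""
    totals = {}
    for row in counts.values():
        for column, value in row.items():
            if value is not None:
                totals[column] = totals.get(column, 0) + value
    return totals


if __name__ == "__main__":
    print(column_totals(COUNTS))  # {'genuine': 90, 'forged': 21, 'copies': 40}
```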

You can download the data set here: Doctor bills data set (179 MByte)

Contact: Faisal Shafait, Joost van Beusekom

Example images will follow soon.

Last modified: 22.04.2012