Optical Character Recognition on supermarket receipts

Marco Ziegaus

@misc{ziegaus2016optical,
  title={Optical Character Recognition on supermarket receipts},
  author={Ziegaus, Marco},
  year={2016}
}

Receipt detection	Receipt localization	Receipt normalization	Text line segmentation	Optical character recognition	Semantic analysis
❌	❌	✔️	✔️	❗	✔️

Character segmentation with monospace strategy
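A minimal sketch of the monospace idea, assuming the character width (pitch) and the left offset of the text line are already known in pixels. The function name and parameters are hypothetical, and these notes do not record how the thesis estimates the pitch:

```python
def segment_monospace(line_width, char_width, offset=0):
    """Split a text line into equally wide character cells.

    Returns a list of (left, right) pixel column ranges. Only cells that
    fit completely inside the line are returned.
    """
    if char_width <= 0:
        raise ValueError("char_width must be positive")
    cells = []
    left = offset
    while left + char_width <= line_width:
        cells.append((left, left + char_width))
        left += char_width
    return cells


# Usage: a 50 px wide line, 12 px per character, text starting at column 2.
cells = segment_monospace(50, 12, offset=2)
```

Because every cell has the same width, no per-character gap detection is needed; the price is that a wrong pitch estimate shifts every later cell.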
Recognizing every character one by one:
- centering of the character in the image
- generation of templates (the mean appearance of each character)
- template matching
- reliability prediction
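The recognition steps above can be sketched roughly as follows. This is an illustration under assumptions, not the thesis implementation: glyphs are assumed to be already centered, equally sized grayscale images stored as lists of rows, similarity is taken to be cosine similarity, and reliability is taken to be the margin between the best and second-best score. All function names are hypothetical.

```python
from math import sqrt


def mean_template(samples):
    """Average equally sized glyph images pixel by pixel."""
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[sum(s[r][c] for s in samples) / n for c in range(cols)]
            for r in range(rows)]


def similarity(a, b):
    """Cosine similarity between two images of the same size."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = sqrt(sum(x * x for x in fa))
    nb = sqrt(sum(y * y for y in fb))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def classify(glyph, templates):
    """Return (best_char, reliability) for one glyph.

    Reliability here is the score gap between the best and the
    second-best template (an assumed definition).
    """
    scores = sorted(((similarity(glyph, t), ch) for ch, t in templates.items()),
                    reverse=True)
    best_score, best_char = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_char, best_score - runner_up


# Usage with two toy 3x3 templates.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
char, reliability = classify([[0, 1, 0], [0, 1, 0], [0, 0.8, 0]], templates)
```

A small margin signals that two templates fit almost equally well, which is the kind of character a later autocorrection step would want to revisit.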
Simple autocorrection: correcting errors with regular expressions. Ideas: keyword dictionary, product database, plausibility validation
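One possible shape of regex-based autocorrection, as a sketch: inside tokens that look like prices, letters that are commonly confused with digits are mapped back to digits. The confusion table and the price pattern are assumptions made for illustration; the actual rules used in the thesis are not recorded in these notes.

```python
import re

# Hypothetical confusion table: letters OCR may produce in place of digits.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# A price-like token: digit-like characters, a decimal separator,
# then exactly two digit-like characters.
PRICE_LIKE = re.compile(r"\b[0-9OolISB]+[.,][0-9OolISB]{2}\b")


def autocorrect_prices(line):
    """Replace digit look-alikes, but only inside price-like tokens."""
    return PRICE_LIKE.sub(lambda m: m.group(0).translate(DIGIT_FIXES), line)


# Usage: product names are left alone, the price token is repaired.
fixed = autocorrect_prices("MILK 1,S9")
```

Restricting the substitution to price-like tokens keeps product names such as "BIO" untouched, although a word that happens to match the pattern would still be altered.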

Fields extracted:
- total price
- cash given
- change returned
- list of items:
  - product name
  - quantity
  - price per piece
  - total price for product
- date
- time
- store tax id
Keyword-based (Levenshtein distance) and regular-expression-based data extraction
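A minimal sketch of how the two extraction techniques could be combined for one field: a line is selected when one of its tokens is within a small Levenshtein distance of a keyword, and the amount is then pulled out with a regular expression. The keyword "TOTAL", the distance threshold of 2, and the function names are assumptions, not values taken from the thesis.

```python
import re


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def has_keyword(line, keyword, max_distance=2):
    """True if any token of the line is close enough to the keyword."""
    return any(levenshtein(tok.upper(), keyword.upper()) <= max_distance
               for tok in line.split())


PRICE = re.compile(r"(\d+)[.,](\d{2})")


def extract_total(lines, keyword="TOTAL"):
    """Return the first price on a line that fuzzily contains the keyword."""
    for line in lines:
        if has_keyword(line, keyword):
            match = PRICE.search(line)
            if match:
                return float(f"{match.group(1)}.{match.group(2)}")
    return None


# Usage: the OCR misread "TOTAL" as "T0TAL", which is still found.
total = extract_total(["MILK 1,59", "T0TAL 12,48"])
```

The fuzzy match tolerates a few OCR errors in the keyword itself, while the regular expression keeps the value format strict.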