Training Strategies for OCR Systems for Historical Documents
Jiří Martínek, Ladislav Lenc, and Pavel Král
Artificial Intelligence Applications and Innovations (2019)
Research topics
Dialogue act recognition
Abstract
This paper presents an overview of training strategies for optical character recognition (OCR) of historical documents. The main issue is the lack of annotated data and the quality of the data that is available. We summarize several ways of preparing synthetic data. The main goal of this paper is to show and compare ways of training a convolutional recurrent neural network classifier on synthetic data, alone and in combination with a real annotated dataset.