NLP group

Training Strategies for OCR Systems for Historical Documents

Jiří Martínek, Ladislav Lenc, and Pavel Král
Artificial Intelligence Applications and Innovations (2019)

Research topics:

Dialogue Act Recognition

Abstract

This paper presents an overview of training strategies for optical character recognition (OCR) of historical documents. The main issue is the lack of annotated data and its quality. We summarize several ways of preparing synthetic data. The main goal of this paper is to show and compare ways to train a convolutional recurrent neural network classifier using synthetic data and its combination with a real annotated dataset.

Authors

Ing. Jiří Martínek, Ph.D.

Researcher

jimar@kiv.zcu.cz

Ing. Ladislav Lenc, Ph.D.

Researcher

llenc@kiv.zcu.cz

prof. Ing. Pavel Král, Ph.D.

Team leader

pkral@kiv.zcu.cz

BibTeX

@InProceedings{10.1007/978-3-030-19823-7_30,
  author    = "Mart{\'i}nek, Ji{\v{r}}{\'i} and Lenc, Ladislav and Kr{\'a}l, Pavel",
  editor    = "MacIntyre, John and Maglogiannis, Ilias and Iliadis, Lazaros and Pimenidis, Elias",
  title     = "Training Strategies for OCR Systems for Historical Documents",
  booktitle = "Artificial Intelligence Applications and Innovations",
  month     = "24-26 May",
  year      = "2019",
  publisher = "Springer International Publishing",
  address   = "Cham",
  pages     = "362--373",
  doi       = "10.1007/978-3-030-19823-7_30",
  abstract  = "This paper presents an overview of training strategies for optical character recognition of historical documents. The main issue is the lack of the annotated data and its quality. We summarize several ways of synthetic data preparation. The main goal of this paper is to show and compare possibilities how to train a convolutional recurrent neural network classifier using the synthetic data and its combination with a real annotated dataset.",
  isbn      = "978-3-030-19823-7"
}
