Heimatkunde: Dataset for Multi-Modal Historical Document Analysis


Josef Baloun and Václav Honzík and Ladislav Lenc and Jiří Martínek and Pavel Král
ICAART (2024)


Abstract

This paper introduces the novel Heimatkunde dataset, comprising printed documents in German and designed specifically for evaluating layout analysis methods with a focus on multi-modality. The dataset is openly accessible for research purposes. The study further presents baseline results for instance segmentation and multi-modal element classification. Three advanced models, Mask R-CNN, YOLOv8, and LayoutLMv3, are employed for instance segmentation, while a fusion-based model integrating BERT and various vision Transformers is proposed for multi-modal classification. Experimental findings reveal that the best bounding-box segmentation is achieved by YOLOv8 with an input image size of 1280 pixels, and the best segmentation masks are produced by LayoutLMv3 with PubLayNet weights. Moreover, the research demonstrates superior multi-modal classification results when BERT is used for the textual modality and Vision Transformer for the image modality. The study concludes by suggesting the integration of the proposed models into the historical Porta fontium portal to enhance information retrieval from historical data. © 2024 by SCITEPRESS - Science and Technology Publications, Lda.


BibTeX

@inproceedings{baloun2024heimatkunde,
  title={Heimatkunde: Dataset for Multi-Modal Historical Document Analysis},
  author={Baloun, Josef and Honz{\'\i}k, V{\'a}clav and Lenc, Ladislav and Mart{\'\i}nek, Ji{\v{r}}{\'\i} and Kr{\'a}l, Pavel},
  booktitle={ICAART (3)},
  pages={995--1001},
  year={2024}
}