Multi-label Document Classification in Czech

Michal Hrala and Pavel Král
16th International Conference on Text, Speech and Dialogue (TSD 2013) (2013)

Research topics

Document classification


This paper deals with automatic multi-label document classification in the context of a real application for the Czech News Agency. The main goal of this work is to compare and evaluate the three most promising multi-label document classification approaches on the Czech language. We show that the simple method based on a meta-classifier proposed by Zhu et al. significantly outperforms the other approaches. The classification error rate improvement is about 13%. The Czech document corpus is freely available for research purposes, which is another contribution of this work.
