Multi-label Document Classification in Czech
Michal Hrala and Pavel Král
16th International Conference on Text, Speech and Dialogue (TSD 2013) (2013)
Abstract
This paper deals with multi-label automatic document classification in the context of a real application for the Czech news agency. The main goal of this work is to compare and evaluate the three most promising multi-label document classification approaches on the Czech language. We show that the simple method based on a meta-classifier proposed by Zhu et al. significantly outperforms the other approaches. The classification error rate improvement is about 13%. The Czech document corpus is freely available for research purposes, which is another contribution of this work.