Multi-label Document Classification in Czech

Michal Hrala and Pavel Král
16th International conference on Text, Speech and Dialogue (TSD 2013) (2013)


Research topics:

Document Classification


This paper deals with multi-label automatic document classification in the context of a real application for the Czech News Agency. The main goal of this work is to compare and evaluate the three most promising multi-label document classification approaches on Czech. We show that a simple method based on a meta-classifier, proposed by Zhu et al., significantly outperforms the other approaches. The improvement in classification error rate is about 13%. Another contribution of this work is the Czech document corpus, which is freely available for research purposes.
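The abstract does not spell out how the meta-classifier works. The sketch below only illustrates one common two-stage reading of such schemes, and it is an assumption, not the authors' implementation. Under this reading, a meta-classifier first estimates how many labels a document should receive. The labels with the highest scores from a base classifier are then assigned. The helper name `select_labels`, its parameters, and the example category names are all hypothetical.

```python
from typing import Dict, List


def select_labels(label_scores: Dict[str, float], predicted_count: int) -> List[str]:
    """Return the `predicted_count` highest-scoring labels (hypothetical helper).

    Assumptions (illustrative only, not taken from the paper):
    - `label_scores` stands in for per-label scores from a base classifier;
    - `predicted_count` stands in for the output of a meta-classifier that
      estimates how many labels the document should receive.
    """
    if predicted_count < 1:
        raise ValueError("a document must receive at least one label")
    # Rank labels by score, highest first; ties keep dictionary insertion order
    # because Python's sort is stable.
    ranked = sorted(label_scores, key=lambda label: label_scores[label], reverse=True)
    # Keep only as many labels as the meta-classifier predicted.
    return ranked[:predicted_count]


if __name__ == "__main__":
    # Made-up scores for a single news article.
    scores = {"sport": 0.62, "politics": 0.21, "economy": 0.12, "culture": 0.05}
    print(select_labels(scores, 2))  # ['sport', 'politics']
```

Under this reading, the meta-classifier decides only how many labels to assign, and the ranking decides which ones. This separation is what makes such a scheme simple compared with training a separate binary decision per label.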



@InProceedings{Kral13TSD,
  author    = {Hrala, M. and Kr\'al, P.},
  title     = {Multi-label Document Classification in {C}zech},
  booktitle = {16th International conference on Text, Speech and Dialogue (TSD 2013)},
  pages     = {343--351},
  year      = {2013},
  address   = {Pilsen, Czech Republic},
  month     = {1-5 September},
  publisher = {Springer},
  doi       = {10.1007/978-3-642-40585-3\_44},
  isbn      = {978-3-642-40584}
}
