Document classification

Automatic document classification has become essential for organizing and storing information, driven by the rapidly growing volume of electronic text documents and the expansion of the World Wide Web.

We focus on both single-label and multi-label document classification in the context of an application for the Czech News Agency (www.ctk.eu). We have proposed and implemented an experimental document classification system that uses a precise Czech document representation (including lemmatization and POS tagging) together with several feature selection methods and three classifiers: naive Bayes, maximum entropy, and support vector machines.
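As a rough illustration of one of the three classifiers named above, the following is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing. It is not the system described here: lemmatization, POS tagging, feature selection, and multi-label handling are all omitted, the function names (train_naive_bayes, predict) are hypothetical, and the toy corpus uses English lemmas as a stand-in for lemmatized Czech text.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on already-tokenized documents."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)          # number of documents per class
    token_counts = defaultdict(Counter)   # token frequencies per class
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(token_counts[label].values())
        denom = total + alpha * len(vocab)  # Laplace-smoothed denominator
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                tok: math.log((token_counts[label][tok] + alpha) / denom)
                for tok in vocab
            },
        }
    return model


def predict(model, doc):
    """Return the label with the highest log posterior; unseen tokens are skipped."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_like"][tok] for tok in doc if tok in params["log_like"]
        )
    return max(model, key=score)


# Hypothetical toy corpus: English lemmas stand in for lemmatized Czech text.
train_docs = [
    ["team", "win", "match", "goal"],
    ["player", "score", "goal", "league"],
    ["government", "vote", "law", "parliament"],
    ["minister", "law", "election", "vote"],
]
train_labels = ["sport", "sport", "politics", "politics"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, ["goal", "match", "league"])
print(prediction)
```

A multi-label variant could, under the same assumptions, train one such binary model per category and assign every category whose model accepts the document.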

Publications

Ladislav Lenc and Pavel Král
Combination of Neural Networks for Multi-label Document Classification
22nd International Conference on Applications of Natural Language to Information Systems (NLDB 2017) (2017)
Document classification
Pavel Král
Named entities as new features for Czech document classification
15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2014) (2014)
Document classification | Named entity recognition
Tomáš Brychcín and Pavel Král
Novel Unsupervised Features for Czech Multi-label Document Classification
13th Mexican International Conference on Artificial Intelligence (MICAI 2014) (2014)
Document classification | Semantic analysis
Michal Nykl
Linked Data and PageRank based classification
IADIS International Conference Theory and Practice in Modern Computing 2013 (part of MCCSIS 2013) (2013)
Document classification | Semantic analysis
Michal Hrala and Pavel Král
Evaluation of the Document Classification Approaches
8th International Conference on Computer Recognition Systems (CORES 2013) (2013)
Document classification
Michal Hrala and Pavel Král
Multi-label Document Classification in Czech
16th International conference on Text, Speech and Dialogue (TSD 2013) (2013)
Document classification