NLP group

Word Embeddings for Multi-label Document Classification

Ladislav Lenc and Pavel Král
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017 (2017)

Abstract

In this paper, we analyze and evaluate word embeddings for representation of longer texts in the multi-label classification scenario. The embeddings are used in three convolutional neural network topologies. The experiments are realized on the Czech ČTK and English Reuters-21578 standard corpora. We compare the results of word2vec static and trainable embeddings with randomly initialized word vectors. We conclude that initialization does not play an important role for classification. However, learning of word vectors is crucial to obtain good results.
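The three embedding setups compared in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the names `init_embeddings`, `SETUPS`, `sgd_update`, and `predict_labels` are hypothetical, the random-initialization range is arbitrary, and treating the randomly initialized vectors as trainable and thresholding per-label sigmoid scores at 0.5 are assumptions made only for illustration.

```python
import math
import random


def init_embeddings(vocab, dim, pretrained=None, seed=0):
    """Build a word -> vector table.

    Words found in `pretrained` (e.g. word2vec vectors) copy those vectors;
    all other words get small random vectors.
    """
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if pretrained is not None and word in pretrained:
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return table


# Assumed reading of the three setups named in the abstract.
SETUPS = {
    "word2vec static": {"use_pretrained": True, "trainable": False},
    "word2vec trainable": {"use_pretrained": True, "trainable": True},
    "random": {"use_pretrained": False, "trainable": True},
}


def sgd_update(table, grads, lr, trainable):
    """Apply one gradient step to the embedding table, unless it is frozen."""
    if not trainable:
        return  # static embeddings are never updated
    for word, grad in grads.items():
        table[word] = [w - lr * g for w, g in zip(table[word], grad)]


def predict_labels(logits, labels, threshold=0.5):
    """Multi-label decision: score each label independently with a sigmoid
    and keep every label whose score reaches the threshold."""
    return [
        label
        for label, z in zip(labels, logits)
        if 1.0 / (1.0 + math.exp(-z)) >= threshold
    ]
```

In this sketch the only difference between the static and trainable setups is whether `sgd_update` is allowed to change the table, which is the distinction the abstract identifies as decisive for classification quality. Unlike a softmax classifier, `predict_labels` may return zero, one, or several labels per document.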

Authors

Ing. Ladislav Lenc, Ph.D.

Researcher

llenc@kiv.zcu.cz

Doc. Ing. Pavel Král, Ph.D.

Team leader

pkral@kiv.zcu.cz

BibTeX

@InProceedings{lenc-kral:2017:RANLP,
  author    = {Lenc, Ladislav and Kral, Pavel},
  title     = {Word Embeddings for Multi-label Document Classification},
  booktitle = {Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017},
  month     = {September},
  year      = {2017},
  address   = {Varna, Bulgaria},
  publisher = {INCOMA Ltd.},
  pages     = {431--437},
  abstract  = {In this paper, we analyze and evaluate word embeddings for representation of longer texts in the multi-label classification scenario. The embeddings are used in three convolutional neural network topologies. The experiments are realized on the Czech \v{C}TK and English Reuters-21578 standard corpora. We compare the results of word2vec static and trainable embeddings with randomly initialized word vectors. We conclude that initialization does not play an important role for classification. However, learning of word vectors is crucial to obtain good results.},
  url       = {https://doi.org/10.26615/978-954-452-049-6_057}
}
