NLP group

Semantic Space Transformations for Cross-Lingual Document Classification

Jiří Martínek and Ladislav Lenc and Pavel Král
27th International Conference on Artificial Neural Networks (ICANN 2018) (2018)


Research topics:

Semantic Analysis | Document Classification | Neural Networks

Abstract

Cross-lingual document representation can be obtained by training monolingual semantic spaces and then using bilingual dictionaries with a transformation method to project the word vectors into a unified space. The main goal of this paper is to evaluate three promising transformation methods on the cross-lingual document classification task. We also propose, evaluate and compare two cross-lingual document classification approaches. We use a popular convolutional neural network (CNN) and compare its performance with a standard maximum entropy classifier. The proposed methods are evaluated on four languages from the Reuters corpus, namely English, German, Spanish and Italian. We demonstrate that the results of all transformation methods are close to each other; however, the orthogonal transformation generally gives slightly better results when the CNN with trained embeddings is used. The experimental results also show that the convolutional network achieves better results than the maximum entropy classifier. We further show that the proposed methods are competitive with the state of the art.
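The abstract does not spell out how the transformations are computed. As a hedged illustration only, the sketch below shows one common way to learn an orthogonal transformation from bilingual dictionary pairs: the orthogonal Procrustes solution obtained via SVD. It assumes NumPy is available and that each dictionary pair has already been looked up in both monolingual spaces, with the vectors stacked row-wise into matrices `src` and `tgt`. The function name `learn_orthogonal_map` and the synthetic data are hypothetical and are not the authors' code.

```python
import numpy as np


def learn_orthogonal_map(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return the orthogonal W minimizing ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) matrices whose i-th rows hold the vectors of the
    i-th bilingual dictionary pair (source word, its translation).
    """
    # SVD of the cross-covariance matrix: src^T tgt = U S V^T
    u, _, vt = np.linalg.svd(src.T @ tgt)
    # Orthogonal Procrustes solution W = U V^T (orthogonal by construction)
    return u @ vt


# Synthetic demo (assumed data): the "target space" is a hidden rotation
# of the "source space", so the learned map should recover that rotation.
rng = np.random.default_rng(0)
dim, n_pairs = 50, 200
src = rng.normal(size=(n_pairs, dim))
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # hidden orthogonal matrix
tgt = src @ q

w = learn_orthogonal_map(src, tgt)
print(np.allclose(w, q))  # expected: True

# Any source-language word vector can now be projected into the target space
projected = src @ w
```

Because W is orthogonal, the projection preserves vector norms and cosine similarities within the source space, which is one reason such a constraint is often preferred over an unconstrained linear map.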

Authors

Ing. Jiří Martínek, Ph.D.

Researcher

Ing. Ladislav Lenc, Ph.D.

Researcher

Doc. Ing. Pavel Král, Ph.D.

Team leader
