NLP group

Czert – Czech BERT-like Model for Language Representation

Jakub Sido and Ondřej Pražák and Pavel Přibáň and Miloslav Konopík
RANLP (2021)

Research topics:

Language Modeling | Sentiment Analysis | Document Classification | Named Entity Recognition

Abstract

This paper describes the training process of the first Czech monolingual language representation models based on the BERT and ALBERT architectures. We pre-train our models on more than 340K sentences, which is 50 times more than multilingual models that include Czech data. We outperform the multilingual models on 9 out of 11 datasets. In addition, we establish new state-of-the-art results on nine datasets. Finally, we discuss the properties of monolingual and multilingual models based on our results. We publish all the pretrained and fine-tuned models freely for the research community.

Authors

Ing. Jakub Sido, Ph.D.

Researcher

Ing. Ondřej Pražák, Ph.D.

Researcher

Ing. Pavel Přibáň, Ph.D.

Researcher

Ing. Miloslav Konopík, Ph.D.

Researcher

