Language modeling

Language models are crucial for many tasks in NLP. Automatic speech recognition, optical character recognition, machine translation, and other areas heavily depend on the performance of the underlying language model. The goal of a language model is to estimate the probability of a given word sequence.
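
To make the goal concrete, the probability of a word sequence is usually factored with the chain rule into a product of conditional word probabilities, each estimated from a limited history. The following is a minimal sketch of a bigram model with add-one smoothing. It is only an illustration, not the models from the publications listed below. The function names, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence boundary markers


def train_bigram(sentences):
    """Collect history and bigram counts from tokenized sentences."""
    history_counts, bigram_counts = Counter(), Counter()
    vocab = {EOS}  # every token the model can predict
    for tokens in sentences:
        vocab.update(tokens)
        padded = [BOS] + tokens + [EOS]
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, cur, history_counts, bigram_counts, vocab):
    """P(cur | prev) with add-one (Laplace) smoothing."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))


def sequence_log_prob(tokens, history_counts, bigram_counts, vocab):
    """Log-probability of a whole sequence via the chain rule."""
    padded = [BOS] + tokens + [EOS]
    return sum(
        math.log(bigram_prob(prev, cur, history_counts, bigram_counts, vocab))
        for prev, cur in zip(padded, padded[1:])
    )


# Toy corpus, assumed purely for illustration.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
print(sequence_log_prob(["the", "cat", "sat"], *model))
print(sequence_log_prob(["sat", "cat", "the"], *model))
```

The word order seen in training receives a higher log-probability than the scrambled one, which is the behavior that applications such as speech recognition or machine translation rely on when ranking candidate outputs.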

Supervised machine learning has proved to be very effective at improving language models. For example, in [Brychcín and Konopík, 2011], we use information about morphology to achieve significant improvements for Czech and Slovak.

We also investigate unsupervised ways of acquiring morphological information. We created a new stemming tool called HPS, which proved to be effective in language modeling (see [Brychcín and Konopík, 2015]).

Currently, significant attention is given to unsupervised ways of improving language modeling performance. In [Brychcín and Konopík, 2014], we are the first to apply semantic spaces (see the section about semantic spaces) to language modeling. We significantly improve language models for Czech, Slovak, and English with no external information added. This research is published in a highly distinguished journal, Computer Speech and Language.

Publications

Tomáš Brychcín
Latent Tree Language Model
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016)
Semantic analysis | Language modeling

Pavel Král, Ladislav Lenc, and Christophe Cerisara
Semantic Features for Dialogue Act Recognition
3rd International Conference on Statistical Language and Speech Processing (SLSP 2015) (2015)
Semantic analysis | Language modeling

Pavel Král and Christophe Cerisara
Automatic dialogue act recognition with syntactic features
Language Resources and Evaluation (2014)
Semantic analysis | Language modeling

Tomáš Brychcín and Miloslav Konopík
Morphological Based Language Models for Inflectional Languages
Proceedings of IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems (2011)
Language modeling