Language models are crucial for many tasks in NLP. Automatic speech recognition, optical character recognition, machine translation, and other areas heavily depend on the performance of the underlying language model. The goal of a language model is to estimate the probability of a given word sequence.
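To make this concrete, a language model factors the joint probability of a word sequence w_1, ..., w_T via the chain rule, and an n-gram model approximates each conditional probability using a limited history (this is the standard textbook formulation, not a method specific to the cited work):

\[
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1}) \approx \prod_{t=1}^{T} P(w_t \mid w_{t-n+1}, \dots, w_{t-1})
\]

In practice, the conditional probabilities are estimated from corpus counts and smoothed to assign non-zero probability to n-grams unseen in the training data.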
Supervised machine learning has proved to be very effective in improving language modeling. For example, in [Brychcín and Konopík, 2011], we use morphological information to achieve significant improvements for Czech and Slovak.
We also investigate unsupervised ways of exploiting morphological information. We created a new stemming tool called HPS, which proved to be effective in language modeling (see [Brychcín and Konopík, 2015]).
Currently, significant attention is devoted to unsupervised ways of improving language modeling performance. In [Brychcín and Konopík, 2014], we are the first to apply semantic spaces (see the section about semantic spaces) to language modeling. We significantly improve language models for Czech, Slovak, and English with no external information added. Our research is published in Computer Speech and Language, a highly distinguished journal.