Enhancing Masked Language Modeling in BERT Models Using Pretrained Static Embeddings


Adam Mištera and Pavel Král
International Conference on Text, Speech, and Dialogue (2025)


Abstract

This paper explores the integration of pretrained static fastText word vectors into a simplified Transformer-based model to improve its efficiency and accuracy. Although these embeddings have been outperformed by large Transformer-based models, they can still contribute useful linguistic information when combined with contextual models, especially in low-resource or computationally constrained environments. We demonstrate this by incorporating static embeddings directly into our own BERT-Tiny-based models prior to pretraining with masked language modeling. We train the models on seven languages covering three distinct language families. The results show that using static fastText embeddings in these models not only improves convergence for all tested languages but also yields significantly higher evaluation accuracy.
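The abstract describes injecting pretrained static fastText vectors into a BERT-Tiny model before MLM pretraining. The Python sketch below shows one plausible way to do this, not the authors' implementation: the tokenizer choice, the vector file name cc.en.300.vec, and the random down-projection from 300 to 128 dimensions are illustrative assumptions, and the paper's actual integration scheme may differ.

# Sketch: initialize a BERT-Tiny embedding matrix from static fastText vectors
# before masked language modeling pretraining. All file names, the tokenizer,
# and the projection step are assumptions made for illustration.
import numpy as np
import torch
from gensim.models import KeyedVectors
from transformers import BertConfig, BertForMaskedLM, BertTokenizerFast

# Assumed tokenizer; the paper trains on seven languages, so the real
# vocabularies would be language-specific.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# BERT-Tiny-style configuration: 2 layers, hidden size 128, 2 attention heads.
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=512,
)
model = BertForMaskedLM(config)

# Load pretrained static fastText vectors in word2vec text format
# (file path is an assumption).
ft = KeyedVectors.load_word2vec_format("cc.en.300.vec")

# Project the 300-d static vectors down to the model's hidden size with a
# fixed random linear map; this is one simple option, not the paper's method.
rng = np.random.default_rng(0)
projection = rng.normal(scale=0.02, size=(ft.vector_size, config.hidden_size))

# Overwrite the randomly initialized word embeddings for every vocabulary
# token that has a matching fastText vector.
embedding = model.bert.embeddings.word_embeddings.weight.data
for token, idx in tokenizer.get_vocab().items():
    word = token.lstrip("#")  # drop WordPiece continuation markers
    if word in ft.key_to_index:
        vec = ft[word] @ projection
        embedding[idx] = torch.tensor(vec, dtype=embedding.dtype)

# The model would then be pretrained with the standard MLM objective,
# e.g. using transformers' DataCollatorForLanguageModeling and Trainer.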


BibTeX

@inproceedings{mivstera2025enhancing,
  title={Enhancing Masked Language Modeling in BERT Models Using Pretrained Static Embeddings},
  author={Mi{\v{s}}tera, Adam and Kr{\'a}l, Pavel},
  booktitle={International Conference on Text, Speech, and Dialogue},
  pages={216--227},
  year={2025},
  organization={Springer}
}