Named Entity Recognition for Highly Inflectional Languages: Effects of Various Lemmatization and Stemming Approaches


Michal Konkol and Miloslav Konopík
Text, Speech and Dialogue (2014)


Research topics:

Named Entity Recognition

Abstract

In this paper, we study the effects of various lemmatization and stemming approaches on the named entity recognition (NER) task for Czech, a highly inflectional language. Lemmatizers are regarded as a necessary component of Czech NER systems and have been used in all papers on Czech NER published so far. It is therefore of utmost importance to explore their benefits and limits, as well as the differences between simple and complex methods. Our experiments are evaluated on the standard Czech Named Entity Corpus 1.1 as well as on the newly created version 2.0.


BibTeX

@inproceedings{jbibtex-1,
  year      = {2014},
  isbn      = {978-3-319-10815-5},
  booktitle = {Text, Speech and Dialogue},
  volume    = {8655},
  series    = {Lecture Notes in Computer Science},
  editor    = {Sojka, Petr and Horák, Aleš and Kopeček, Ivan and Pala, Karel},
  doi       = {10.1007/978-3-319-10816-2_33},
  title     = {Named Entity Recognition for Highly Inflectional Languages: Effects of Various Lemmatization and Stemming Approaches},
  url       = {http://dx.doi.org/10.1007/978-3-319-10816-2_33},
  publisher = {Springer International Publishing},
  keywords  = {Named Entity Recognition; Lemmatization; Stemming},
  author    = {Konkol, Michal and Konopík, Miloslav},
  pages     = {267--274},
  language  = {English},
  abstract  = {In this paper, we study the effects of various lemmatization and stemming approaches on the named entity recognition (NER) task for Czech, a highly inflectional language. Lemmatizers are regarded as a necessary component of Czech NER systems and have been used in all papers on Czech NER published so far. It is therefore of utmost importance to explore their benefits and limits, as well as the differences between simple and complex methods. Our experiments are evaluated on the standard Czech Named Entity Corpus 1.1 as well as on the newly created version 2.0.}
}