Segment Representations in Named Entity Recognition
Text, Speech, and Dialogue (2015)
Named entitity recognition
In this paper we study the effects of various segment representations in the named entity recognition (NER) task. The segment representation is responsible for mapping multi-word entities into classes used in the chosen machine learning approach. Usually, the choice of a segment representation in the NER system is arbitrary without proper tests. Some authors presented comparisons of different segment representations such as BIO, BIEO, BILOU and usually compared only two segment representations. Our goal is to show, that the segment representation problem is more complex and that the proper selection of the best approach is not straightforward. We provide experiments with a wide set of segment representations. All the representations are tested using two popular machine learning algorithms: Conditional Random Fields and Maximum Entropy. Furthermore, the tests are done on four languages, namely English, Spanish, Dutch and Czech.