Towards Multilingual Event Extraction Evaluation: A Case Study for the Czech Language
Josef Steinberger and Hristo Tanev
Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2015), 2015
MEVEX is a multilingual news corpus annotated with event metadata. The events in the corpus come from the domains of violence and natural and man-made disasters. The main purpose of the corpus is the automatic evaluation of event detection and extraction systems across languages. The event annotation follows the attached event taxonomy. The corpus covers 109 topics, each containing comparable articles in different languages taken from Wikinews. In total, it contains 342 articles in 14 languages, with Czech and English having the best coverage.
Please cite our article if you use any of the available resources.