Searching for a Measure of Word Order Freedom


Tomáš Hercig and Vladislav Kuboň and Markéta Lopatková
Proceedings of the 16th ITAT: Slovenskočeský NLP workshop (SloNLP 2016) (2016)

Abstract

This paper compares various ways of measuring word order freedom, applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project; the word order statistics are relative frequencies of all word order combinations of subject, predicate and object, in both main and subordinate clauses. The measures include Euclidean distance, max-min distance, entropy and cosine similarity. The differences among the measures are discussed.

BibTeX

@inproceedings{SLON-Kubon2016Searching,
  title = {Searching for a Measure of Word Order Freedom},
  author = {Vladislav Kubo{\v{n}} and Mark{\'e}ta Lopatkov{\'a} and Tom{\'a}{\v{s}} Hercig},
  booktitle = {Proceedings of the 16th {ITAT}: Slovensko{\v{c}}esk{\'{y}} {NLP} workshop (Slo{NLP} 2016)},
  editor = {Bro{\v{n}}a Brejov{\'{a}}},
  year = {2016},
  publisher = {CreateSpace Independent Publishing Platform},
  organization = {Comenius University in Bratislava, Faculty of Mathematics, Physics and Informatics},
  address = {Bratislava, Slovakia},
  venue = {{SOREA} Hutn{\'{i}}k I.},
  series = {{CEUR} Workshop Proceedings},
  volume = {1649},
  pages = {11--17},
  isbn = {978-1537016740},
  issn = {1613-0073},
  abstract = {This paper compares various means of measuring of word order freedom applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project, the word order statistics are relative frequencies of all word order combinations of subject, predicate and object both in main and subordinated clauses. The measures include Euclidean distance, max-min distance, entropy and cosine similarity. The differences among the measures are discussed.}
}