Searching for a Measure of Word Order Freedom

Tomáš Hercig and Vladislav Kuboň and Markéta Lopatková
Proceedings of the 16th ITAT: Slovenskočeský NLP workshop (SloNLP 2016) (2016)


This paper compares several measures of word order freedom, applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project. The word order statistics are the relative frequencies of all orderings of subject, predicate and object, in both main and subordinate clauses. The measures include Euclidean distance, max-min distance, entropy and cosine similarity. The differences among the measures are discussed.
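
To make the four measures concrete, the following is a minimal Python sketch of how each could be computed from a relative-frequency distribution over the six orderings of subject (S), predicate (V) and object (O). The abstract does not give the paper's exact definitions, so everything here is an assumption made for illustration: the function names, the choice of the uniform distribution as the reference point for Euclidean distance and cosine similarity, the reading of max-min distance as the gap between the most and least frequent ordering, and the sample counts, which are invented and not taken from HamleDT.

```python
import math

# The six possible orderings of subject (S), predicate (V) and object (O).
ORDERS = ("SVO", "SOV", "VSO", "VOS", "OSV", "OVS")


def normalize(counts):
    """Turn raw counts per ordering into relative frequencies (in ORDERS order)."""
    total = sum(counts.get(order, 0) for order in ORDERS)
    if total <= 0:
        raise ValueError("at least one ordering must have a positive count")
    return [counts.get(order, 0) / total for order in ORDERS]


def euclidean_distance(p, q):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def max_min_distance(p):
    """Gap between the most and the least frequent ordering (assumed definition)."""
    return max(p) - min(p)


def entropy(p):
    """Shannon entropy in bits; zero-frequency orderings contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def cosine_similarity(p, q):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


# Assumed reference point: a language in which all six orderings are equally likely.
uniform = [1 / len(ORDERS)] * len(ORDERS)

# Hypothetical extreme case: a language that only ever uses SVO.
fixed = normalize({"SVO": 10})

# Invented counts for illustration only; not data from the paper.
sample = normalize({"SVO": 60, "SOV": 15, "VSO": 10, "VOS": 5, "OSV": 4, "OVS": 6})

if __name__ == "__main__":
    for name, dist in (("uniform", uniform), ("fixed", fixed), ("sample", sample)):
        print(
            name,
            round(euclidean_distance(dist, uniform), 3),
            round(max_min_distance(dist), 3),
            round(entropy(dist), 3),
            round(cosine_similarity(dist, uniform), 3),
        )
```

Under these assumptions, a language with freer word order lies closer to the uniform distribution: its Euclidean and max-min distances shrink toward 0, its entropy grows toward log2(6) bits, and its cosine similarity to the uniform vector approaches 1. A strictly fixed order sits at the opposite end of each scale.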

