Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions

Arya D. McCarthy, Adina Williams, Shijia Liu, David Yarowsky, Ryan Cotterell

Phonology, Morphology and Word Segmentation Long Paper

Zoom-10A: Nov 18, Zoom-10A: Nov 18 (01:00-02:00 UTC) [Join Zoom Meeting]

You can open the pre-recorded video in a separate window.

Abstract: A grammatical gender system divides a lexicon into a small number of relatively fixed grammatical categories. How similar are these gender systems across languages? To quantify the similarity, we define gender systems extensionally, thereby reducing the problem of comparisons between languages' gender systems to cluster evaluation. We borrow a rich inventory of statistical tools for cluster evaluation from the field of community detection (Driver and Kroeber, 1932; Cattell, 1945), that enable us to craft novel information theoretic metrics for measuring similarity between gender systems. We first validate our metrics, then use them to measure gender system similarity in 20 languages. We then ask whether our gender system similarities alone are sufficient to reconstruct historical relationships between languages. Towards this end, we make phylogenetic predictions on the popular, but thorny, problem from historical linguistics of inducing a phylogenetic tree over extant Indo-European languages. Of particular interest, languages on the same branch of our phylogenetic tree are notably similar, whereas languages from separate branches are no more similar than chance.
NOTE: Video may display a random order of authors. Correct author list is at the top of this page.

Connected Papers in EMNLP2020

Similar Papers

Investigating representations of verb bias in neural language models
Robert Hawkins, Takateru Yamakoshi, Thomas Griffiths, Adele Goldberg,
BLiMP: The Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Monananey, Wei Peng, Sheng-Fu Wang, Samuel Bowman,