AUTHOR=Dunn Jonathan TITLE=Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology JOURNAL=Frontiers in Artificial Intelligence VOLUME=2 YEAR=2019 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2019.00015 DOI=10.3389/frai.2019.00015 ISSN=2624-8212 ABSTRACT=
The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that models for each language robustly predict the region-of-origin of held-out samples, and that they do so more accurately with Construction Grammars than with simpler syntactic features. These global-scale experiments are used to argue that new methods in