A Statistical Method for Syntactic Dialectometry

dc.contributor.advisor: Kuebler, Sandra C
dc.contributor.author: Sanders, Nathan Clark
dc.date.accessioned: 2010-12-13T21:01:30Z
dc.date.available: 2027-08-13T20:01:30Z
dc.date.available: 2012-03-11T00:40:15Z
dc.date.issued: 2010-12-13
dc.date.submitted: 2010
dc.description: Thesis (Ph.D.) - Indiana University, Linguistics, 2010
dc.description.abstract: This dissertation establishes the utility and reliability of a statistical distance measure for syntactic dialectometry, extending dialectometry's methods from phonology and the lexicon to syntax. It establishes the measure's reliability by comparing its results on Swedish dialects with those of dialectology and phonological dialectometry, and by evaluating variant parameter settings. The research questions are (1) whether a statistical measure of syntax for dialectometry reproduces the results of syntactic dialectology and phonological dialectometry and (2) which parameter settings produce results most similar to dialectology's. Statistical dialect distance is defined in two parts: a feature set that captures linguistic properties and a measure of dissimilarity that combines two sites' features into a single number. This dissertation uses feature sets from previous work: trigrams (Nerbonne & Wiersma, 2006) and leaf-ancestor paths (Sanders, 2007). In addition, it introduces two other feature sets: leaf-head paths, which are based on dependencies, and phrase-structure rules. For dissimilarity, it uses the measure R (Nerbonne & Wiersma, 2006) as well as two measures from information theory, Kullback-Leibler and Jensen-Shannon divergence, and cosine similarity. This statistical distance is tested on Swediasyn, a corpus of interviews recorded in villages throughout Sweden. The measured distances were then processed and compared with existing dialectology results. Unlike previous work, this dissertation measured significant distances between dialect corpora. When these distances are mapped to the geography of Sweden, they reproduce the traditional dialect regions of Sweden. The distances correlate only weakly with geographic distance, but dialectometric syntactic and phonological distances agree well. Comparing specific dialect features with those of dialectology is inconclusive; better comparison methods are needed.
dc.identifier.uri: https://hdl.handle.net/2022/9729
dc.language.iso: en
dc.publisher: [Bloomington, Ind.] : Indiana University
dc.rights: This work is licensed under the Creative Commons Attribution-NoDerivs 3.0 Unported (CC BY-ND 3.0) license.
dc.rights.uri: http://creativecommons.org/licenses/by-nd/3.0/
dc.subject: linguistics
dc.subject: computational
dc.subject: dialectometry
dc.subject: dialectology
dc.subject: syntax
dc.subject: swedish
dc.subject.classification: Computer Science
dc.subject.classification: Language, Linguistics
dc.title: A Statistical Method for Syntactic Dialectometry
dc.type: Doctoral Dissertation
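
The abstract defines dialect distance in two parts: a feature set and a dissimilarity measure that reduces two sites' features to one number. The Python sketch below illustrates that idea only. It assumes each site has already been reduced to a list of features (the part-of-speech trigram strings are made-up toy data), and it shows Jensen-Shannon divergence and a cosine-based dissimilarity. The function names, the base-2 logarithm, and the "1 minus cosine similarity" convention are assumptions for illustration, not the dissertation's implementation; the measure R is not shown.

```python
import math
from collections import Counter


def relative_freqs(features):
    """Turn a non-empty list of features into a relative-frequency distribution."""
    counts = Counter(features)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes q assigns non-zero probability to every feature in p.
    """
    return sum(pv * math.log2(pv / q[f]) for f, pv in p.items() if pv > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint."""
    keys = set(p) | set(q)
    m = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_dissimilarity(p, q):
    """One minus the cosine similarity of two frequency vectors."""
    keys = set(p) | set(q)
    dot = sum(p.get(f, 0.0) * q.get(f, 0.0) for f in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical toy data: part-of-speech trigrams observed at two sites.
site_a = ["DT NN VB", "NN VB DT", "DT NN VB", "VB DT NN"]
site_b = ["DT NN VB", "NN DT VB", "NN DT VB", "VB DT NN"]

p = relative_freqs(site_a)
q = relative_freqs(site_b)
print("Jensen-Shannon divergence:", js_divergence(p, q))
print("Cosine dissimilarity:", cosine_dissimilarity(p, q))
```

Jensen-Shannon divergence is used in the sketch rather than plain Kullback-Leibler divergence because it is symmetric and stays finite when a feature appears at only one site.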

Files

Original bundle

Name: Sanders_indiana_0093A_10915.pdf
Size: 14.33 MB
Format: Adobe Portable Document Format