Scientific report: "Measuring Syntactic Difference in British English"
Measuring Syntactic Difference in British English
Nathan C. Sanders, Department of Linguistics, Indiana University, Bloomington, IN 47405, USA, ncsander@

Abstract
Recent work by Nerbonne and Wiersma (2006) has provided a foundation for measuring syntactic differences between corpora. It uses part-of-speech trigrams as an approximation to syntactic structure, comparing the trigrams of two corpora for statistically significant differences. This paper extends the method and its application. It extends the method by using the leaf-ancestor paths of Sampson (2000) instead of trigrams; these capture internal syntactic structure, since every leaf in a parse tree records the path back to the root. The corpus used for testing is the International Corpus of English, Great Britain (Nelson et al., 2002), which contains syntactically annotated speech of Great Britain. The speakers are grouped into geographical regions based on place of birth. This differs in both nature and number from previous experiments, which found differences between two groups of Norwegian L2 learners of English. We show that dialectal variation in eleven British regions from the ICE-GB is detectable by our algorithm, using both leaf-ancestor paths and trigrams.
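The two feature types named in the abstract can be illustrated with a small sketch. This is not the author's implementation: the toy tree, the tuple encoding, and the function names (leaf_ancestor_paths, pos_tags, pos_trigrams) are assumptions made for illustration, and the paths are a simplified form of Sampson's leaf-ancestor paths that leaves out his constituent-boundary markers. It covers feature extraction and counting only, not the statistical comparison between corpora.

```python
from collections import Counter

# Hypothetical toy parse tree, encoded as (label, child, child, ...).
# A leaf is (pos_tag, word). This encoding is an assumption for the sketch,
# not the ICE-GB annotation format.
TREE = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBZ", "barks")))


def is_leaf(node):
    """A leaf has exactly one child and that child is the word string."""
    return len(node) == 2 and isinstance(node[1], str)


def leaf_ancestor_paths(tree, ancestors=()):
    """Return one path per leaf, written root-first and ending in the POS tag.

    Simplified: Sampson's boundary markers are omitted.
    """
    label = tree[0]
    path = ancestors + (label,)
    if is_leaf(tree):
        return ["-".join(path)]
    paths = []
    for child in tree[1:]:
        paths.extend(leaf_ancestor_paths(child, path))
    return paths


def pos_tags(tree):
    """Return the POS tags of the leaves in left-to-right order."""
    if is_leaf(tree):
        return [tree[0]]
    tags = []
    for child in tree[1:]:
        tags.extend(pos_tags(child))
    return tags


def pos_trigrams(tags):
    """Return every window of three consecutive POS tags."""
    return [tuple(tags[i:i + 3]) for i in range(len(tags) - 2)]


paths = leaf_ancestor_paths(TREE)
trigrams = pos_trigrams(pos_tags(TREE))

# Frequency tables of this kind are what would be compared across corpora.
path_counts = Counter(paths)
trigram_counts = Counter(trigrams)
```

For the toy sentence, each word contributes one path (for example, "the" yields the path from S through NP to DT), whereas the three-word sentence yields only a single trigram. The paths therefore carry information about the constituents above each word, which a flat window of tags does not.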
1 Introduction
In the measurement of linguistic distance, older work such as Seguy (1973) was able to measure distance in most areas of linguistics, such as phonology, morphology, and syntax. The features used for comparison were hand-picked based on linguistic knowledge of the area being surveyed. These features, while probably lacking in completeness of coverage, certainly allowed a rough comparison of distance in all linguistic domains. In contrast, computational methods have focused on a single area of language. For example, a method for determining phonetic distance is given by Heeringa (2004). Heeringa and others have also done related work on phonological distance in Nerbonne and Heeringa (1997) and Gooskens and Heeringa (2004). A measure of syntactic distance is the ...