Scientific report: "Comparison and Classification of Dialects"

This project measures and classifies language variation. In contrast to earlier dialectology, we seek a comprehensive characterization of (potentially gradual) differences between dialects, rather than a geographic delineation of (discrete) features of individual words or pronunciations. More general characterizations of dialect differences then become available. We measure phonetic (un)relatedness between dialects using Levenshtein distance, and classify both by clustering the distances and by analysis through multidimensional scaling.

Proceedings of EACL '99

Comparison and Classification of Dialects

John Nerbonne, Wilbert Heeringa, and Peter Kleiweg
Alfa-informatica, BCN, University of Groningen
9700 AS Groningen, The Netherlands
{nerbonne, heeringa, kleiweg}@

1 Data and Method

The data come from the Reeks Nederlands(ch)e Dialectatlassen (Blancquaert and Pée, 1925-1982), which contains 1,956 Netherlandic and North Belgian transcriptions of 141 sentences. We chose 104 dialects regularly scattered over the Dutch language area, and 100 words which appear in each dialect text and which together contain all vowels and consonants.

Comparison is based on Levenshtein distance, a sequence-processing algorithm that has also been used in speech recognition (Kruskal, 1983). It calculates the cost of changing one word into another using insertions, deletions, and replacements. L-distance(s1, s2) is the sum of the costs of the cheapest set of operations changing s1 into s2.
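The definition of L-distance above can be sketched with the standard dynamic-programming formulation. This is a minimal illustration, not the authors' implementation: the function name `levenshtein` and its parameters are hypothetical, and the default costs (1 for an insertion or deletion, 2 for a replacement) are an assumption taken from the simplified example in the text. The feature-sensitive replacement costs described in the paper are not modeled here.

```python
def levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Return the cheapest total cost of changing `source` into `target`.

    Costs are flat per operation (an assumption for illustration);
    identical symbols are matched at zero cost.
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] holds the cost of changing source[:i] into target[:j].
    dist = [[0] * cols for _ in range(rows)]

    # Changing a prefix into the empty string takes only deletions.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # Building a prefix from the empty string takes only insertions.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # delete
                dist[i][j - 1] + ins_cost,                       # insert
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # replace or match
            )
    return dist[-1][-1]
```

With these default costs, `levenshtein("kitten", "sitting")` returns 5; passing `sub_cost=1` gives the familiar unit-cost distance of 3. Because the function only indexes and compares symbols, it accepts either strings or lists of phone symbols.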
    sɔəgɪrl    delete r       1
    sɔəgɪl     replace ɪ/ɜ    2
    sɔəgɜl     insert r       1
    sɔrəgɜl
    ----------------------------
    Sum distance              4

The example above illustrates Levenshtein distance applied to Bostonian and standard American pronunciations of "saw a girl". Kessler (1995) applied Levenshtein distance to Irish dialects. The example simplifies our procedure for clarity: refinements due to feature sensitivity are omitted. To obtain the results below, costs are refined based on phonetic feature overlap, so replacement costs vary depending on the phones involved. Different feature systems
