AN ALGORITHM FOR IDENTIFYING COGNATES BETWEEN RELATED LANGUAGES

Jacques B.M. Guy
Linguistics Department, RSPacS
Australian National University
GPO Box 4, Canberra 2601
AUSTRALIA

ABSTRACT

The algorithm takes as its only input a list of words, preferably but not necessarily in phonemic transcription, in any two putatively related languages, and sorts it into decreasing order of probable cognation. The processing of a 250-item bilingual list takes about five seconds of CPU time on a DEC KL1091 and requires 56 pages of core memory. The algorithm is given no information whatsoever about the phonemic transcription used, and even though cognate identification is carried out on the basis of a context-free, one-for-one matching of individual characters, its cognation decisions are bettered by a trained linguist using more information only in cases of wordlists sharing less than 40 cognates and involving complex multiple sound correspondences.

I FUNDAMENTAL PROCEDURES

A. Identifying Sound Correspondences

Consider the following wordlist from two hypothetical Austronesian-like languages:

                Titia    Sese
    eye         mata     nas
    sea         tasi     sab
    father      tama     san
    mother      mama     nan
    tongue      mimi     nen
    shellfish   sisi     hehe
    bad         sati     has
    to stand    ti       se
    to come     ma       na
    with        mi       ne
    not         sa       ha

Take the first word pair, mata/nas.
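For readers who want to follow the example in code, the wordlist above could be held as plain triples. The sketch below is in Python, which is an assumption of this illustration and not the paper's implementation; the names WORDLIST and candidate_pairings are hypothetical. It only enumerates, for one word pair, every pairing of a character from one word with a character from the other, which is the raw material for the one-for-one character matching mentioned in the abstract. It does not attempt the paper's scoring.

```python
from itertools import product

# The hypothetical Titia/Sese wordlist from the text: (gloss, Titia, Sese).
WORDLIST = [
    ("eye", "mata", "nas"),
    ("sea", "tasi", "sab"),
    ("father", "tama", "san"),
    ("mother", "mama", "nan"),
    ("tongue", "mimi", "nen"),
    ("shellfish", "sisi", "hehe"),
    ("bad", "sati", "has"),
    ("to stand", "ti", "se"),
    ("to come", "ma", "na"),
    ("with", "mi", "ne"),
    ("not", "sa", "ha"),
]


def candidate_pairings(word_a, word_b):
    """Return every (char_a, char_b) pairing between two words.

    Positions are ignored and repeated characters are kept, so a word of
    length m paired with a word of length n yields m * n pairings.
    """
    return list(product(word_a, word_b))


# First word pair of the list: mata / nas.
gloss, titia, sese = WORDLIST[0]
pairs = candidate_pairings(titia, sese)
print(gloss, len(pairs))  # 4 characters x 3 characters = 12 pairings
```

Keeping the repeats (the two occurrences of a in mata) is a deliberate choice in this sketch: nothing is known yet about which occurrence matters, so none is discarded.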
We have no information about the phonetic values of their constituent characters, and we do not know whether the same system of transcription was used in both wordlists: for all we know, the character a might denote a high back rounded vowel in Titia and a uvular trill in Sese. The only assumption allowed is that, within each wordlist, the same characters represent more or less the same sounds. Under this assumption, the possibility that any one character of a member of a word pair may correspond to any character of the other member cannot be discarded. Thus, in the pair mata/nas, Titia m may correspond to Sese n, a, or s, and so may Titia a, t, and a. We summarize the evidence for these possible correspondences in a TxS matrix where