Scientific report: "A Phonotactic Language Model for Spoken Language Identification"

Haizhou Li and Bin Ma
Institute for Infocomm Research, Singapore 119613
hli mabin @

Abstract

We have established a phonotactic language model as the solution to spoken language identification (LID). In this framework, we define a single set of acoustic tokens to represent the acoustic activities in the world's spoken languages. A voice tokenizer converts a spoken document into a text-like document of acoustic tokens. Thus a spoken document can be represented by a count vector of acoustic tokens and token n-grams in the vector space. We apply latent semantic analysis to the vectors, in the same way that it is applied in information retrieval, in order to capture salient phonotactics present in spoken documents. The vector space modeling of spoken utterances constitutes a paradigm shift in LID technology and has proven to be very successful. It presents an error rate reduction over one of the best reported results on the 1996 NIST Language Recognition Evaluation database.

1 Introduction

Spoken language and written language are similar in many ways. Therefore, much of the research in spoken language identification (LID) has been inspired by text-categorization methodology. Both text and voice are generated from a language-dependent vocabulary. For example, both can be seen as stochastic time sequences corrupted by channel noise. The n-gram language model has achieved equal amounts of success in both tasks, for example the n-character slice for text categorization by language (Cavnar and Trenkle, 1994) and Phone Recognition followed by n-gram Language Modeling, or PRLM (Zissman, 1996). Orthographic forms of language, ranging from the Latin alphabet to Cyrillic script to Chinese characters, are far more specific to a language than their phonetic counterparts. From the speech production point of view thousands of spoken languages from all over the world are phonetically articulated using only a few hundred distinctive .
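
The vector-space idea summarized in the abstract (token sequence, then n-gram count vector, then latent semantic analysis) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the token symbols, the toy training utterances, the labels `lang_A` and `lang_B`, the helper names, and the nearest-neighbour cosine decision rule are hypothetical and are not taken from the paper, whose tokenizer and classifier are not described in this excerpt.

```python
from collections import Counter
import numpy as np

def ngram_counts(tokens, max_n=2):
    """Count token n-grams (n = 1..max_n) in one tokenized utterance."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def build_matrix(docs, max_n=2):
    """Return (term -> column index, document-by-term count matrix)."""
    all_counts = [ngram_counts(d, max_n) for d in docs]
    terms = sorted(set().union(*all_counts))
    vocab = {t: j for j, t in enumerate(terms)}
    X = np.zeros((len(docs), len(vocab)))
    for i, c in enumerate(all_counts):
        for term, k in c.items():
            X[i, vocab[term]] = k
    return vocab, X

def vectorize(tokens, vocab, max_n=2):
    """Count vector of a new utterance; unseen n-grams are ignored."""
    v = np.zeros(len(vocab))
    for term, k in ngram_counts(tokens, max_n).items():
        if term in vocab:
            v[vocab[term]] = k
    return v

def lsa_projection(X, k):
    """Top-k right singular vectors of X as a term-by-k projection."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical tokenizer output: each utterance is a list of token symbols.
train = [
    ("lang_A", "a b a b a c a b".split()),
    ("lang_A", "b a b a c a b a".split()),
    ("lang_B", "c d c d d c d c".split()),
    ("lang_B", "d c d d c d c c".split()),
]
labels = [lab for lab, _ in train]
vocab, X = build_matrix([toks for _, toks in train])
P = lsa_projection(X, k=2)
Z = X @ P  # training utterances in the latent space

def identify(tokens):
    """Label of the training utterance closest in the latent space."""
    z = vectorize(tokens, vocab) @ P
    scores = [cosine(z, row) for row in Z]
    return labels[int(np.argmax(scores))]

print(identify("a b a c a b a b".split()))
```

The sketch uses raw counts and a plain SVD for brevity; term weighting and the choice of latent dimension `k` are left open here because the excerpt does not specify them.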
