Scientific report: "Extracting loanwords from Mongolian corpora and producing a Japanese-Mongolian bilingual dictionary"

Badam-Osor Khaltar
Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba 305-8550, Japan
khab23@slis.tsukuba.ac.jp

Atsushi Fujii
Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba 305-8550, Japan
fujii@slis.tsukuba.ac.jp

Tetsuya Ishikawa
The Historiographical Institute, The University of Tokyo, 3-1 Hongo 7-chome, Bunkyo-ku, Tokyo 133-0033, Japan
ishikawa@hi.u-tokyo.ac.jp

Abstract

This paper proposes methods for extracting loanwords from Cyrillic Mongolian corpora and producing a Japanese-Mongolian bilingual dictionary. We extract loanwords from Mongolian corpora using our own handcrafted rules. To complement the rule-based extraction, we also extract, as loanwords, words in Mongolian corpora that are phonetically similar to Japanese Katakana words. In addition, we align the extracted loanwords with Japanese words to produce a bilingual dictionary. We propose a stemming method for Mongolian so that loanwords are extracted correctly. We verify the effectiveness of our methods experimentally.
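The phonetic-similarity idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the tiny romanization tables, the function names, and the use of normalized Levenshtein distance as the similarity measure are all assumptions chosen only to make the idea concrete. The Mongolian word камер ("camera") and the Katakana word カメラ are used as an illustrative pair.

```python
# Hypothetical, deliberately tiny romanization tables (assumptions for
# illustration; a real system would need complete tables and rules).
CYRILLIC_TO_LATIN = {
    "а": "a", "е": "e", "к": "k", "м": "m", "р": "r", "н": "n", "т": "t",
}
KATAKANA_TO_LATIN = {
    "カ": "ka", "メ": "me", "ラ": "ra",
}


def romanize(word, table):
    """Map each character through the table; unknown characters are kept."""
    return "".join(table.get(ch, ch) for ch in word.lower())


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonetic_similarity(mongolian_word, katakana_word):
    """Similarity in [0, 1] between the romanized forms (assumed measure)."""
    m = romanize(mongolian_word, CYRILLIC_TO_LATIN)
    k = romanize(katakana_word, KATAKANA_TO_LATIN)
    longest = max(len(m), len(k))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(m, k) / longest


if __name__ == "__main__":
    print(phonetic_similarity("камер", "カメラ"))
```

Under these assumptions, a corpus word would be treated as a loanword candidate when its similarity to some Katakana word exceeds a threshold; the threshold value itself would have to be tuned on data and is not specified here.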
1 Introduction

Reflecting the rapid growth in science and technology, new words and technical terms are being progressively created, and these words and terms are often transliterated when imported as loanwords in another language. Loanwords are often not included in dictionaries, and decrease the quality of natural language processing, information retrieval, machine translation, and speech recognition. At the same time, compiling dictionaries is expensive, because it relies on human introspection and supervision. Thus, a number of automatic methods have been proposed to extract loanwords and their translations from corpora, targeting various languages. In this paper, we focus on extracting loanwords in Mongolian. The Mongolian language is divided into Traditional Mongolian, written using the Mongolian alphabet, and Modern
