Character-Level Dependencies in Chinese: Usefulness and Learning

Hai Zhao
Department of Chinese Translation and Linguistics
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong, China
haizhao@

Abstract

We investigate the possibility of exploiting character-based dependencies for Chinese information processing. As Chinese text is made up of character sequences rather than word sequences, the word in Chinese is not as natural a concept as it is in English, nor can it be defined without controversy for such a language. Therefore, we propose a character-level dependency scheme to represent the primary linguistic relationships within a Chinese sentence. The usefulness of character dependencies is verified through two specialized dependency parsing tasks. The first handles trivial character dependencies that are transformed equivalently from traditional word boundaries. The second further considers the case in which annotated internal character dependencies within words are involved. Character-level dependency parsing yields positive results in both tasks. This study provides an alternative way to formalize basic character- and word-level representations for Chinese.

1 Introduction

In many human languages, words can be naturally identified in writing.
However, this is not the case for Chinese, which is by nature written as a sequence of characters¹ rather than a sequence of words; that is, no natural separators such as blanks exist between words. Because words do not appear as naturally as they do in most European languages², the question arises of how to determine word-hood in Chinese. Linguists' views …

¹ Character here stands for the various tokens occurring in naturally written Chinese text, including Chinese characters (hanzi), punctuation, and foreign letters, although Chinese characters usually make up most of them.
² Even in European languages, a naive but necessary method to properly define words is to list them all by hand. Thanks to the first anonymous reviewer, who pointed out this fact.
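The first task in the abstract treats word boundaries as trivial character dependencies. The sketch below illustrates one way such an equivalent transformation could work, assuming a hypothetical head-final convention: inside a word each character depends on its right neighbour, and each word-final character attaches to a root placeholder because no word-level dependencies are given. The convention and the names `ROOT`, `words_to_char_heads` and `char_heads_to_words` are illustrative assumptions, not the scheme defined in the paper. The round trip shows that, under this convention, no word-boundary information is lost.

```python
from typing import List

# Hypothetical placeholder head for word-final characters (CoNLL-style 0 = root).
ROOT = 0


def words_to_char_heads(words: List[str]) -> List[int]:
    """Turn a segmented sentence into one head index per character.

    Assumed head-final convention (illustrative only): inside a word each
    character depends on its right neighbour, and the word-final character
    attaches to ROOT because no word-level dependencies are available.
    Head indices are 1-based character positions.
    """
    heads: List[int] = []
    pos = 0  # 1-based position of the current character after increment
    for word in words:
        for i in range(len(word)):
            pos += 1
            is_last = i == len(word) - 1
            # Word-final character -> ROOT; any other character -> next character.
            heads.append(ROOT if is_last else pos + 1)
    return heads


def char_heads_to_words(chars: str, heads: List[int]) -> List[str]:
    """Invert the transformation: under this convention, a word ends
    exactly at each character whose head is ROOT."""
    words: List[str] = []
    start = 0
    for idx, head in enumerate(heads):
        if head == ROOT:
            words.append(chars[start:idx + 1])
            start = idx + 1
    return words


if __name__ == "__main__":
    sentence = ["我们", "喜欢", "北京"]  # "we like Beijing", already segmented
    heads = words_to_char_heads(sentence)
    print(heads)  # [2, 0, 4, 0, 6, 0]
    print(char_heads_to_words("".join(sentence), heads))  # ['我们', '喜欢', '北京']
```

Because the word boundaries can be read back exactly from the heads, these trivial dependencies carry the same information as the segmentation itself. A richer scheme, such as one with annotated internal character dependencies, would replace the fixed right-neighbour links inside each word.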
