Scientific report: "Hybrid Methods for POS Guessing of Chinese Unknown Words"

Xiaofei Lu
Department of Linguistics, The Ohio State University, Columbus, OH 43210, USA
xflu@

Abstract

This paper describes a hybrid model that combines a rule-based model with two statistical models for the task of POS guessing of Chinese unknown words. The rule-based model is sensitive to the type, length, and internal structure of unknown words, and the two statistical models utilize contextual information and the likelihood for a character to appear in a particular position of words of a particular length and POS category. By combining models that use different sources of information, the hybrid model achieves a precision of 89%, a significant improvement over the best result reported in previous studies, which was 69%.

1 Introduction

Unknown words constitute a major source of difficulty for Chinese part-of-speech (POS) tagging, yet relatively little work has been done on POS guessing of Chinese unknown words. The few existing studies all attempted to develop a unified statistical model to compute the probability of a word having a particular POS category for all Chinese unknown words (Chen et al., 1997; Wu and Jiang, 2000; Goh, 2003).
This approach tends to miss one or more pieces of information contributed by the type, length, internal structure, or context of individual unknown words, and fails to combine the strengths of different models. The rule-based approach was rejected with the claim that rules are bound to overgenerate (Wu and Jiang, 2000). In this paper, we present a hybrid model that combines the strengths of a rule-based model with those of two statistical models for this task. The three models make use of different sources of information. The rule-based model is sensitive to the type, length, and internal structure of unknown words, with overgeneration controlled by additional constraints. The two statistical models make use of contextual information and the likelihood for a character to appear in a ...
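To make the character-position idea concrete, the following is a minimal sketch of one possible reading of the second statistical model, not the formulation used in the paper, which this excerpt does not give. It assumes that the likelihood of a character in a given position is estimated from a lexicon by relative frequency with add-one smoothing, and that an unknown word is assigned the POS category with the highest summed log-likelihood. The toy lexicon, the tag labels NN and VV, and the names train, score, and guess_pos are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon of (word, POS) pairs; not data from the paper.
LEXICON = [
    ("老师", "NN"), ("教师", "NN"), ("医师", "NN"),
    ("提高", "VV"), ("提出", "VV"), ("增高", "VV"),
]


def train(lexicon):
    """Count how often each character fills each position,
    keyed by (word length, POS tag, position index)."""
    counts = defaultdict(lambda: defaultdict(int))
    for word, tag in lexicon:
        for i, ch in enumerate(word):
            counts[(len(word), tag, i)][ch] += 1
    return counts


def score(word, tag, counts, vocab_size):
    """Sum of add-one-smoothed log P(char | position, length, POS).
    The smoothing scheme is an assumption made for this sketch."""
    total = 0.0
    for i, ch in enumerate(word):
        slot = counts.get((len(word), tag, i), {})
        seen = sum(slot.values())
        total += math.log((slot.get(ch, 0) + 1) / (seen + vocab_size))
    return total


def guess_pos(word, counts, tags, vocab_size):
    """Return the tag whose position-specific character model scores highest."""
    return max(tags, key=lambda t: score(word, t, counts, vocab_size))


COUNTS = train(LEXICON)
TAGS = sorted({tag for _, tag in LEXICON})
VOCAB_SIZE = len({ch for word, _ in LEXICON for ch in word})

# "律师" is absent from the lexicon, but "师" is a frequent final character of
# two-character NN entries, so the sketch prefers NN for it.
print(guess_pos("律师", COUNTS, TAGS, VOCAB_SIZE))
```

A hybrid system in the spirit of the paper would consult a score like this alongside the rule-based and contextual models; how the three are actually combined is not specified in this excerpt.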
