tailieunhanh - Scientific report: "Structural Disambiguation Based on Reliable Estimation of Strength of Association"

Haodong Wu, Eduardo de Paiva Alves, Teiji Furugori
Department of Computer Science, University of Electro-Communications
1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, JAPAN
wu ealves furugori

Abstract

This paper proposes a new class-based method to estimate the strength of association in word co-occurrence for the purpose of structural disambiguation. To deal with sparseness of data, we use a conceptual dictionary as the source for acquiring upper classes of the words related in the co-occurrence, and then use t-scores to determine a pair of classes to be employed for calculating the strength of association. We have applied our method to determining dependency relations in Japanese and prepositional phrase attachments in English. The experimental results show that the method is sound, effective, and useful in resolving structural ambiguities.

1 Introduction

The strength of association between words provides lexical preferences for ambiguity resolution. It is usually estimated from statistics on word co-occurrences in large corpora (Hindle and Rooth, 1993). A problem with this approach is how to estimate the probability of word co-occurrences that are not observed in the training corpus. There are two main approaches to estimating this probability: smoothing methods (e.g., Church and Gale, 1991; Jelinek and Mercer, 1985; Katz, 1987) and class-based methods (e.g., Brown et al., 1992; Pereira and Tishby, 1992; Resnik, 1992; Yarowsky, 1992).

Smoothing methods estimate the probability of the unobserved co-occurrences by using frequencies of the individual words. For example, when eat and bread do not co-occur, the probability of "eat bread" would be estimated by using the frequencies of eat and bread. A problem with this approach is that it pays no attention to the distributional characteristics of the individual words in question. Using this method, the probability of "eat bread" and "eat cars" would become the same
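
The weakness described above can be illustrated with a minimal sketch. It is not one of the cited smoothing methods: it assumes a deliberately simplified estimate in which an unseen pair is scored only from the relative frequencies of its two words, and every count is a hypothetical toy value rather than data from the paper. The function name `frequency_only_estimate` is likewise an illustrative assumption.

```python
from collections import Counter

# Hypothetical toy word counts (not taken from the paper).
# "bread" and "cars" are given the same frequency on purpose.
word_counts = Counter({"eat": 50, "drive": 30, "bread": 20, "cars": 20})
total = sum(word_counts.values())


def frequency_only_estimate(verb: str, noun: str) -> float:
    """Score an unseen (verb, noun) pair from individual word frequencies only.

    Assumption: a simplified independence-style estimate, P(verb) * P(noun),
    standing in for "using frequencies of the individual words".
    """
    return (word_counts[verb] / total) * (word_counts[noun] / total)


p_eat_bread = frequency_only_estimate("eat", "bread")
p_eat_cars = frequency_only_estimate("eat", "cars")

# Both pairs receive the same score because nothing in the estimate
# reflects which nouns actually tend to occur with "eat".
print(p_eat_bread, p_eat_cars, p_eat_bread == p_eat_cars)
```

Under these assumptions the two pairs are indistinguishable, which is the behavior the passage criticizes: an estimate built only from individual word frequencies cannot prefer "eat bread" over "eat cars".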
