Scientific report: "The Same-head Heuristic for Coreference"

Micha Elsner and Eugene Charniak
Brown Laboratory for Linguistic Information Processing (BLLIP)
Brown University, Providence, RI 02912
melsner ec @

Abstract

We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of non-coreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.

1 Introduction

Full NP coreference, the task of discovering which non-pronominal NPs in a discourse refer to the same entity, is widely known to be challenging. In practice, however, most work focuses on the subtask of linking NPs with different head words. Decisions involving NPs with the same head word have not attracted nearly as much attention, and many systems, especially unsupervised ones, operate under the assumption that all same-head pairs corefer. This is by no means always the case; there are several systematic exceptions to the rule. In this paper, we show that these exceptions are fairly common, and describe an unsupervised system which learns to distinguish them from coreferent same-head pairs.

There are several reasons why relatively little attention has been paid to same-head pairs.
Primarily, this is because they are a comparatively easy subtask in a notoriously difficult area. Stoyanov et al. (2009) show that, among NPs headed by common nouns, those which have an exact match earlier in the document are the easiest to resolve (variant MUC score .82 on MUC-6). Those with partial matches are quite a bit harder (.53), and by far the worst performance is on those without any match at all (.27). This effect is magnified by most popular metrics for coreference, which reward finding links within large clusters more than they ...
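As a rough illustration of the same-head assumption discussed in the introduction, the sketch below groups mentions purely by head noun. It is not the authors' model. The `Mention` class, the `same_head_clusters` function, and the example NPs are all hypothetical, and the head word is assumed to be supplied by an upstream parser.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A non-pronominal NP. `head` is assumed to come from a parser."""
    text: str
    head: str


def same_head_clusters(mentions):
    """Baseline heuristic: put all mentions with the same head noun
    (compared case-insensitively) into one coreference cluster."""
    clusters = defaultdict(list)
    for mention in mentions:
        clusters[mention.head.lower()].append(mention)
    # Clusters come back in order of each head's first appearance.
    return list(clusters.values())


# Invented example: two NPs share the head "company".
mentions = [
    Mention("the company", "company"),
    Mention("a rival company", "company"),
    Mention("the chairman", "chairman"),
]
clusters = same_head_clusters(mentions)
```

Here the heuristic places "the company" and "a rival company" in one cluster although they plausibly denote different entities. This is the kind of precision error that a model able to decline some same-head links is meant to reduce.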
