Scientific report: "Improving Pronoun Resolution Using Statistics-Based Semantic Compatibility Information"
Xiaofeng Yang, Jian Su, Chew Lim Tan

Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
{xiaofengy,sujian}@i2r.a-star.edu.sg

Department of Computer Science, National University of Singapore, Singapore 117543
{yangxiao,tancl}@comp.nus.edu.sg

Abstract

In this paper we focus on how to improve pronoun resolution using statistics-based semantic compatibility information. We investigate two unexplored issues that influence the effectiveness of such information: statistics source and learning framework. Specifically, we propose for the first time to utilize the web and the twin-candidate model, in addition to the previous combination of the corpus and the single-candidate model, to compute and apply the semantic information. Our study shows that the semantic compatibility obtained from the web can be effectively incorporated in the twin-candidate learning model and significantly improve the resolution of neutral pronouns.

1 Introduction

Semantic compatibility is an important factor for pronoun resolution. Since pronouns, especially neutral pronouns, carry little semantics of their own, the compatibility between an anaphor and its antecedent candidate is commonly evaluated by examining the relationships between the candidate and the anaphor's context, based on how often the corresponding predicate-argument tuples occur in a particular large corpus.

Consider the example given in the work of Dagan and Itai (1990):

(1) They know full well that companies held tax money aside for collection later on the basis that the government said it_1 was going to collect it_2.

For anaphor it_1, the candidate "government" should have higher semantic compatibility than "money", because "government collect" is supposed to occur more frequently than "money collect" in a large corpus. A similar pattern could also be observed for it_2.

So far, the corpus-based semantic knowledge has been successfully ...
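The frequency comparison described for example (1) can be illustrated with a minimal sketch. It assumes that compatibility is scored by the raw count of the predicate-argument tuple obtained when a candidate fills the anaphor's slot; the text above does not say how the statistics are normalized. The counts, the role labels `subj` and `obj`, and the names `TUPLE_COUNTS`, `compatibility` and `rank_candidates` are all invented for illustration and do not come from the paper.

```python
from collections import Counter

# Hypothetical toy counts of (argument, role, predicate) tuples.
# A real system would collect these from a parsed corpus; these numbers are invented.
TUPLE_COUNTS = Counter({
    ("government", "subj", "collect"): 120,
    ("money", "subj", "collect"): 3,
    ("money", "obj", "collect"): 95,
    ("government", "obj", "collect"): 2,
})


def compatibility(candidate, role, predicate, counts=TUPLE_COUNTS):
    """Raw frequency of the tuple formed when the candidate replaces the anaphor."""
    # Counter returns 0 for tuples that were never observed.
    return counts[(candidate, role, predicate)]


def rank_candidates(candidates, role, predicate, counts=TUPLE_COUNTS):
    """Sort candidates from most to least frequent in the anaphor's slot."""
    return sorted(
        candidates,
        key=lambda c: compatibility(c, role, predicate, counts),
        reverse=True,
    )


candidates = ["money", "government"]
# it_1 is the subject of "collect"; it_2 is its object.
best_for_it1 = rank_candidates(candidates, "subj", "collect")[0]
best_for_it2 = rank_candidates(candidates, "obj", "collect")[0]
```

With these toy counts, the subject slot of "collect" prefers "government" and the object slot prefers "money". Raw counts are only the simplest possible choice here, and a practical system would likely need some form of smoothing or normalization.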