Scientific report: "Memory-Based Learning: Using Similarity for Smoothing"
Memory-Based Learning: Using Similarity for Smoothing

Jakub Zavrel and Walter Daelemans
Computational Linguistics, Tilburg University
PO Box 90153, 5000 LE Tilburg, The Netherlands
{zavrel,walter}@kub.nl

Abstract

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains and allows the easy integration of diverse information sources, such as rich lexical representations.

1 Introduction

Statistical approaches to disambiguation offer the advantage of making the most likely decision on the basis of available evidence. For this purpose, a large number of probabilities has to be estimated from a training corpus. However, many possible conditioning events are not present in the training data, yielding zero Maximum Likelihood (ML) estimates.
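To make the sparse-data problem concrete, here is a minimal sketch of relative-frequency (ML) estimation on a toy PP-attachment data set. The four training tuples, their labels, and the function name `ml_estimate` are invented for illustration and do not come from the paper; reporting an unseen conditioning event as 0.0 is a convention assumed here.

```python
from collections import Counter

# Toy PP-attachment instances: (verb, noun1, preposition, noun2) -> attachment.
# "V" = attach to the verb, "N" = attach to the noun.
# All tuples and labels are invented for illustration only.
TRAIN = [
    (("eat", "pizza", "with", "fork"), "V"),
    (("eat", "pizza", "with", "anchovies"), "N"),
    (("see", "man", "with", "telescope"), "V"),
    (("buy", "book", "about", "linguistics"), "N"),
]

# How often each full context occurs, and how often with each label.
context_counts = Counter(context for context, _ in TRAIN)
joint_counts = Counter(TRAIN)


def ml_estimate(label, context):
    """Relative-frequency (ML) estimate of P(label | context)."""
    seen = context_counts[context]
    if seen == 0:
        # Unseen conditioning event: there is no evidence at all.
        # Assumed convention for this sketch: report the estimate as 0.0.
        return 0.0
    return joint_counts[(context, label)] / seen


# A context that occurs in the training data gets a usable estimate.
print(ml_estimate("V", ("eat", "pizza", "with", "fork")))
# Changing a single word produces an unseen context, so every label scores zero.
print(ml_estimate("V", ("eat", "salad", "with", "fork")))
print(ml_estimate("N", ("eat", "salad", "with", "fork")))
```

In this sketch the unseen context differs from a training instance in only one word, yet the full-context estimate gives no basis for choosing between the two attachments.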
This motivates the need for smoothing methods, which re-estimate the probabilities of low-count events from more reliable estimates.

Inductive generalization from observed to new data lies at the heart of machine-learning approaches to disambiguation. In Memory-Based Learning (MBL; the approach is also referred to as Case-based, Instance-based, or Exemplar-based), induction is based on the use of similarity (Stanfill & Waltz, 1986; Aha et al., 1991; Cardie, 1994; Daelemans, 1995). In this paper we describe how the use of similarity between patterns embodies a solution to the sparse data problem, how it relates to backed-off smoothing methods, and what advantages it offers when combining diverse and rich