Reducing semantic drift with bagging and distributional similarity

Tara McIntosh and James R. Curran
School of Information Technologies, University of Sydney, NSW 2006, Australia
tara james @
Abstract
Iterative bootstrapping algorithms are typically compared using a single set of handpicked seeds. However, we demonstrate that performance varies greatly depending on these seeds, and favourable seeds for one algorithm can perform very poorly with others, making comparisons unreliable. We exploit this wide variation with bagging, sampling from automatically extracted seeds to reduce semantic drift. However, semantic drift still occurs in later iterations. We propose an integrated distributional similarity filter to identify and censor potential semantic drifts, ensuring over 10% higher precision when extracting large semantic lexicons.
1 Introduction
Iterative bootstrapping algorithms have been proposed to extract semantic lexicons for NLP tasks with limited linguistic resources. Bootstrapping was initially proposed by Riloff and Jones (1999), and has since been successfully applied to extracting general semantic lexicons (Riloff and Jones, 1999; Thelen and Riloff, 2002), biomedical entities (Yu and Agichtein, 2003), facts (Paşca et al., 2006), and coreference data (Yang and Su, 2007). Bootstrapping approaches are attractive because they are domain and language independent, require minimal linguistic pre-processing, can be applied to raw text, and are efficient enough for tera-scale extraction (Paşca et al., 2006).
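The abstract proposes a distributional similarity filter that censors candidate terms likely to cause semantic drift, but this excerpt does not say how candidates are scored. The following is a minimal sketch under stated assumptions, not the authors' formulation: it represents each term as a sparse vector of context counts, scores a candidate by its mean cosine similarity to a reference set (here the seeds), and rejects it below a threshold. The function names `cosine` and `drift_filter`, the threshold of 0.3, and the toy context counts are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dict: context -> count)."""
    dot = sum(w * v.get(c, 0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def drift_filter(candidates, reference_terms, vectors, threshold=0.3):
    """Hypothetical drift filter: accept a candidate only if its mean cosine
    similarity to the reference terms (e.g. the seeds) reaches `threshold`."""
    accepted, rejected = [], []
    for term in candidates:
        sims = [cosine(vectors[term], vectors[ref]) for ref in reference_terms]
        mean_sim = sum(sims) / len(sims)
        if mean_sim >= threshold:
            accepted.append(term)
        else:
            rejected.append(term)
    return accepted, rejected


# Toy context counts (hypothetical data): words each term co-occurs with.
vectors = {
    "dog":    {"fed": 3, "pet": 2, "vet": 1},
    "cat":    {"fed": 2, "pet": 3, "vet": 1},
    "horse":  {"fed": 2, "vet": 1, "rode": 2},
    "jaguar": {"fed": 1, "drove": 2, "parked": 1},
    "ford":   {"drove": 3, "parked": 2, "sold": 1},
}
accepted, rejected = drift_filter(["horse", "jaguar", "ford"], ["dog", "cat"], vectors)
print(accepted, rejected)  # ['horse'] ['jaguar', 'ford']
```

In this toy data "jaguar" is rejected because most of its contexts ("drove", "parked") are car contexts, so its mean similarity to the animal seeds falls just under the threshold. In practice the reference set, the context features, and the threshold would all need tuning.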
Bootstrapping is minimally supervised, as it is initialised with a small number of seed instances of the information to extract. For semantic lexicons, these seeds are terms from the category of interest. The seeds identify contextual patterns that express a particular semantic category, which in turn recognise new terms (Riloff and Jones, 1999). Unfortunately, semantic drift often occurs when ambiguous or erroneous terms and/or patterns are introduced into and then dominate the
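The seed, pattern, term cycle, and the drift an ambiguous term can trigger, can be illustrated with a minimal sketch. It assumes a toy corpus of hypothetical (term, pattern) pairs in which the ambiguous term "jaguar" links an animal lexicon to a car pattern. The co-occurrence-count scoring, the function name `bootstrap`, and the parameters `n_patterns` and `n_terms` are illustrative assumptions, not the scoring used by Riloff and Jones (1999) or by the authors.

```python
from collections import Counter


def bootstrap(pairs, seeds, iterations=3, n_patterns=1, n_terms=2):
    """Toy iterative bootstrapping: lexicon terms select patterns, patterns select new terms.

    pairs: (term, pattern) co-occurrences observed in a corpus.
    Scoring is plain co-occurrence counting (an illustrative assumption).
    """
    pairs = set(pairs)
    lexicon = list(seeds)   # preserves extraction order
    known = set(seeds)
    patterns = set()        # patterns selected so far
    for _ in range(iterations):
        # Score unused patterns by how many lexicon terms they match.
        pattern_scores = Counter(p for t, p in pairs if t in known and p not in patterns)
        best = sorted(pattern_scores, key=lambda p: (-pattern_scores[p], p))[:n_patterns]
        if not best:
            break
        patterns.update(best)
        # Score unseen terms by how many selected patterns extract them.
        term_scores = Counter(t for t, p in pairs if p in patterns and t not in known)
        new_terms = sorted(term_scores, key=lambda t: (-term_scores[t], t))[:n_terms]
        if not new_terms:
            break
        lexicon.extend(new_terms)
        known.update(new_terms)
    return lexicon


# Hypothetical ANIMAL corpus; "jaguar" is ambiguous (animal or car).
pairs = [
    ("dog", "fed the X"), ("cat", "fed the X"), ("horse", "fed the X"),
    ("jaguar", "fed the X"), ("jaguar", "drove the X"), ("jaguar", "X roared"),
    ("lion", "X roared"), ("ford", "drove the X"), ("toyota", "drove the X"),
]
lexicon = bootstrap(pairs, ["dog", "cat"])
print(lexicon)  # ['dog', 'cat', 'horse', 'jaguar', 'lion', 'ford', 'toyota']
```

After the first iteration the lexicon is still clean, but once "jaguar" is accepted its car pattern "drove the X" is eventually selected and extracts "ford" and "toyota", which is the kind of semantic drift the paragraph above describes.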
