Scientific report: "SVD and Clustering for Unsupervised POS Tagging"

Michael Lamar, Division of Applied Mathematics, Brown University, Providence, RI, USA, mlamar@
Mark Johnson, Department of Computing, Faculty of Science, Macquarie University, Sydney, Australia, mjohnson@
Yariv Maron, Gonda Brain Research Center, Bar-Ilan University, Ramat-Gan, Israel, syarivm@
Elie Bienenstock, Division of Applied Mathematics and Department of Neuroscience, Brown University, Providence, RI, USA, elie@

Footnote: These authors contributed equally.

Abstract

We revisit the algorithm of Schütze (1995) for unsupervised part-of-speech tagging. The algorithm uses reduced-rank singular value decomposition followed by clustering to extract latent features from context distributions. As implemented here, it achieves state-of-the-art tagging accuracy at considerably less cost than more recent methods. It can also produce a range of finer-grained taggings with potential applications to various tasks.

1 Introduction

While supervised approaches can solve the part-of-speech (POS) tagging problem with over 97% accuracy (Collins, 2002; Toutanova et al., 2003), unsupervised algorithms perform considerably less well. These models attempt to tag text without resources such as an annotated corpus, a dictionary, etc. The use of singular value decomposition (SVD) for this problem was introduced in Schütze (1995). Subsequently, a number of methods for POS tagging without a dictionary were examined by Clark (2000), Clark (2003), Haghighi and Klein (2006), Johnson (2007), Goldwater and Griffiths (2007), Gao and Johnson (2008), and Graça et al. (2009). The latter two, which use Hidden Markov Models (HMMs), exhibit the highest performance to date for fully unsupervised POS tagging.

The revisited SVD-based approach presented here, which we call "two-step SVD" or SVD2, has four important characteristics. First, it achieves state-of-the-art tagging accuracy. Second, it requires drastically less computational effort than the best currently available …
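The abstract summarizes the method as reduced-rank SVD of context distributions followed by clustering. The sketch below illustrates one round of that general idea with NumPy; it is not the authors' SVD2 implementation, and the excerpt does not say what the second step of SVD2 consists of. Several choices are assumptions made only for illustration: contexts are immediate left and right neighbours, context features are restricted to the `n_features` most frequent types, left and right descriptors are reduced and unit-normalized separately, and clustering is plain k-means with deterministic farthest-point seeding. The function names `context_matrices`, `reduced_rank`, `kmeans`, and `svd_cluster` are hypothetical.

```python
import numpy as np
from collections import Counter


def context_matrices(tokens, n_features):
    """Count immediate left and right neighbours for every word type.

    Assumption: only the n_features most frequent types serve as context
    features (an illustrative choice, not a value taken from the paper).
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    frequent = [w for w, _ in Counter(tokens).most_common(n_features)]
    feat = {w: j for j, w in enumerate(frequent)}
    left = np.zeros((len(vocab), len(frequent)))
    right = np.zeros((len(vocab), len(frequent)))
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in feat:
            left[index[cur], feat[prev]] += 1   # prev is a left context of cur
        if cur in feat:
            right[index[prev], feat[cur]] += 1  # cur is a right context of prev
    return vocab, left, right


def reduced_rank(matrix, rank):
    """Project each row onto the top `rank` right singular vectors, then
    rescale every projected row to unit length (zero rows stay zero)."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    reduced = matrix @ vt[:rank].T  # equals U_r * S_r
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.where(norms == 0, 1.0, norms)


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding
    (an assumption of this sketch; the paper's clustering may differ)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d2.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def svd_cluster(tokens, n_features, rank, k):
    """One round of SVD-then-cluster: returns {word type: cluster id}."""
    vocab, left, right = context_matrices(tokens, n_features)
    features = np.hstack([reduced_rank(left, rank), reduced_rank(right, rank)])
    labels = kmeans(features, k)
    return dict(zip(vocab, labels.tolist()))


if __name__ == "__main__":
    toy = "the dog a cat the cat a dog the dog a cat".split()
    # Full rank (4) only because this toy vocabulary has four types.
    print(svd_cluster(toy, n_features=50, rank=4, k=2))
```

On the toy corpus, "the" and "a" share determiner-like contexts and "dog" and "cat" share noun-like contexts, so the two groups should land in separate clusters. On a real corpus `rank` would be much smaller than `n_features`; that truncation is what turns sparse neighbour counts into the dense latent features the abstract refers to.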
