Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Dipanjan Das
Carnegie Mellon University
Pittsburgh, PA 15213, USA
dipanjan@

Slav Petrov
Google Research
New York, NY 10011, USA
slav@

Abstract

We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular, no tagging dictionary is assumed), making it applicable to a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in an unsupervised model (Berg-Kirkpatrick et al., 2010). Across eight European languages, our approach results in an average absolute improvement over both a state-of-the-art baseline and vanilla hidden Markov models induced with the Expectation Maximization algorithm.

1 Introduction

Supervised learning approaches have advanced the state-of-the-art on a variety of tasks in natural language processing, resulting in highly accurate systems. Supervised part-of-speech (POS) taggers, for example, approach the level of inter-annotator agreement for English (Shen et al., 2007). However, supervised methods rely on labeled training data, which is time-consuming and expensive to generate.
Unsupervised learning approaches appear to be a natural solution to this problem, as they require only unannotated text for training models. Unfortunately, the accuracy of the best completely unsupervised English POS tagger that does not make use of a tagging dictionary (Christodoulopoulos et al., 2010) makes its practical usability questionable at best. To bridge this gap, we consider a practically motivated scenario in which we want to leverage existing resources from a resource-rich language like

This research was carried out during an internship at Google Research.