Annealing Techniques for Unsupervised Statistical Language Learning

Noah A. Smith and Jason Eisner
Department of Computer Science / Center for Language and Speech Processing
Johns Hopkins University, Baltimore, MD 21218 USA
nasmith jason @

Abstract

Exploiting unannotated natural language data is hard, largely because unsupervised parameter estimation is hard. We describe deterministic annealing (Rose et al., 1990) as an appealing alternative to the Expectation-Maximization algorithm (Dempster et al., 1977). Seeking to avoid search error, DA begins by globally maximizing an easy concave function and maintains a local maximum as it gradually morphs the function into the desired non-concave likelihood function. Applying DA to parsing and tagging models is shown to be straightforward; significant improvements over EM are shown on a part-of-speech tagging task. We describe a variant, skewed DA, which can incorporate a good initializer when it is available, and show significant improvements over EM on a grammar induction task.

1 Introduction

Unlabeled data remains a tantalizing potential resource for NLP researchers. Some tasks can thrive on a nearly pure diet of unlabeled data (Yarowsky, 1995; Collins and Singer, 1999; Cucerzan and Yarowsky, 2003). But for other tasks, such as machine translation (Brown et al., 1990), the chief merit of unlabeled data is simply that nothing else is available; unsupervised parameter estimation is notorious for achieving mediocre results. The standard starting point is the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). EM iteratively adjusts a model's parameters from an initial guess until it converges to a local maximum.
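The contrast between EM and DA can be made concrete on a toy problem. The following is a minimal sketch, not the paper's tagging or parsing setup: a deterministic-annealing variant of EM for a one-dimensional two-Gaussian mixture (unit variances, equal weights, means only), in which the E-step exponent is scaled by an inverse temperature beta that rises toward 1. At low beta the objective is nearly flat and the soft counts are nearly uniform, so the component means collapse together; as beta grows they separate, tracking a local maximum that morphs into the true likelihood surface. All names and the annealing schedule here are illustrative assumptions.

```python
# Hypothetical sketch of deterministic-annealing EM for a 1-D
# two-Gaussian mixture (means only; unit variance, equal weights).
# The schedule `betas` and all names are illustrative, not from the paper.
import math

def da_em(data, betas=(0.1, 0.3, 0.6, 1.0), iters=20):
    mu = [min(data), max(data)]            # crude initializer
    for beta in betas:                     # anneal: beta rises toward 1 (= plain EM)
        for _ in range(iters):
            # E-step: responsibilities with the log-likelihood scaled by beta.
            # Low beta flattens the posteriors, making the objective easy.
            resp = []
            for x in data:
                w = [math.exp(-beta * 0.5 * (x - m) ** 2) for m in mu]
                z = sum(w)
                resp.append([wi / z for wi in w])
            # M-step: re-estimate each mean from its annealed soft counts.
            for k in range(2):
                tot = sum(r[k] for r in resp)
                mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / tot
    return mu

data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
mu = da_em(data)   # means should end up near the two clusters, around -2 and +2
```

Running plain EM corresponds to the degenerate schedule `betas=(1.0,)`; the annealed schedule differs only in visiting smoothed versions of the likelihood first, which is the search-error remedy the abstract describes.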
Unfortunately, likelihood functions in practice are riddled with suboptimal local maxima (Charniak, 1993, ch. 7). Moreover, maximizing likelihood is not equivalent to maximizing task-defined accuracy (Merialdo, 1994). Here we focus on the search error problem. Assume that one has a model for which improving likelihood