Contrastive Estimation: Training Log-Linear Models on Unlabeled Data∗

Noah A. Smith and Jason Eisner
Department of Computer Science / Center for Language and Speech Processing
Johns Hopkins University, Baltimore, MD 21218 USA
{nasmith, jason}@ ...

∗ This work was supported by a Fannie and John Hertz Foundation fellowship to the first author and NSF ITR grant IIS-0313193 to the second author. The views expressed are not necessarily endorsed by the sponsors. The authors also thank three anonymous ACL reviewers for helpful comments, colleagues at JHU CLSP, especially David Smith and Roy Tromble, and Miles ...

Abstract

Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem (POS tagging given a tagging dictionary and unlabeled text), contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features.
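To make "exploiting implicit negative evidence" concrete, the following is a sketch of the objective, anticipating the formal definition given later in the paper: probability mass is shifted toward each observed sentence x_i and away from a neighborhood N(x_i) of perturbed alternatives, with hidden labelings y summed out under a log-linear model p_θ. The notation here (n observed sentences, neighborhood function N) is assumed for illustration.

```latex
% Sketch of the contrastive estimation objective: choose parameters
% \theta to make each observed x_i likely *relative to* its neighborhood
% N(x_i) of implicit negative examples, marginalizing over labelings y.
\max_{\theta} \;\; \sum_{i=1}^{n} \log
  \frac{\sum_{y} p_{\theta}(x_i, y)}
       {\sum_{x' \in N(x_i)} \sum_{y'} p_{\theta}(x', y')}
```

Because the denominator ranges over a small, explicitly constructed neighborhood rather than over all possible sentences, the normalizer stays tractable; this is the source of the computational efficiency claimed above.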

1 Introduction

Finding linguistic structure in raw text is not easy. The classical forward-backward and inside-outside algorithms try to guide probabilistic models to discover structure in text, but they tend to get stuck in local maxima (Charniak, 1993). Even when they avoid local maxima (e.g., through clever initialization), they typically deviate from human ideas of what the right structure is (Merialdo, 1994).

One strategy is to incorporate domain knowledge into the model's structure. Instead of blind HMMs or PCFGs, one could use models whose features ...
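To ground the "arbitrary features" point, here is a minimal sketch (an assumption for illustration, not the paper's implementation) of a log-linear model over (sentence, tagging) pairs. The feature templates and function names are invented; the normalization over a finite candidate set mirrors the neighborhood idea sketched above.

```python
# Minimal log-linear model sketch: score(x, y) = exp(theta . f(x, y)),
# normalized over a finite set of alternatives. Feature templates below
# are illustrative only; any overlapping features of (x, y) are allowed.
import math
from collections import defaultdict

def features(x, y):
    """Arbitrary, overlapping features of a sentence x and tag sequence y."""
    f = defaultdict(float)
    for i, (word, tag) in enumerate(zip(x, y)):
        f[f"emit:{tag}:{word}"] += 1.0
        f[f"suffix3:{tag}:{word[-3:]}"] += 1.0   # spelling features overlap freely
        if i > 0:
            f[f"trans:{y[i-1]}:{tag}"] += 1.0    # tag-to-tag transition
    return f

def log_score(theta, x, y):
    """Unnormalized log-linear score: dot product of weights and features."""
    return sum(theta.get(k, 0.0) * v for k, v in features(x, y).items())

def prob(theta, x, y, alternatives):
    """p(x, y) normalized over a finite candidate set of (x', y') pairs,
    e.g., a contrastive-estimation-style neighborhood of the observed x."""
    log_z = math.log(sum(math.exp(log_score(theta, xa, ya))
                         for xa, ya in alternatives))
    return math.exp(log_score(theta, x, y) - log_z)
```

The design point this sketch illustrates is that nothing constrains the feature functions to be independent or generative; the cost is the normalizer, which the paper's approach keeps affordable by restricting it to a neighborhood.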