A Flexible POS Tagger Using an Automatically Acquired Language Model
Lluís Màrquez
LSI - UPC
c/ Jordi Girona 1-3
08034 Barcelona, Catalonia
lluism@

Lluís Padró
LSI - UPC
c/ Jordi Girona 1-3
08034 Barcelona, Catalonia
padro@

This research has been partially funded by the Spanish Research Department (CICYT) and inscribed as TIC96-1243-C03-02.

Abstract

We present an algorithm that automatically learns context constraints using statistical decision trees. We then use the acquired constraints in a flexible POS tagger. The tagger is able to use information of any degree: n-grams, automatically learned context constraints, linguistically motivated manually written constraints, etc. The sources and kinds of constraints are unrestricted, and the language model can easily be extended, improving the results. The tagger has been tested and evaluated on the WSJ corpus.

1 Introduction

In NLP, it is necessary to model the language in a representation suitable for the task to be performed. The most commonly used language models follow one of two main approaches. The first is the linguistic approach, in which the model is written by a linguist, generally in the form of rules or constraints (Voutilainen and Järvinen, 1995). The second is the automatic approach, in which the model is obtained automatically from corpora, either raw or annotated,¹ and consists of n-grams (Garside et al., 1987; Cutting et al., 1992), rules (Hindle, 1989), or neural nets (Schmid, 1994).

Within the automatic approach, two main trends can be distinguished. The low-level data trend collects statistics from the training corpora in the form of n-grams, probabilities, weights, etc. The high-level data trend acquires more sophisticated information, such as context rules, constraints, or decision trees (Daelemans et al., 1996; Màrquez and Rodríguez, 1995; Samuelsson et al., 1996). The acquisition methods range from supervised-inductive-learning-from-example algorithms (Quinlan, 1986)

¹ When the model is obtained from annotated corpora, we talk about supervised
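The contrast between the low-level trend (n-gram statistics) and the high-level trend (constraints), together with the claim that constraints of any degree and any source can feed one tagger, can be illustrated with a small sketch. It assumes a constraint format of our own: a target tag, a tuple of (offset, required tag) context conditions, and a real-valued weight. Tag-bigram counts from a toy annotated corpus are turned into degree-1 constraints weighted by pointwise mutual information, hand-written constraints are appended to the same list, and a candidate tag is scored by summing the weights of the constraints whose context matches. The names `Constraint`, `bigram_constraints` and `support`, the PMI weighting, and the toy data are all hypothetical. This is neither the authors' decision-tree acquisition algorithm nor their tagging algorithm.

```python
from collections import Counter
from dataclasses import dataclass
from math import log


@dataclass(frozen=True)
class Constraint:
    """Hypothetical constraint format: `tag` at the focus position gains
    `weight` (negative = penalty) when every (offset, required_tag) pair
    in `context` holds. Any number of pairs gives constraints of any degree."""
    tag: str
    context: tuple  # ((offset, required_tag), ...)
    weight: float


def bigram_constraints(corpus):
    """Low-level trend: count tag bigrams in an annotated corpus and turn
    each into a degree-1 constraint weighted by pointwise mutual
    information, log(P(tag | prev) / P(tag)), an illustrative choice."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tags = [tag for _, tag in sentence]
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    total = sum(unigrams.values())
    # How often each tag is followed by another tag (denominator of P(tag | prev)).
    left = Counter()
    for (prev, _), n in bigrams.items():
        left[prev] += n
    return [
        Constraint(tag, ((-1, prev),), log((n / left[prev]) / (unigrams[tag] / total)))
        for (prev, tag), n in bigrams.items()
    ]


def support(constraints, tags, i, candidate):
    """Sum the weights of all constraints on `candidate` at position i
    whose context matches the neighbouring tags (None = still undecided)."""
    score = 0.0
    for c in constraints:
        if c.tag == candidate and all(
            0 <= i + off < len(tags) and tags[i + off] == req
            for off, req in c.context
        ):
            score += c.weight
    return score


# Toy annotated corpus (invented for illustration).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("dogs", "NNS"), ("bark", "VBP")],
]
constraints = bigram_constraints(corpus)
# Hand-written constraints are simply appended to the same model.
constraints.append(Constraint("VBP", ((-1, "DT"),), -2.0))           # penalize a verb right after a determiner
constraints.append(Constraint("NN", ((-1, "DT"), (1, "VBZ")), 0.5))  # DT _ VBZ favours a noun (degree 2)

tags = ["DT", None, "VBZ"]  # "the ? barks": disambiguate position 1
scores = {t: support(constraints, tags, 1, t) for t in ["NN", "VBP"]}
best = max(scores, key=scores.get)
print(scores, best)
```

Because learned and hand-written constraints share one representation, extending the model only means appending to the list; the scoring step does not care where a constraint came from or how many context conditions it has.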