Scientific report: "Phrase Chunking using Entropy Guided Transformation Learning"

Ruy L. Milidiu, Departamento de Informatica, PUC-Rio, Rio de Janeiro, Brazil (milidiu@)
Cicero Nogueira dos Santos, Departamento de Informatica, PUC-Rio, Rio de Janeiro, Brazil (nogueira@)
Julio C. Duarte, Centro Tecnologico do Exercito, Rio de Janeiro, Brazil (jduarte@)

Abstract

Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to four phrase chunking tasks: Portuguese noun phrase chunking, English base noun phrase chunking, English text chunking and Hindi text chunking. In all four tasks, ETL shows better results than decision trees and than TBL with hand-crafted templates. ETL also provides a new training strategy that accelerates transformation learning. For the English text chunking task, this corresponds to a factor of five speedup. For Portuguese noun phrase chunking, ETL shows the best reported results for the task. For the other three linguistic tasks, ETL shows results that are competitive with the state of the art while maintaining the advantages of a rule-based system.

1 Introduction

Phrase chunking is a Natural Language Processing (NLP) task that consists of dividing a text into syntactically correlated groups of words. These phrases are non-overlapping, i.e., a word can only be a member of one chunk (Sang and Buchholz, 2000). Chunking provides a key feature that helps in more elaborate NLP tasks such as parsing and information extraction.
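To make the non-overlapping constraint concrete, the following minimal sketch represents chunks as labeled token spans and converts them into one tag per token. The B-/I-/O tag encoding, the function name `chunks_to_tags`, and the example sentence with its chunk boundaries are illustrative assumptions, not details taken from this paper.

```python
from typing import List, Tuple

# A chunk is (label, start, end) over token indices, with `end` exclusive.
Chunk = Tuple[str, int, int]


def chunks_to_tags(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Encode chunks as one tag per token (assumed B-/I-/O style encoding).

    Raises ValueError if a token would belong to more than one chunk,
    which enforces the non-overlapping property of phrase chunks.
    """
    tags = ["O"] * n_tokens  # "O" marks tokens outside any chunk
    for label, start, end in chunks:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"chunk out of range: {(label, start, end)}")
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(f"token {i} belongs to more than one chunk")
            # First token of a chunk gets B-, the remaining tokens get I-
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


# Hypothetical example sentence and chunking, for illustration only.
tokens = ["He", "reckons", "the", "current", "account",
          "deficit", "will", "narrow", "."]
chunks = [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 6), ("VP", 6, 8)]
tags = chunks_to_tags(len(tokens), chunks)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Because every token receives exactly one tag, this encoding turns chunking into a per-token classification problem, which is the form in which tagging-style learners are usually applied to it.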
Over the last decade, many high-performance chunking systems have been proposed, including SVM-based systems (Kudo and Matsumoto, 2001; Wu et al., 2006), Winnow (Zhang et al., 2002), voted perceptrons (Carreras and Marquez, 2003), Transformation-Based Learning (TBL) (Ramshaw and Marcus, 1999; Megyesi, 2002), Hidden Markov Models (HMM) (Molina and Pla, 2002) and memory-based systems (Sang, 2002). State-of-the-art systems for English base noun phrase chunking and text chunking are based in .