Scientific report: "Named Entity Recognition for Catalan Using Spanish Resources"

Xavier Carreras, Lluís Màrquez, and Lluís Padró
TALP Research Center, LSI Department
Universitat Politècnica de Catalunya
Jordi Girona 1-3, E-08034 Barcelona
carreras, lluism, padro @

Abstract

This work studies Named Entity Recognition (NER) for Catalan without using annotated resources for this language. The approach is based on machine learning techniques and exploits Spanish resources, either by first training models for Spanish and then translating them into Catalan, or by directly training bilingual models. The resulting models are retrained on unlabelled Catalan data using bootstrapping techniques. Exhaustive experimentation has been conducted on real data, showing competitive results for the obtained NER systems.

1 Introduction

A Named Entity (NE) is a lexical unit consisting of a sequence of contiguous words that refers to a concrete entity, such as a person, a location, an organization, or an artifact. Figure 1 contains an example sentence, extracted from the Spanish corpus referred to in Section 2 and translated into Catalan, that includes several entities.

There is wide consensus that Named Entity Recognition and Classification (NERC) are Natural Language Processing tasks that may improve the performance of many applications, such as Information Extraction, Machine Translation, Question Answering, Topic Detection and Tracking, etc.
Thus, interest in detecting and classifying those units in text has kept growing in recent years.

Named Entity processing consists of two steps, which are usually approached sequentially. First, NEs are detected in the text and their boundaries delimited (Named Entity Recognition, NER). Second, entities are classified into a predefined set of classes, which usually contains labels such as person, organization, location, etc. (Named Entity Classification, NEC). In this paper we focus on the first of these stages, that is, Named Entity boundary detection. Previous
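
To make the boundary-detection step described in the introduction concrete, the following minimal sketch shows one common way to represent its output: each token receives a B (begin), I (inside), or O (outside) tag, and entities are read off as contiguous token spans. The B/I/O encoding, the function name `bio_to_spans`, and the example sentence are assumptions made for illustration; this excerpt does not state which encoding the authors use, and the sentence is not the one in Figure 1.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert B/I/O boundary tags into (start, end) token spans (end exclusive).

    Hypothetical helper for illustration; not taken from the paper.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity begins; close any entity that was still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Continue the open entity; tolerate an I with no preceding B.
            if start is None:
                start = i
        else:
            # "O": outside any entity, so close the open one if present.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Invented Catalan sentence, used only to illustrate the data structure.
tokens = ["Xavier", "Carreras", "treballa", "a", "la",
          "Universitat", "Politècnica", "de", "Catalunya", "."]
tags = ["B", "I", "O", "O", "O", "B", "I", "I", "I", "O"]

spans = bio_to_spans(tags)
entities = [" ".join(tokens[s:e]) for s, e in spans]
print(entities)
```

Only boundaries are recovered here. Assigning a class such as person or organization to each span would be the separate classification (NEC) step.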
