tailieunhanh - Scientific report: "Tree Representations in Probabilistic Models for Extended Named Entities Detection"

Marco Dinarelli, LIMSI-CNRS, Orsay, France, marcod@
Sophie Rosset, LIMSI-CNRS, Orsay, France, rosset@

Abstract

In this paper we deal with Named Entity Recognition (NER) on transcriptions of French broadcast data. Two aspects make the task more difficult than previous NER tasks: i) the named entities used in this work have a tree structure, thus the task cannot be tackled as a sequence labelling task; ii) the data used are noisier than the data used for previous NER tasks. We approach the task in two steps, involving Conditional Random Fields and Probabilistic Context-Free Grammars, integrated in a single parsing algorithm. We analyse the effect of using several tree representations. Our system outperforms the best system of the evaluation campaign by a significant margin.

1 Introduction

Named Entity Recognition is a traditional task of the Natural Language Processing domain. The task aims at mapping words in a text into semantic classes such as persons, organizations or locations. While at first the NER task was quite simple, involving a limited number of classes (Grishman and Sundheim, 1996), over the years the task complexity increased as more complex class taxonomies were defined (Sekine and Nobata, 2004).

The interest in the task is related to its use in complex frameworks for semantic content extraction, such as Relation Extraction applications (Doddington et al., 2004). This work presents research on a Named Entity Recognition task defined with a new set of named entities. The distinguishing characteristic of this set is that named entities have a tree structure. As a consequence, the task cannot be tackled with a sequence labelling approach. Additionally, the use of noisy data, like transcriptions of French broadcast data, makes the task very challenging for traditional NLP solutions. To deal with such problems we adopt a two-step approach, the first being realized with Conditional
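
To make the point about tree-structured entities concrete, the following is a minimal Python sketch of why one label per word is not enough. The `Node` class, the example phrase, and the label names (`func`, `kind`, `loc`, `name`) are hypothetical illustrations chosen for this sketch; they are not the annotation scheme or the code used in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Node:
    """A labelled tree node; children are words (str) or nested nodes."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def label_paths(node: Node, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (word, labels from root to word) for every word under node."""
    path = prefix + (node.label,)
    pairs = []
    for child in node.children:
        if isinstance(child, Node):
            pairs.extend(label_paths(child, path))
        else:
            pairs.append((child, path))
    return pairs


def innermost_labels(pairs):
    """Flatten to one label per word by keeping only the deepest label."""
    return [(word, path[-1]) for word, path in pairs]


# Hypothetical annotation of "mayor of Paris"; labels are illustrative only.
entity = Node("func", [
    Node("kind", ["mayor"]),
    "of",
    Node("loc", [Node("name", ["Paris"])]),
])

pairs = label_paths(entity)
flat = innermost_labels(pairs)
max_depth = max(len(path) for _, path in pairs)

for word, path in pairs:
    print(word, "/".join(path))
```

In this toy example the word "Paris" sits under three nested labels, while the flattened view keeps only the innermost one and loses the enclosing structure. That loss is the reason a plain sequence labeller cannot represent such entities directly.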
