Scientific report: "A Subcategorization Acquisition System for French Verbs"

Cédric Messiant
Laboratoire d'Informatique de Paris-Nord, CNRS UMR 7030 and Université Paris 13
99, avenue Jean-Baptiste Clément, F-93430 Villetaneuse, France

Abstract

This paper presents a system capable of automatically acquiring subcategorization frames (SCFs) for French verbs from the analysis of large corpora. We applied the system to a large newspaper corpus (consisting of 10 years of the French newspaper "Le Monde") and acquired subcategorization information for 3267 verbs. The system learned 286 SCF types for these verbs. From the analysis of 25 representative verbs, we obtained precision, recall and F-measure. These results are comparable with those reported in recent related work.

1 Introduction

Many Natural Language Processing (NLP) tasks require comprehensive lexical resources. Handcrafting large lexicons is labour-intensive and error-prone. A growing body of research therefore focuses on the automatic acquisition of lexical resources from text corpora. One useful type of lexical information for NLP is the number and type of the arguments of predicates. These are typically expressed in simple syntactic frames called subcategorization frames (SCFs). SCFs can be useful for many NLP applications, such as parsing (John Carroll and Briscoe, 1998) or information extraction (Surdeanu et al., 2003).
Automatic acquisition of SCFs has therefore been an active research area since the mid-90s (Manning, 1993; Brent, 1993; Briscoe and Carroll, 1997). Comprehensive subcategorization information is currently not available for most languages. French is one of these languages: although manually built syntax dictionaries do exist (Gross, 1975; van den Eynde and Mertens, 2006; Sagot et al., 2006), none of them is ideal for computational use, and none provides the frequency information that is important for statistical NLP. We developed ASSCI, a system capable of extracting large subcategorization lexicons for French
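
To make the notions of an SCF lexicon with frequency information and of precision, recall and F-measure more concrete, here is a minimal Python sketch. It is not the ASSCI implementation: the frame labels (`NP`, `NP_PP[à]`, `INF`), the toy observations, the gold standard and the function names `relative_frequencies` and `evaluate` are all assumptions made for illustration, and the scoring shown (comparing the set of acquired frame types with a gold set) is only one common way such lexicons are evaluated.

```python
from collections import Counter


def relative_frequencies(observations):
    """Turn (verb, frame) observations into {verb: {frame: relative frequency}}."""
    counts = {}
    for verb, frame in observations:
        counts.setdefault(verb, Counter())[frame] += 1
    return {
        verb: {frame: n / sum(c.values()) for frame, n in c.items()}
        for verb, c in counts.items()
    }


def evaluate(acquired, gold):
    """Precision, recall and F-measure of a set of acquired frames against a gold set."""
    true_positives = len(acquired & gold)
    precision = true_positives / len(acquired) if acquired else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical observations for one verb; the frame labels are invented.
observations = [
    ("donner", "NP_PP[à]"),
    ("donner", "NP_PP[à]"),
    ("donner", "NP"),
    ("donner", "INF"),
]
lexicon = relative_frequencies(observations)

# Hypothetical gold-standard frames for the same verb.
gold = {"NP_PP[à]", "NP"}
precision, recall, f_measure = evaluate(set(lexicon["donner"]), gold)
print(lexicon["donner"])
print(precision, recall, f_measure)
```

In this toy case the spurious `INF` frame lowers precision while recall stays perfect, which is the kind of trade-off the F-measure summarizes.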
