Scientific report: "Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French"
Abhishek Arun and Frank Keller
School of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK
keller@

Abstract

This paper presents the first probabilistic parsing results for French, using the recently released French Treebank. We start with an unlexicalized PCFG as a baseline model, which is enriched to the level of Collins’ Model 2 by adding lexicalization and subcategorization. The lexicalized sister-head model and a bigram model are also tested, to deal with the flatness of the French Treebank. The bigram model achieves the best performance: 81% constituency F-score and 84% dependency accuracy. All lexicalized models outperform the unlexicalized baseline, consistent with probabilistic parsing results for English, but contrary to results for German, where lexicalization has only a limited effect on parsing performance.

1 Introduction

This paper brings together two strands of research that have recently emerged in the field of probabilistic parsing: crosslinguistic parsing and lexicalized parsing. Interest in parsing models for languages other than English has been growing, starting with work on Czech (Collins et al., 1999) and Chinese (Bikel and Chiang, 2000; Levy and Manning, 2003).
Probabilistic parsing for German has also been explored by a range of authors (Dubey and Keller, 2003; Schiehlen, 2004). In general, these authors have found that existing lexicalized parsing models for English (e.g., Collins, 1997) do not straightforwardly generalize to new languages; this typically manifests itself in a severe reduction in parsing performance compared to the results for English.

A second recent strand in parsing research has dealt with the role of lexicalization. The conventional wisdom since Magerman (1995) has been that lexicalization substantially improves performance compared to an unlexicalized baseline model such as a probabilistic context-free grammar (PCFG).
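To make the notion of an unlexicalized PCFG baseline concrete, the following is a minimal sketch of relative-frequency rule estimation from a treebank. It is not the authors' implementation: the tuple-based tree encoding, the function names `rules` and `estimate_pcfg`, and the two toy trees are assumptions made purely for illustration, and the trees are invented rather than taken from the French Treebank.

```python
from collections import Counter

# Toy trees as nested tuples: (label, child, child, ...); leaves are strings.
# Invented for illustration; not drawn from the French Treebank.
TREES = [
    ("S", ("NP", "Marie"), ("VP", ("V", "dort"))),
    ("S", ("NP", "Jean"), ("VP", ("V", "voit"), ("NP", "Marie"))),
]

def rules(tree):
    """Yield (lhs, rhs) for every local tree, including lexical rules."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter(r for t in trees for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

pcfg = estimate_pcfg(TREES)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

In this sketch a rule's probability depends only on its left-hand-side category. A lexicalized model would additionally condition on head words, which is the enrichment whose effect the paper sets out to measure.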