
Robust PCFG-Based Generation using Automatically Acquired LFG Approximations

Aoife Cahill (1) and Josef van Genabith (1, 2)
(1) National Centre for Language Technology (NCLT), School of Computing, Dublin City University, Dublin 9, Ireland
(2) Center for Advanced Studies, IBM Dublin, Ireland
acahill, josef @

Abstract

We present a novel PCFG-based architecture for robust probabilistic generation based on wide-coverage LFG approximations (Cahill et al., 2004) automatically extracted from treebanks, maximising the probability of a tree given an f-structure. We evaluate our approach using string-based evaluation. We currently achieve coverage of , a BLEU score of and string accuracy of on the Penn-II WSJ Section 23 sentences of length ≤20.

1 Introduction

Wide-coverage grammars automatically extracted from treebanks are a corner-stone technology in state-of-the-art probabilistic parsing. They achieve robustness and coverage at a fraction of the development cost of hand-crafted grammars. It is surprising to note that, to date, such grammars do not usually figure in the complementary operation to parsing: natural language surface realisation.

Research on statistical natural language surface realisation has taken three broad forms, differing in where statistical information is applied in the generation process. Langkilde (2000), for example, uses n-gram word statistics to rank alternative output strings from symbolic, hand-crafted generators and to select paths in parse forest representations. Bangalore and Rambow (2000) use n-gram word sequence statistics in a TAG-based generation model to rank output strings, and additional statistical and symbolic resources at intermediate generation stages. Ratnaparkhi (2000) uses maximum entropy models to drive generation with word bigram or dependency representations, taking into account unrealised semantic features. Velldal and Oepen (2005) present a discriminative disambiguation model using a hand-crafted HPSG grammar for generation. Belz (2005) describes a method for building statistical generation models using an automatically created generation treebank for weather forecasts.
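The abstract's objective, maximising the probability of a tree given an f-structure, can be stated as a search over candidate c-structure trees. The factorisation below is a hedged sketch of one natural PCFG-style decomposition consistent with that claim, not necessarily the paper's exact model: phi denotes the LFG correspondence from c-structure nodes to f-structures, and feats is an assumed conditioning feature set.

\[
T^{*} \;=\; \operatorname*{argmax}_{T} \, P(T \mid F),
\qquad
P(T \mid F) \;\approx\; \prod_{(X \rightarrow \alpha) \,\in\, T} P\big(X \rightarrow \alpha \;\big|\; \mathrm{feats}(\phi(X))\big)
\]

Under this reading, generation amounts to building trees whose rule expansions are scored conditionally on the local f-structure material they realise, and returning the highest-probability tree's yield as the output string.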

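To make the n-gram reranking strategy attributed to Langkilde (2000) concrete, the following is a minimal, hypothetical Python sketch: a smoothed bigram language model scores alternative surface strings from a generator and the highest-scoring candidate is selected. The function names, the toy corpus, and the candidate strings are all illustrative assumptions, not the original system.

import math
from collections import defaultdict

def train_bigram_lm(corpus, alpha=1.0):
    """Estimate add-alpha smoothed bigram log-probabilities from a toy corpus."""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    vocab = set()
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for w1, w2 in zip(tokens, tokens[1:]):
            unigrams[w1] += 1
            bigrams[(w1, w2)] += 1
    V = len(vocab)
    def logprob(w1, w2):
        # Add-alpha smoothing keeps unseen bigrams from zeroing out a candidate.
        return math.log((bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * V))
    return logprob

def score(sent, logprob):
    """Sum bigram log-probabilities over a sentence, including boundary tokens."""
    tokens = ["<s>"] + sent.split() + ["</s>"]
    return sum(logprob(w1, w2) for w1, w2 in zip(tokens, tokens[1:]))

def rerank(candidates, logprob):
    """Return candidate realisations sorted from most to least probable."""
    return sorted(candidates, key=lambda s: score(s, logprob), reverse=True)

if __name__ == "__main__":
    lm = train_bigram_lm([
        "the dog barked",
        "the cat slept",
        "a dog slept",
    ])
    # Alternative strings a symbolic generator might emit for one input:
    candidates = ["dog the barked", "the dog barked", "barked the dog"]
    print(rerank(candidates, lm)[0])  # -> "the dog barked"

The design point this illustrates is the division of labour in that line of work: a symbolic component proposes grammatical alternatives, and a cheap corpus-derived statistical model adjudicates between them.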