Phrase-based Statistical Language Generation using Graphical Models and Active Learning

François Mairesse, Milica Gašić, Filip Jurčíček, Simon Keizer, Blaise Thomson, Kai Yu and Steve Young
Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK
{mg436, fj228, sk561, brmt2, ky219, sjy}@

Abstract

Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.

1 Introduction

The field of natural language generation (NLG) is one of the last areas of computational linguistics to embrace statistical methods. Over the past decade, statistical NLG has followed two lines of research.
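The certainty-based active learning mentioned in the abstract can be sketched as least-confidence sampling: the learner queries annotators for the pool examples on which its current model is least certain. The selection function and toy probability model below are hypothetical illustrations, not the paper's DBN generator.

```python
# Minimal sketch of certainty-based (least-confidence) active learning.
# The predict_proba callable and the toy pool are invented for
# illustration; BAGEL's actual model is a dynamic Bayesian network.

def least_confident(pool, predict_proba, k):
    """Return the k pool items whose top prediction has the lowest
    model probability, i.e. the examples to send for annotation."""
    scored = [(max(predict_proba(x)), x) for x in pool]
    scored.sort(key=lambda pair: pair[0])  # least confident first
    return [x for _, x in scored[:k]]

# Toy model: probability distribution over two outputs per input.
probs = {"a": [0.9, 0.1], "b": [0.55, 0.45], "c": [0.7, 0.3]}
queries = least_confident(["a", "b", "c"], probs.__getitem__, 2)
# "b" (confidence 0.55) and "c" (0.7) are annotated before "a" (0.9).
```

Iterating this select-annotate-retrain loop is what lets ratings approach the gold standard with only a fraction of the annotated data.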
The first one, pioneered by Langkilde and Knight (1998), introduces statistics in the generation process by training a model which reranks candidate outputs of a handcrafted generator. While their HALOGEN system uses an n-gram language model trained on news articles, other systems have used hierarchical syntactic models (Bangalore and Rambow, 2000) and models trained on user ratings of utterance quality (Walker et al.).

(Footnote: This research was partly funded by the UK EPSRC under grant agreement EP/F013930/1, and funded by the EU FP7 Programme under grant agreement 216594, CLASSiC project.)
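The overgenerate-and-rerank paradigm can be illustrated with a smoothed bigram language model scoring candidate realisations. The training corpus, candidates, and add-alpha smoothing below are simplified assumptions for the sketch; HALOGEN's actual models are considerably richer.

```python
import math
from collections import Counter

# Hedged sketch of n-gram reranking: a handcrafted generator proposes
# candidates, and a bigram LM trained on a (tiny, invented) corpus
# selects the most fluent one.

def train_bigram(corpus, alpha=1.0):
    """Return a log-probability scorer using add-alpha smoothing."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    V = len(vocab)
    def logprob(sent):
        toks = ["<s>"] + sent.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + alpha) /
                            (unigrams[a] + alpha * V))
                   for a, b in zip(toks[:-1], toks[1:]))
    return logprob

corpus = ["the restaurant serves cheap food",
          "the restaurant is in the centre"]
lm = train_bigram(corpus)
candidates = ["the restaurant serves cheap food",
              "cheap food serves the restaurant"]
best = max(candidates, key=lm)  # the fluent word order wins
```

The second candidate contains bigrams never seen in training, so its smoothed score is much lower; the reranker therefore prefers the natural word order.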
