Hierarchical Bayesian Language Modelling for the Linguistically Informed

Jan A. Botha
Department of Computer Science, University of Oxford, UK

Abstract

In this work I address the challenge of augmenting n-gram language models according to prior linguistic intuitions. I argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In an empirical evaluation, the model outperforms the Kneser-Ney model in terms of perplexity, and achieves preliminary improvements in English-German translation.

1 Introduction

The importance of effective language models in machine translation (MT) and automatic speech recognition (ASR) is widely recognised. n-gram models, in particular ones using Kneser-Ney (KN) smoothing, have become the standard workhorse for these tasks. These models are not ideal for languages that have relatively free word order and/or complex morphology. The ability to encode additional linguistic intuitions into models that already have certain attractive properties is an important piece of the puzzle of improving machine translation quality for those languages. But despite their widespread use, KN n-gram models are not easily extensible with additional model components that target particular linguistic phenomena.

I argue in this paper that the family of hierarchical Pitman-Yor language models (HPYLM) (Teh, 2006; Goldwater et al., 2006) is suitable for investigations into more linguistically informed n-gram language models. Firstly, the flexibility to specify arbitrary back-off distributions makes it easy to incorporate multiple models into a larger n-gram model. Secondly, the Pitman-Yor process prior (Pitman and Yor, 1997) generates distributions that are well-suited to a variety of power-law behaviours, as is often observed in language. Catering for a variety of those is important since the frequency distributions of, say, suffixes
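The two properties named in the introduction (a pluggable back-off distribution and a Pitman-Yor prior) can be illustrated with a minimal sketch of a single Pitman-Yor "restaurant" under the standard Chinese restaurant process representation. This is not the model proposed in the paper: the class name `PitmanYorRestaurant`, the parameter names `discount` and `strength`, the toy vocabulary, and the fixed hyperparameter values are all assumptions made for illustration. The sketch also omits what a full hierarchical model needs, namely sending a customer to the parent restaurant whenever a new table is opened, and inferring the hyperparameters.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class PitmanYorRestaurant:
    """Single-level Pitman-Yor Chinese restaurant (illustrative sketch only).

    `base` is the back-off distribution: any callable mapping a word to a
    probability. It is assumed to be positive for every word that is added.
    """

    def __init__(self, discount: float, strength: float,
                 base: Callable[[str], float],
                 rng: Optional[random.Random] = None) -> None:
        if not (0.0 <= discount < 1.0) or strength <= 0.0:
            raise ValueError("need 0 <= discount < 1 and strength > 0")
        self.d = discount
        self.theta = strength
        self.base = base
        # word -> list of customer counts, one entry per table serving it
        self.tables: Dict[str, List[int]] = defaultdict(list)
        self.n_customers = 0
        self.n_tables = 0
        self.rng = rng or random.Random(0)

    def prob(self, word: str) -> float:
        """Predictive probability of `word` given the current seating."""
        counts = self.tables.get(word, [])
        denom = self.theta + self.n_customers
        # Discounted count mass for the word itself.
        seated = (sum(counts) - self.d * len(counts)) / denom
        # Mass reserved for the back-off distribution.
        backoff = (self.theta + self.d * self.n_tables) / denom
        return seated + backoff * self.base(word)

    def add(self, word: str) -> None:
        """Seat one customer for `word` by sampling a table."""
        counts = self.tables[word]
        weights = [c - self.d for c in counts]  # join an existing table
        weights.append((self.theta + self.d * self.n_tables) * self.base(word))
        k = self.rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):  # the last option opens a new table
            counts.append(1)
            self.n_tables += 1
        else:
            counts[k] += 1
        self.n_customers += 1


# Toy usage: a uniform back-off over a hypothetical three-word vocabulary.
vocab = ["haus", "tür", "haustür"]
restaurant = PitmanYorRestaurant(0.8, 1.0, base=lambda w: 1.0 / len(vocab))
for w in ["haus", "haus", "tür", "haus"]:
    restaurant.add(w)
print({w: round(restaurant.prob(w), 3) for w in vocab})
```

Because `base` is just a callable, it could equally be the `prob` method of another restaurant for a shorter context, or an interpolation of several component models, which is the kind of flexibility the first argument refers to. The discount term subtracts a fixed amount per table, which is what produces the heavier, power-law-like tail compared with a Dirichlet process (the `discount = 0` case).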
