Dynamic Nonlocal Language Modeling via Hierarchical Topic-Based Adaptation

Radu Florian and David Yarowsky
Computer Science Department and Center for Language and Speech Processing
Johns Hopkins University, Baltimore, Maryland 21218
{rflorian, yarowsky}@cs.j

Abstract

This paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. It proposes and evaluates new cluster generation, hierarchical smoothing, and adaptive topic-probability estimation techniques. These combined models help capture long-distance lexical dependencies. Experiments on the Broadcast News corpus show significant improvement in perplexity, both overall and on target vocabulary.

1 Introduction

Statistical language models are core components of speech recognizers, optical character recognizers, and even some machine translation systems (Brown et al., 1990). The most common language modeling paradigm used today is based on n-grams, i.e. local word sequences. These models make a Markovian assumption on word dependencies: usually, that word predictions depend on at most m previous words. They therefore offer the following approximation for the probability of a word sequence:

P(w_1^n) = \prod_{i=1}^{n} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-m+1}^{i-1})

where w_i^j denotes the sequence w_i ... w_j; a common size for m is 3 (trigram language models). Even though n-grams have proved to be very powerful and robust in various tasks involving language models, they have a certain handicap: because of the Markov assumption, the dependency is limited to a very short, local context.
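As a concrete illustration of the n-gram approximation above (a minimal sketch, not part of the paper; function names and the toy corpus are invented for exposition), a maximum-likelihood trigram model can be trained by counting trigrams and their two-word contexts:

```python
from collections import defaultdict

def train_trigram(tokens):
    """Count trigrams and their bigram contexts in a token list."""
    tri = defaultdict(int)  # (w_{i-2}, w_{i-1}, w_i) -> count
    bi = defaultdict(int)   # (w_{i-2}, w_{i-1}) -> count
    for i in range(2, len(tokens)):
        ctx = (tokens[i - 2], tokens[i - 1])
        tri[ctx + (tokens[i],)] += 1
        bi[ctx] += 1
    return tri, bi

def trigram_prob(tri, bi, w2, w1, w):
    """Maximum-likelihood estimate of P(w | w2 w1); 0 if the context is unseen."""
    ctx = (w2, w1)
    return tri[ctx + (w,)] / bi[ctx] if bi[ctx] else 0.0
```

In practice such raw estimates are smoothed (e.g. by backoff or interpolation), since most trigrams never occur in training data; the sketch omits smoothing to keep the Markov approximation itself visible.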
Cache language models (Kuhn and de Mori, 1992; Rosenfeld, 1994) try to overcome this limitation by boosting the probability of words already seen in the history; trigger models (Lau et al., 1993), even more general, try to capture the interrelationships between words. Models based on syntactic structure (Chelba and Jelinek, 1998; Wright et al., 1993) effectively estimate intra-sentence syntactic word dependencies. The approach we present here is based on the …
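The cache idea mentioned above can be sketched as a simple interpolation between a static n-gram estimate and a unigram distribution over the recent history (a hedged illustration, not the cited authors' actual formulation; the function name, the interpolation weight lam, and the vocabulary-size fallback are assumptions):

```python
from collections import Counter

def cache_interpolated_prob(p_ngram, history, w, lam=0.9, vocab_size=10000):
    """Mix a static n-gram probability with a unigram cache estimate:
    P(w | history) = lam * P_ngram(w) + (1 - lam) * P_cache(w),
    where P_cache is the relative frequency of w in the recent history.
    With an empty history, fall back to a uniform 1/vocab_size estimate."""
    counts = Counter(history)
    p_cache = counts[w] / len(history) if history else 1.0 / vocab_size
    return lam * p_ngram + (1 - lam) * p_cache
```

For example, a word that has just occurred twice in a four-word history receives a cache estimate of 0.5, so its interpolated probability is boosted above its static n-gram estimate, which is exactly the "words already seen in the history" effect described above.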