Discriminative Syntactic Language Modeling for Speech Recognition

Michael Collins, MIT CSAIL, mcollins@
Brian Roark, OGI/OHSU, roark@
Murat Saraclar, Bogazici University

Abstract

We describe a method for discriminative training of a language model that makes use of syntactic features. We follow a reranking approach, where a baseline recognizer is used to produce 1000-best output for each acoustic input, and a second "reranking" model is then used to choose an utterance from these 1000-best lists. The reranking model makes use of syntactic features together with a parameter estimation method based on the perceptron algorithm. We describe experiments on the Switchboard speech recognition task. The syntactic features provide an additional 0.3% reduction in test-set error rate beyond the model of Roark et al. (2004a; 2004b), significant at p < 0.001, which makes use of a discriminatively trained n-gram model, giving a total reduction of 1.2% over the baseline Switchboard system.

1 Introduction

The predominant approach within language modeling for speech recognition has been to use an n-gram language model within the source-channel (or "noisy-channel") paradigm. The language model assigns a probability P_l(w) to each string w in the language; the acoustic model assigns a conditional probability P_a(a|w) to each pair (a, w), where a is a sequence of acoustic vectors and w is a string. For a given acoustic input a, the highest scoring string under the model is

    w^* = \arg\max_w \left( \beta \log P_l(w) + \log P_a(a \mid w) \right)    (1)

where \beta > 0 is a value that reflects the relative importance of the language model; \beta is typically chosen by optimization on held-out data. In an n-gram language model, a Markov assumption is made, namely that each word depends only on the previous n - 1 words. The parameters of the language model are usually estimated from a large quantity of text data. See Chen and Goodman (1998) for an overview of estimation techniques for n-gram models.
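To make the decision rule in Eq. (1) concrete, here is a minimal Python sketch of rescoring an n-best list. The names `lm_logprob`, `am_logprob`, and `beta` are assumptions for illustration: hypothetical callables standing in for the language and acoustic models, and the scaling weight tuned on held-out data.

```python
def select_hypothesis(a, nbest, lm_logprob, am_logprob, beta):
    # Return the candidate string w maximizing
    #     beta * log P_l(w) + log P_a(a | w)        (Eq. 1)
    # over the n-best list produced for acoustic input a.
    # lm_logprob and am_logprob are hypothetical stand-ins for
    # the trained language and acoustic models.
    return max(nbest, key=lambda w: beta * lm_logprob(w) + am_logprob(a, w))
```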
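The Markov assumption and count-based estimation can likewise be illustrated with a toy maximum-likelihood trigram model. This is a sketch under stated assumptions, not the paper's estimator: it uses hypothetical `<s>`/`</s>` boundary padding and no smoothing, whereas practical models rely on the smoothing techniques surveyed by Chen and Goodman (1998).

```python
from collections import defaultdict

def train_trigram_mle(sentences):
    # Count trigrams and their bigram contexts, padding each sentence
    # so that every word has two predecessors (the Markov window).
    tri, bi = defaultdict(int), defaultdict(int)
    for sent in sentences:
        words = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(2, len(words)):
            ctx = (words[i - 2], words[i - 1])
            bi[ctx] += 1
            tri[ctx + (words[i],)] += 1

    def prob(w, context):
        # MLE estimate: P(w | u, v) = count(u, v, w) / count(u, v).
        # No smoothing: unseen contexts get probability 0.
        return tri[context + (w,)] / bi[context] if bi[context] else 0.0

    return prob
```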
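Finally, the reranking model described in the abstract can be sketched in the same spirit. Below is a simplified perceptron trainer for n-best reranking; `features` and `oracle` are hypothetical helpers (a feature extractor and a function returning the lowest-error candidate in a list), and the sketch omits refinements the paper relies on, such as parameter averaging and including the baseline recognizer's log score as a feature.

```python
from collections import defaultdict

def perceptron_rerank(train_lists, features, oracle, epochs=5):
    # alpha maps feature names to weights; features(w) is a hypothetical
    # extractor returning a dict of feature counts for candidate w.
    alpha = defaultdict(float)

    def score(w):
        return sum(alpha[f] * v for f, v in features(w).items())

    for _ in range(epochs):
        for nbest in train_lists:
            predicted = max(nbest, key=score)   # current model's choice
            target = oracle(nbest)              # e.g., lowest-WER candidate
            if predicted != target:
                # Standard perceptron update: promote the target's
                # features, demote the predicted candidate's.
                for f, v in features(target).items():
                    alpha[f] += v
                for f, v in features(predicted).items():
                    alpha[f] -= v
    return alpha
```

At test time, each n-best list is rescored with the learned weights and the highest-scoring candidate is output.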