Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm

Brian Roark, Murat Saraclar (AT&T Labs - Research); Michael Collins (MIT CSAIL); Mark Johnson (Brown University)

Abstract

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, CRF training that uses the feature set output by the perceptron algorithm, initialized with the perceptron weights, provides an additional reduction in word error rate, for a total absolute reduction from the baseline of .

1 Introduction

A crucial component of any speech recognizer is the language model (LM), which assigns scores or probabilities to candidate output strings. The language model is used in combination with an acoustic model to give an overall score to candidate word sequences, which ranks them in order of probability or plausibility.
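The perceptron training mentioned in the abstract can be illustrated with a minimal sketch. It rests on several simplifying assumptions that are not taken from the paper: candidates come from a small n-best list rather than a word-lattice encoded as a weighted automaton, the features are unigram and bigram counts, the baseline recognizer score (which already combines acoustic and LM scores) enters with a fixed weight of 1.0, and the training target is the candidate with the fewest word errors. All function names (`ngram_features`, `word_errors`, `score`, `train_perceptron`, `rerank`) are hypothetical. A CRF would instead optimize the conditional log-likelihood over the whole candidate set; that alternative is not shown.

```python
from collections import Counter


def ngram_features(words, n=2):
    """Count all n-grams of order 1..n, padding with <s> and </s> boundary tokens."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[tuple(padded[i:i + order])] += 1
    return feats


def word_errors(hyp, ref):
    """Word-level edit distance (substitutions + insertions + deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r)))
        prev = cur
    return prev[-1]


def score(candidate, weights, base_weight=1.0):
    """Baseline recognizer score (fixed weight) plus learned n-gram feature weights."""
    words, baseline = candidate
    feats = ngram_features(words)
    return base_weight * baseline + sum(weights.get(f, 0.0) * c for f, c in feats.items())


def train_perceptron(data, epochs=2):
    """data: list of (nbest, reference); nbest is a list of (words, baseline_score)."""
    weights = {}
    for _ in range(epochs):
        for nbest, ref in data:
            # Target: the lowest-error candidate (the reference itself
            # may be absent from the candidate list).
            oracle = min(nbest, key=lambda c: word_errors(c[0], ref))
            best = max(nbest, key=lambda c: score(c, weights))
            if best[0] != oracle[0]:
                # Perceptron update: promote the oracle's features and demote
                # the features of the current (wrong) top-scoring candidate.
                for f, c in ngram_features(oracle[0]).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(best[0]).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights


def rerank(nbest, weights):
    """Return the word sequence of the highest-scoring candidate."""
    return max(nbest, key=lambda c: score(c, weights))[0]


# Toy example: the baseline score prefers the wrong hypothesis.
nbest = [(["wreck", "a", "nice", "beach"], -1.0), (["recognize", "speech"], -1.5)]
ref = ["recognize", "speech"]
w = train_perceptron([(nbest, ref)])
print(rerank(nbest, w))  # ['recognize', 'speech']
```

Only features that appear in a mistaken top candidate or its oracle ever receive nonzero weight, which hints at why perceptron training tends to select a relatively small feature set.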
A dominant approach in speech recognition has been to use a source-channel, or noisy-channel, model. In this approach, language modeling is effectively framed as density estimation: the language model's task is to define a distribution over the source, i.e., the possible strings in the language. Markov n-gram models are often used for this task; their parameters are optimized to maximize the likelihood of a large amount of training text. Recognition performance is a direct measure of the effectiveness of a language model; an indirect measure which is frequently proposed within
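The density-estimation view described above can be sketched as follows, assuming a bigram model estimated by unsmoothed relative frequencies (the maximum-likelihood estimate) and made-up acoustic log-probabilities. Under the noisy-channel rule the recognizer picks the word sequence W that maximizes log P(A|W) + log P(W), where A is the acoustic input. Real recognizers smooth the n-gram estimates and usually scale the LM score; both are omitted here, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram_mle(sentences):
    """Relative-frequency (maximum-likelihood) bigram estimates:
    P(cur | prev) = count(prev, cur) / count(prev as a history)."""
    bigrams, history = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return {bg: c / history[bg[0]] for bg, c in bigrams.items()}


def lm_logprob(words, model):
    """log P(W) under the bigram model; -inf if any bigram is unseen (no smoothing)."""
    padded = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for bg in zip(padded, padded[1:]):
        p = model.get(bg, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def noisy_channel_decode(candidates, model):
    """candidates: list of (words, acoustic_logprob); return argmax of log P(A|W) + log P(W)."""
    return max(candidates, key=lambda c: c[1] + lm_logprob(c[0], model))[0]


corpus = [["recognize", "speech"], ["recognize", "speech", "today"]]
model = train_bigram_mle(corpus)
candidates = [(["wreck", "a", "nice", "beach"], -1.0), (["recognize", "speech"], -1.5)]
print(noisy_channel_decode(candidates, model))  # ['recognize', 'speech']
```

Here the LM alone overrides the acoustically preferred hypothesis because its bigrams were never seen in training; the discriminative approach in this paper instead adjusts LM weights directly to reduce recognition errors.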