
Attention Shifting for Parsing Speech*

Keith Hall
Department of Computer Science
Brown University
Providence, RI 02912
kh@

Mark Johnson
Department of Cognitive and Linguistic Sciences
Brown University
Providence, RI 02912
Mark Johnson@

* This research was supported in part by NSF grants 9870676 and 0085940.

Abstract

We present a technique that improves the efficiency of word-lattice parsing as used in speech recognition language modeling. Our technique applies a probabilistic parser iteratively, where on each iteration it focuses on a different subset of the word-lattice. The parser's attention is shifted towards word-lattice subsets for which there are few or no syntactic analyses posited. This attention-shifting technique provides a six-times increase in speed (measured as the number of parser analyses evaluated) while performing equivalently when used as the first stage of a multi-stage parsing-based language model.

1 Introduction

Language modeling has been dominated by the linear n-gram for the past few decades. A number of syntactic language models have proven to be competitive with the n-gram, and better than the most popular n-gram, the trigram (Roark, 2001; Xu et al., 2002; Charniak, 2001; Hall and Johnson, 2003). Language modeling for speech could well be the first real problem for which syntactic techniques are useful.

Figure 1: An incomplete parse tree with head-word annotations.

One reason that we expect syntactic models to perform well is that they are capable of modeling long-distance dependencies that simple n-gram models cannot. For example, the model presented by Chelba and Jelinek (Chelba and Jelinek, 1998; Xu et al., 2002) uses syntactic structure to identify lexical items in the left context, which are then modeled as an n-gram process. The model presented by Charniak (Charniak, 2001) identifies both syntactic structural and lexical dependencies that aid in language modeling. While there are n-gram models that attempt to extend the left-context
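The abstract describes the attention-shifting procedure only at a high level: a probabilistic parser is applied to the word lattice repeatedly, and each pass concentrates on the portions of the lattice that earlier passes left without a syntactic analysis. The sketch below illustrates one plausible form of that control loop; the Arc representation, the parse_pass callback, and the fixed iteration cap are illustrative assumptions, not the authors' implementation.

from typing import Callable, FrozenSet, Set, Tuple

# A word-lattice arc: (start_state, end_state, word). This representation,
# like the rest of this sketch, is a hypothetical illustration rather than
# the authors' actual data structure.
Arc = Tuple[int, int, str]

def attention_shifting_parse(
    arcs: Set[Arc],
    parse_pass: Callable[[FrozenSet[Arc]], Set[Arc]],
    max_iterations: int = 10,
) -> Set[Arc]:
    """Iteratively apply a pruned probabilistic parser to a word lattice,
    shifting attention toward arcs not yet covered by a complete analysis.

    parse_pass runs one parsing pass focused on the given arcs and returns
    the arcs that ended up inside at least one complete syntactic analysis.
    """
    covered: Set[Arc] = set()
    focus: Set[Arc] = set(arcs)      # the first pass attends to every arc
    for _ in range(max_iterations):
        covered |= parse_pass(frozenset(focus))
        uncovered = arcs - covered
        if not uncovered:            # every arc is covered by some analysis
            break
        focus = uncovered            # shift attention to the uncovered arcs
    return covered

The loop stops either when every arc participates in some complete analysis or after a bounded number of passes; the bound here is an arbitrary illustrative choice, and the set of covered arcs would then feed whatever later stages a multi-stage parsing-based language model uses.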