Scientific report: "Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase ASR error rates"

Sharon Goldwater, Dan Jurafsky, and Christopher D. Manning
Department of Linguistics and Computer Science, Stanford University
{sgwater, jurafsky, manning}@

Abstract

Many factors are thought to increase the chances of misrecognizing a word in ASR, including low frequency, nearby disfluencies, short duration, and being at the start of a turn. However, few of these factors have been formally examined. This paper analyzes a variety of lexical, prosodic, and disfluency factors to determine which are likely to increase ASR error rates. Findings include the following. (1) For disfluencies, effects depend on the type of disfluency: errors increase by up to 15% (absolute) for words near fragments, but decrease by up to [...] (absolute) for words near repetitions. This decrease seems to be due to longer word duration. (2) For prosodic features, there are more errors for words with extreme values than for words with typical values. (3) Although our results are based on output from a system with speaker adaptation, speaker differences are a major factor influencing error rates, and the effects of features such as frequency, pitch, and intensity may vary between speakers.

1 Introduction

In order to improve the performance of automatic speech recognition (ASR) systems on conversational speech, it is important to understand the factors that cause problems in recognizing words.
Previous work on recognition of spontaneous monologues and dialogues has shown that infrequent words are more likely to be misrecognized (Fosler-Lussier and Morgan, 1999; Shinozaki and Furui, 2001) and that fast speech increases error rates (Siegler and Stern, 1995; Fosler-Lussier and Morgan, 1999; Shinozaki and Furui, 2001). Siegler and Stern (1995) and Shinozaki and Furui (2001) also found higher error rates in very slow speech. Word length (in phones) has also been found to be a useful predictor of higher error rates (Shinozaki and Furui, 2001). In