Scientific report: "Automatic Syllabification with Structured SVMs for Letter-To-Phoneme Conversion"

Susan Bartlett, Grzegorz Kondrak, Colin Cherry
Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada, susan kondrak @
Microsoft Research, One Microsoft Way, Redmond, WA 98052, colinc@

Abstract

We present the first English syllabification system to improve the accuracy of letter-to-phoneme conversion. We propose a novel discriminative approach to automatic syllabification based on structured SVMs. In comparison with a state-of-the-art syllabification system, we reduce the syllabification word error rate for English by 33%. Our approach also performs well on other languages, comparing favorably with published results on German and Dutch.

1 Introduction

Pronouncing an unfamiliar word is a task that is often accomplished by breaking the word down into smaller components. Even small children learning to read are taught to pronounce a word by sounding out its parts. Thus, it is not surprising that Letter-to-Phoneme (L2P) systems, which convert orthographic forms of words into sequences of phonemes, can benefit from subdividing the input word into smaller parts, such as syllables or morphemes. Marchand and Damper (2007) report that incorporating oracle syllable boundary information improves the accuracy of their L2P system, but they fail to emulate that result with any of their automatic syllabification methods. Demberg et al. (2007), on the other hand, find that morphological segmentation boosts L2P performance in German, but not in English.
To our knowledge, no previous English orthographic syllabification system has been able to actually improve performance on the larger L2P problem. In this paper, we focus on the task of automatic orthographic syllabification, with the explicit goal of improving L2P accuracy. A syllable is a subdivision of a word, typically consisting of a vowel, called the nucleus, and the consonants preceding and following the vowel, called the onset and the coda, respectively.
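To make the syllable structure just described concrete, the following minimal sketch splits one already-delimited orthographic syllable into the consonants before the vowel, the vowel nucleus, and the consonants after it (conventionally called the onset and the coda). It is only an illustration and not the syllabification method of this paper: the `Syllable` class, the `split_syllable` function, and the crude vowel set `aeiou` (which ignores `y`) are all assumptions introduced here, and the syllable boundaries are taken as given rather than predicted.

```python
from dataclasses import dataclass

# Assumption: a crude orthographic vowel set; "y" is deliberately ignored.
VOWELS = set("aeiou")


@dataclass
class Syllable:
    onset: str    # consonants preceding the vowel
    nucleus: str  # the vowel (or run of vowel letters)
    coda: str     # consonants following the vowel


def split_syllable(syl: str) -> Syllable:
    """Split one pre-delimited orthographic syllable into onset, nucleus, coda."""
    s = syl.lower()
    # Index of the first vowel letter, or None if the syllable has none.
    start = next((i for i, ch in enumerate(s) if ch in VOWELS), None)
    if start is None:
        # No vowel letter found: keep everything in the onset, empty nucleus.
        return Syllable(s, "", "")
    # Extend the nucleus over the whole run of consecutive vowel letters.
    end = start
    while end < len(s) and s[end] in VOWELS:
        end += 1
    return Syllable(s[:start], s[start:end], s[end:])


def split_word(hyphenated: str) -> list[Syllable]:
    """Decompose a word whose syllable boundaries are marked with hyphens."""
    return [split_syllable(part) for part in hyphenated.split("-")]
```

For example, `split_word("dis-tract")` decomposes the hand-marked syllables `dis` and `tract` separately. Deciding where those hyphens belong is the actual syllabification task, which this sketch does not attempt.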