Scientific report: "A Probabilistic Context-free Grammar for Disambiguation in Morphological Parsing"

Josee S. Heemskerk
Institute of Language Technology and Artificial Intelligence
Tilburg University
P.O. Box 90153, 5000 LE Tilburg, The Netherlands
E-mail: joseeh@

Abstract

One of the major problems one is faced with when decomposing words into their constituent parts is ambiguity: the generation of multiple analyses for one input word, many of which are implausible. In order to deal with ambiguity, the MORphological PArser MORPA is provided with a probabilistic context-free grammar (PCFG), i.e. it combines a "conventional" context-free morphological grammar to filter out ungrammatical segmentations with a probability-based scoring function which determines the likelihood of each successful parse. Consequently, remaining analyses can be ordered along a scale of plausibility. Test performance data will show that a PCFG yields good results in morphological parsing. MORPA is a fully implemented parser developed for use in a text-to-speech conversion system.

1 Introduction

MORPA is a MORphological PArser developed for use in the text-to-speech conversion system for Dutch, SPRAAKMAKER (van Leeuwen and te Lindert, 1993). An important step in text-to-speech conversion is the generation of the correct phonemic representation on the basis of the input text.
As is well known, phonemic transcriptions cannot be derived directly from orthographic input in Dutch, as there is no one-to-one correspondence between graphemes and phonemes. Also, stress and the effects of most phonological rules are not reflected in orthography. A text-to-speech system therefore requires an intelligent method to convert the spelled words of the input sentence into a phonemic representation. As far as the pronunciation of ...

This work was carried out at the Phonetics Laboratory at Leiden University and supported by the Speech Technology Foundation, which is funded by the Netherlands Stimulation Project for Information Sciences (SPIN).
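
The two-stage idea stated in the abstract, a context-free grammar that rejects ungrammatical segmentations plus a probability score that ranks the surviving parses, can be illustrated with a small sketch. It is not MORPA's grammar or scoring function: the lexicon, the category names (`PRE`, `V`, `N`, `A`, `SUF_A`), the rules, and every probability below are invented for illustration, and an English toy word stands in for the Dutch input.

```python
from functools import lru_cache

# Hypothetical toy lexicon: morpheme -> [(category, probability)].
LEXICON = {
    "un": [("PRE", 1.0)],
    "lock": [("V", 0.7), ("N", 0.3)],
    "able": [("SUF_A", 1.0)],
}

# Hypothetical binary rules: (left, right) -> [(parent, probability)].
RULES = {
    ("PRE", "V"): [("V", 0.2)],
    ("PRE", "A"): [("A", 0.3)],
    ("V", "SUF_A"): [("A", 0.5)],
}


def parse(word):
    """Return all grammatical analyses of `word`, most probable first.

    Each analysis is (category, probability, tree), where the probability
    is the product of the morpheme and rule probabilities used.
    """

    @lru_cache(maxsize=None)
    def analyses(i, j):
        span = word[i:j]
        # The span may itself be a single known morpheme.
        results = [(cat, p, span) for cat, p in LEXICON.get(span, [])]
        # Or it may split into two parts that a grammar rule can combine.
        for k in range(i + 1, j):
            for lcat, lp, ltree in analyses(i, k):
                for rcat, rp, rtree in analyses(k, j):
                    for parent, p in RULES.get((lcat, rcat), []):
                        results.append((parent, p * lp * rp, (ltree, rtree)))
        return tuple(results)

    # Segmentations with no licensing rule never reach this list, so the
    # grammar acts as the filter and the score only orders what is left.
    return sorted(analyses(0, len(word)), key=lambda a: a[1], reverse=True)


if __name__ == "__main__":
    for category, probability, tree in parse("unlockable"):
        print(f"{probability:.3f}  {category}  {tree}")
```

Under these invented numbers, "unlockable" receives two analyses, with un + (lock + able) ranked above (un + lock) + able, while a string such as "lockun" is rejected outright because no rule combines a verb or noun with a following prefix.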
