tailieunhanh - Báo cáo khoa học: "COMPACT REPRESENTATIONS BY FINITE-STATE TRANSDUCERS"

Finite-state transducers give efficient representations of many Natural Language phenomena. They allow to account for complex lexicon restrictions encountered, without involving the use of a large set of complex rules difficult to analyze. We here show that these representations can be made very compact, indicate how to perform the corresponding minimization, and point out interesting linguistic side-effects of this operation. | COMPACT REPRESENTATIONS BY FINITE-STATE TRANSDUCERS Mehryar Mohri Institut Gaspard Monge-LADL Université Marne-la-Vallee 2 rue de la Butte verte 93160 Noisy-le Grand FRANCE Internet mohri@ Abstract Finite-state transducers give efficient representations of many Natural Language phenomena. They allow to account for complex lexicon restrictions encountered without involving the use of a large set of complex rules difficult to analyze. We here show that these representations can be made very compact indicate how to perform the corresponding minimization and point out interesting linguistic side-effects of this operation. 1. MOTIVATION Finite-state transducers constitute appropriate representations of Natural Language phenomena. Indeed they have been shown to be sufficient tools to describe morphological and phonetic forms of a language Karttunen et al. 1992 Kay and Kaplan 1994 . Transducers can then be viewed as functions which map lexical representations to the surface forms or inflected forms to their phonetic pronunciations and vice versa. They allow to avoid the use of a great set of complex rules often difficult to check handle or even understand. Finite-state automata and transducers can also be used to represent the syntactic constraints of languages such as English or French Kosken-niemi 1990 Mohri 1993 Pereira 1991 Roche 1993 . The syntactic analysis can then be reduced to performing the intersection of two automata or to the application of a transducer to an automaton. However whereas first results show that the size of the syntactic transducer exceeds several hundreds of thousands of states no upper bound has been proposed for it as the representation of all syntactic entries has not been done yet. Thus one may ask whether such representations could succeed on a large scale. It is therefore crucial to control or to limit the size of these transducers in order to avoid a blow up. Classic minimization algorithms permit to reduce to the minimal the