tailieunhanh - Báo cáo khoa học: "The intersection of Finite State Automata and Definite Clause Grammars"

Bernard Lang defines parsing as ~ calculation of the intersection of a FSA (the input) and a CFG. Viewing the input for parsing as a FSA rather than as a string combines well with some approaches in speech understanding systems, in which parsing takes a word lattice as input (rather than a word string). Furthermore, certain techniques for robust parsing can be modelled as finite state transducers. In this paper we investigate how we can generalize this approach for unification grammars. In particular we will concentrate on how we might the calculation of the intersection of a FSA and. | The intersection of Finite state Automata and Definite Clause Grammars Gertjan van Noord Vakgroep Alfa-informatica BCN Rijksuniversiteit Groningen Abstract Bernard Lang defines parsing as the calculation of the intersection of a FSA the input and a CFG. Viewing file input for parsing as a FSA rather than as a string combines well with some approaches in speech understanding systems in which parsing takes a word lattice as input rather than a word string . Furthermore certain techniques for robust parsing can be modelled as finite state transducers. In this paper we investigate how we can generalize this approach for unification grammars. In particular we will concentrate on how we might the calculation of the intersection of a FSA and a DCG. It is shown that existing parsing algorithms can be easily extended for FSA inputs. However we also show that the termination properties change drastically we show that it is undecidable whether the intersection of a FSA and a DCG is empty even if the DCG is off-line parsable . Furthermore we discuss approaches to cope with the problem 1 Introduction In this paper we are concerned with the syntactic analysis phase of a natural language understanding system. Ordinarily the input of such a system is a sequence of words. However following Bernard Lang we argue that it might be fruitful to take the input more generally as a finite state automaton FSA to model cases in which we are uncertain about the actual input. Parsing uncertain input might be necessary in case of ill-formed textual input or in case of speech input. For example if a natural language understanding system is interfaced with a speech recognition component chances are that this component is uncertain about the actual string of words that has been uttered and thus produces a word lattice of the most promising hypotheses rather than a single sequence of words. FSA of course generalizes such word lattices. As another example certain techniques to .