Scientific report: "A structure-sharing parser for lexicalized grammars"

Roger Evans
Information Technology Research Institute, University of Brighton, Brighton BN2 4GJ, UK

David Weir
Cognitive and Computing Sciences, University of Sussex, Brighton BN1 9QH, UK

Abstract

In wide-coverage lexicalized grammars many of the elementary structures have substructures in common. This means that in conventional parsing algorithms some of the computation associated with different structures is duplicated. In this paper we describe a precompilation technique for such grammars which allows some of this computation to be shared. In our approach the elementary structures of the grammar are transformed into finite state automata which can be merged and minimised using standard algorithms, and then parsed using an automaton-based parser. We present algorithms for constructing automata from elementary structures, merging and minimising them, and string recognition and parse recovery with the resulting grammar.

1 Introduction

It is well-known that fully lexicalised grammar formalisms such as LTAG (Joshi and Schabes, 1991) are difficult to parse with efficiently. Each word in the parser's input string introduces an elementary tree into the parse table for each of its possible readings, and there is often a substantial overlap in structure between these trees.
A conventional parsing algorithm (Vijay-Shanker and Joshi, 1985) views the trees as independent, and so is likely to duplicate the processing of this common structure. Parsing could be made more efficient (empirically if not formally) if the shared structure could be identified and processed only once. Recent work by Evans and Weir (1997) and Chen and Vijay-Shanker (1997) addresses this problem from two different perspectives. Evans and Weir (1997) outline a technique for compiling LTAG grammars into automata, which are then merged to introduce some sharing of structure. Chen and Vijay-Shanker (1997) use underspecified ...
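
The general idea of merging automata so that common structure is processed only once can be illustrated with a minimal sketch. It is not the construction used in the paper. It assumes, purely for illustration, that each elementary structure has already been flattened into a sequence of category symbols (the sequences in SEQUENCES are hypothetical). The sequences are merged into a single prefix-sharing automaton, which is then minimised by identifying states with identical outgoing behaviour, a standard method for acyclic deterministic automata. All names (State, build_trie, minimise, count_states, accepts) are invented for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple


class State:
    """A state of a deterministic acyclic automaton."""

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.final = False


def build_trie(sequences: Iterable[List[str]]) -> State:
    """Merge all sequences into one automaton that shares common prefixes."""
    root = State()
    for seq in sequences:
        node = root
        for sym in seq:
            node = node.edges.setdefault(sym, State())
        node.final = True
    return root


def minimise(root: State) -> State:
    """Merge equivalent states bottom-up (valid for acyclic automata).

    Two states are equivalent when they agree on finality and their
    outgoing edges lead to the same canonical states.
    """
    registry: Dict[Tuple, State] = {}

    def canon(node: State) -> State:
        # Canonicalise children first, then look up this node's signature.
        for sym, child in list(node.edges.items()):
            node.edges[sym] = canon(child)
        sig = (node.final,
               tuple(sorted((s, id(c)) for s, c in node.edges.items())))
        return registry.setdefault(sig, node)

    return canon(root)


def count_states(root: State) -> int:
    """Count the distinct states reachable from root."""
    seen: Set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)


def accepts(root: State, seq: List[str]) -> bool:
    """Recognise a symbol sequence by following edges from root."""
    node = root
    for sym in seq:
        nxt = node.edges.get(sym)
        if nxt is None:
            return False
        node = nxt
    return node.final


# Hypothetical flattened structures, e.g. three readings of one verb.
SEQUENCES = [
    ["NP", "V", "NP"],
    ["NP", "V", "NP", "PP"],
    ["NP", "V", "PP"],
]

if __name__ == "__main__":
    trie = build_trie(SEQUENCES)
    print("states after merging prefixes:", count_states(trie))
    dfa = minimise(trie)
    print("states after minimisation:", count_states(dfa))
    print("accepts NP V PP:", accepts(dfa, ["NP", "V", "PP"]))
```

In this sketch the common prefix "NP V" is traversed once for all three sequences, and minimisation additionally shares the identical end states. Recovering which elementary structure was recognised would need extra bookkeeping on the states, which the sketch omits.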
