tailieunhanh - Báo cáo khoa học: "LR RECURSIVE TRANSITION NETWORKS FOR EARLEY AND TOMITA PARSING"

Efficient syntactic and semantic parsing for ambiguous context-free languages are generally characterized as complex, specialized, highly formal algorithms. In fact, they are readily constructed from straightforward recursive Iransition networks (RTNs). In this paper, we introduce LR-RTNs, and then computationally motivate a uniform progression from basic LR parsing, to Earley's (chart) parsing, concluding with Tomita's parser. These apparently disparate algorithms are unified into a single implementation, which was used to automatically generate all the figures in this paper. 1. I N T R O D U C T I O N Ambiguous context-free grammars (CFGs) are currently used in. | LR RECURSIVE TRANSITION NETWORKS FOR EARLEY AND TOMITA PARSING Marie Perl in School of Computer Science Carnegie Mellon University Pittsburgh PA 15213 Internet perlin@ ABSTRACT Efficient syntactic and semantic parsing for ambiguous context-free languages are generally characterized as complex specialized highly formal algorithms. In fact they are readily constructed from straightforward recursive transition networks RTNs . In this paper we introduce LR-RTNs and then computationally motivate a uniform progression from basic LR parsing to Earley s chart parsing concluding with Tomita s parser. These apparently disparate algorithms are unified into a single implementation which was used to automatically generate all the figures in this paper. 1. INTRODUCTION Ambiguous context-free grammars CFGs are cuưently used in the syntactic and semantic processing of natural language. For efficient parsing two major computational methods are used. The first is Earley s algorithm Earley 1970 which merges parse trees to reduce the computational dependence on input sentence length from exponential to cubic cost. Numerous variations on Earley s dynamic programming method have developed into a family of chart parsing Winograd 1983 algorithms. The second is Tomita s algorithm Tomita 1986 which generalizes Knuth s Knuth 1965 and DeRemer s DeRemer 1971 computer language LR parsing techniques. Tomita s algorithm augments the LR parsing set of items construction with Earley s ideas. What is not currently appreciated is the continuity between these apparently distinct computational methods. Tomita has proposed Tomita 1985 constructing his algorithm from Earley s parser instead of DeRemer s LR parser. In fact as we shall show Earley s algorithm may be viewed as one form of LR parsing. Incremental constructions of Tomita s algorithm Heering Klint and Rekers 1990 may similarly be viewed as just one point along a continuum of methods. This work was supported in part by grant R29 LM .