Scientific report: "THE REPRESENTATION OF CONSTITUENT STRUCTURES FOR FINITE-STATE PARSING"

D. Terence Langendoen, Yedidyah Langsam
Departments of English and Computer and Information Science, Brooklyn College of the City University of New York, Brooklyn, New York 11210

ABSTRACT

A mixed prefix-postfix notation is proposed for representations of the constituent structures of the expressions of natural languages; these representations have a limited degree of center embedding if the original expressions are noncenter-embedding. The method of constructing these representations is also applicable to expressions with center embedding, and it yields representations that seem to reflect the ways in which people actually parse those expressions. Both the representations and their interpretations can be computed from the expressions from left to right by finite-state devices.

The acceptable expressions of a natural language L all manifest no more than a small, fixed, finite degree n of center embedding. It follows from this observation that the ability of human beings to parse the expressions of L can be modeled by a finite transducer that associates the acceptable expressions of L with representations of their structural descriptions. This paper considers some initial steps in the construction of such a model.
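To make the notion of "degree of center embedding" concrete, the following is a minimal sketch of how it might be measured on a bracketed constituent structure. It is not the authors' notation or procedure. It assumes that a tree is a nested Python list whose leaves are words, that only bracketed constituents (not single words) count, and that a constituent is center-embedded in another when the outer one has material both to its left and to its right. The names `constituent_spans` and `center_embedding_degree` are hypothetical.

```python
from typing import List, Tuple, Union

# Assumed encoding: a leaf is a word (str); a constituent is a list of subtrees.
Tree = Union[str, List["Tree"]]
Span = Tuple[int, int]


def constituent_spans(tree: Tree, start: int = 0) -> Tuple[int, List[Span]]:
    """Return (end position, word spans of all bracketed constituents)."""
    if isinstance(tree, str):
        return start + 1, []          # a word occupies one position
    pos = start
    found: List[Span] = []
    for child in tree:
        pos, child_spans = constituent_spans(child, pos)
        found.extend(child_spans)
    found.append((start, pos))        # span of this constituent itself
    return pos, found


def center_embedding_degree(tree: Tree) -> int:
    """Length of the longest chain of constituents, each center-embedded
    in the previous one (under the assumptions stated above)."""
    _, spans = constituent_spans(tree)
    # Process narrower spans first so inner results exist before outer ones.
    ordered = sorted(set(spans), key=lambda s: s[1] - s[0])
    depth = {}
    for a, b in ordered:
        # Inner span (c, d) is center-embedded if material lies on both sides.
        inner = [depth[(c, d)] for (c, d) in depth if a < c and d < b]
        depth[(a, b)] = 1 + max(inner) if inner else 0
    return max(depth.values(), default=0)


# Right-branching structure: nothing has material on both sides.
right_branching = ["the", ["dog", ["chased", ["the", "cat"]]]]

# "the rat [the cat [the dog chased] killed] ate the malt"
doubly_embedded = [
    ["the", "rat", [["the", "cat", ["the", "dog", "chased"]], "killed"]],
    ["ate", "the", "malt"],
]
```

Under these assumptions, the right-branching tree has degree 0 and the doubly embedded relative-clause example has degree 2. The point of the sketch is only that the degree is a simple structural measure, so a bound on it can be checked mechanically.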
The first step is to determine a method of representing the class of constituent structures of the expressions of L without center embedding in such a way that the members of that class themselves have no more than a small, fixed, finite degree of center embedding. Given a grammar that directly generates that class of constituent structures, it is not difficult to construct a deterministic finite-state transducer parser that assigns the appropriate members of that class to the noncenter-embedded expressions of L from left to right. The second step is to extend the method so that it is capable of representing the class of constituent structures of expressions of L with no more than degree
