Scientific report: "PROJECT APRIL -- A PROGRESS REPORT"

Robin Haigh, Geoffrey Sampson, Eric Atwell
Centre for Computer Analysis of Language and Speech, University of Leeds, Leeds LS2 9JT, UK

ABSTRACT

Parsing techniques based on rules defining grammaticality are difficult to use with authentic inputs, which are often grammatically messy. Instead, the APRIL system seeks a labelled tree structure which maximizes a numerical measure of conformity to statistical norms derived from a sample of parsed text. No distinction between legal and illegal trees arises: any labelled tree has a value.

Because the search space is large and has an irregular geometry, APRIL seeks the best tree using simulated annealing, a stochastic optimization technique. Beginning with an arbitrary tree, many randomly-generated local modifications are considered and adopted or rejected according to their effect on tree-value: acceptance decisions are made probabilistically, subject to a bias against adverse moves which is very weak at the outset but is made to increase as the random walk through the search space continues. This enables the system to converge on the global optimum without getting trapped in local optima.

Performance of an early version of the APRIL system on authentic inputs is yielding analyses with a mean accuracy of [...] using a schedule which increases processing linearly with sentence-length; modifications currently being implemented should eliminate a high proportion of the remaining errors.
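The annealing loop described in the abstract can be illustrated with a minimal Python sketch. It is not the APRIL implementation: the state here is a single integer rather than a labelled tree, and `toy_value`, `toy_propose`, the geometric cooling schedule, the exponential acceptance rule, and the best-so-far bookkeeping are all assumptions chosen for illustration. What it does share with the description is the shape of the search: start from an arbitrary state, consider random local modifications, always adopt improvements, and adopt adverse moves with a probability that shrinks as the run proceeds.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """Probability of adopting a move that changes the value by `delta`."""
    if delta >= 0:
        return 1.0  # improving (or neutral) moves are always adopted
    # Adverse moves: likely while the temperature is high, rare once it is low.
    return math.exp(delta / temperature)


def anneal(initial, propose, value, steps=20000, t_start=10.0, t_end=0.01, seed=0):
    """Maximize `value` by simulated annealing (illustrative sketch only)."""
    rng = random.Random(seed)
    current, current_value = initial, value(initial)
    best, best_value = current, current_value  # best-so-far is our addition
    for step in range(steps):
        # Assumed geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        candidate_value = value(candidate)
        delta = candidate_value - current_value
        if rng.random() < acceptance_probability(delta, temperature):
            current, current_value = candidate, candidate_value
            if current_value > best_value:
                best, best_value = current, current_value
    return best, best_value


def toy_value(x):
    """Hypothetical stand-in for tree-value: one broad peak plus local bumps."""
    return -0.01 * (x - 70) ** 2 + 3.0 * math.cos(x / 3.0)


def toy_propose(x, rng):
    """Hypothetical local modification: step one unit left or right in [0, 100]."""
    return min(100, max(0, x + rng.choice((-1, 1))))


best, best_value = anneal(0, toy_propose, toy_value)
print(best, round(best_value, 3))
```

In a parser along the lines the abstract describes, the state would be a labelled tree, `propose` would make a small random change to that tree, and `value` would score the tree against statistics drawn from parsed text; those components are outside the scope of this sketch.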
INTRODUCTION

Project APRIL (Annealing Parser for Realistic Input Language) is constructing a software system that uses the stochastic optimization technique known as simulated annealing (Kirkpatrick et al. 1983; van Laarhoven & Aarts 1987) to parse authentic English inputs, by seeking labelled tree-structures that maximize a measure of plausibility defined in terms of empirical statistics on parse-tree configurations drawn from a database of manually parsed English text. This approach is a response to the fact that real-life English ...
