tailieunhanh - Báo cáo khoa học: "Automatic Compensation for Parser Figure-of-Merit Flaws*"

Best-first chart parsing utilises a figure of merit (FOM) to efficiently guide a parse by first attending to those edges judged better. In the past it has usually been static; this paper will show that with some extra information, a parser can compensate for FOM flaws which otherwise slow it down. Our results are faster than the prior best by a factor of ; and the speedup is won with no significant decrease in parser accuracy. length. Incomplete constituents ("edges") are stored in an agenda. The exhaustion of the agenda definitively marks the completion of the parsing algorithm, but. | Automatic Compensation for Parser Figure-of-Merit Flaws Don Blaheta and Eugene Charniak dpb ec @ Department of Computer Science Box 1910 115 Waterman St. 4th floor Brown University Providence RI 02912 Abstract Best-first chart parsing utilises a figure of merit FOM to efficiently guide a parse by first attending to those edges judged better. In the past it has usually been static this paper will show that with some extra information a parser can compensate for FOM flaws which otherwise slow it down. Our results are faster than the prior best by a factor of and the speedup is won with no significant decrease in parser accuracy. 1 Introduction Sentence parsing is a task which is traditionally rather computationally intensive. The best known practical methods are still roughly cubic in the length of the sentence less than ideal when dealing with nontrivial sentences of 30 or 40 words in length as frequently found in the Penn Wall Street Journal treebank corpus. Fortunately there is now a body of literature on methods to reduce parse time so that the exhaustive limit is never reached in practice. 1 For much of the work the chosen vehicle is chart parsing. In this technique the parser begins at the word or tag level and uses the rules of a context-free grammar to build larger and larger constituents. Completed constituents are stored in the cells of a chart according to their location and This research was funded in part by NSF Grant IRI-9319516 and ONR Grant N0014-96-1-0549. 1An exhaustive parse always overgenerates because the grammar contains thousands of extremely rarely applied rules these are correctly rejected even by the simplest parsers eventually but it would be better to avoid them entirely. length. Incomplete constituents edges are stored in an agenda. The exhaustion of the agenda definitively marks the completion of the parsing algorithm but the parse needn t take that long already in the early work on chart parsing Kay 1970 suggests that by