Scientific report: "GRAMMAR SPECIALIZATION THROUGH ENTROPY THRESHOLDS"

Christer Samuelsson
Swedish Institute of Computer Science
Box 1263, S-164 28 Kista, Sweden

Abstract

Explanation-based generalization is used to extract a specialized grammar from the original one using a training corpus of parse trees. This allows much faster parsing and gives a lower error rate, at the price of a small loss in coverage. Previously, it has been necessary to specify the tree-cutting criteria (or operationality criteria) manually; here they are derived automatically from the training set and the desired coverage of the specialized grammar. This is done by assigning an entropy value to each node in the parse trees and cutting at the nodes with sufficiently high entropy values.

BACKGROUND

Previous work by Manny Rayner and the author (see Samuelsson & Rayner 1991) attempts to tailor an existing natural-language system to a specific application domain by extracting a specialized grammar from the original one using a large set of training examples. The training set is a treebank consisting of implicit parse trees, each of which specifies a verified analysis of an input sentence.
The parse trees are implicit in the sense that each node in the tree is the mnemonic name of the grammar rule resolved on at that point, rather than the syntactic category of the LHS of the grammar rule, as is the case in an ordinary parse tree. Figure 1 shows five examples of implicit parse trees. The analyses are verified in the sense that each analysis has been judged to be the preferred one for that input sentence by a human evaluator using a semi-automatic evaluation method. A new grammar is created by cutting up each implicit parse tree in the treebank at appropriate points, creating a set of new rules that consist of chunks of original grammar rules. The LHS of each new rule will be the LHS phrase of the original grammar rule at the root of the tree chunk, and the RHS will be the RHS phrases of the rules in the .
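
The tree-cutting step described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the paper: the toy grammar `GRAMMAR`, the rule names such as `s_np_vp`, the `Node` representation of an implicit parse tree, and the `is_cut` predicate. In particular, `is_cut` is only a stand-in for the entropy-threshold test; the sketch does not compute entropies. It also assumes that the RHS of a chunk is read off the frontier of the chunk, that is, from the positions where the chunk stops.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]


class Node(NamedTuple):
    """Node of an implicit parse tree: labelled by a rule name, not a category."""
    rule: str
    # One entry per RHS position; None marks a position that is not expanded
    # by a grammar rule in the tree (for example a lexical category).
    children: Tuple[Optional["Node"], ...]


# Hypothetical toy grammar: mnemonic rule name -> original grammar rule.
GRAMMAR: Dict[str, Rule] = {
    "s_np_vp": Rule("S", ("NP", "VP")),
    "vp_v_np": Rule("VP", ("V", "NP")),
    "np_det_n": Rule("NP", ("Det", "N")),
    "np_pron": Rule("NP", ("Pron",)),
}

# Hypothetical implicit parse tree for a sentence such as "I see the dog".
TREE = Node("s_np_vp", (
    Node("np_pron", (None,)),
    Node("vp_v_np", (None, Node("np_det_n", (None, None)))),
))


def specialize(tree: Node,
               grammar: Dict[str, Rule],
               is_cut: Callable[[Node], bool]) -> List[Rule]:
    """Cut the tree above every node accepted by is_cut; one new rule per chunk."""
    new_rules: List[Rule] = []

    def expand(node: Node) -> List[str]:
        # Collect the RHS phrases that remain on the frontier of this chunk.
        rhs: List[str] = []
        for category, child in zip(grammar[node.rule].rhs, node.children):
            if child is None:
                rhs.append(category)        # unexpanded position stays as-is
            elif is_cut(child):
                rhs.append(category)        # chunk ends here ...
                start_chunk(child)          # ... and a new chunk starts below
            else:
                rhs.extend(expand(child))   # merge the child rule into this chunk
        return rhs

    def start_chunk(root: Node) -> None:
        # The LHS of the new rule is the LHS of the rule at the chunk root.
        new_rules.append(Rule(grammar[root.rule].lhs, tuple(expand(root))))

    start_chunk(tree)
    return new_rules


if __name__ == "__main__":
    # Example criterion (an assumption): cut above every NP node.
    for rule in specialize(TREE, GRAMMAR, lambda n: GRAMMAR[n.rule].lhs == "NP"):
        print(rule.lhs, "->", " ".join(rule.rhs))
```

With a predicate that never cuts, the whole tree collapses into a single flat rule; with one that always cuts, the original grammar rules used in the tree are recovered unchanged. Intermediate criteria trade rule specificity against coverage, which is the trade-off the abstract describes.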