Scientific report: "Compacting the Penn Treebank Grammar"

Alexander Krotov, Mark Hepple, Robert Gaizauskas and Yorick Wilks
Department of Computer Science, Sheffield University
211 Portobello Street, Sheffield S1 4DP, UK
{alexk, hepple, robertg, yorick}@

Abstract

Treebanks, such as the Penn Treebank (PTB), offer a simple approach to obtaining a broad-coverage grammar: one can simply read the grammar off the parse trees in the treebank. While such a grammar is easy to obtain, a square-root rate of growth of the rule set with corpus size suggests that the derived grammar is far from complete and that much more treebanked text would be required to obtain a complete grammar, if one exists at some limit. However, we offer an alternative explanation in terms of the underspecification of structures within the treebank. This hypothesis is explored by applying an algorithm to compact the derived grammar by eliminating redundant rules: rules whose right-hand sides can be parsed by other rules. The size of the resulting compacted grammar, which is significantly smaller than that of the full treebank grammar, is shown to approach a limit. However, such a compacted grammar does not yield very good performance figures. A version of the compaction algorithm that takes rule probabilities into account is proposed, which is argued to be more linguistically motivated. Combined with simple thresholding, this method can give a 58% reduction in grammar size without significant change in parsing performance, and can produce a 69% reduction with some gain in recall but a loss in precision.

1 Introduction

The Penn Treebank (PTB) (Marcus et al., 1994) has been used for a rather simple approach to deriving large grammars automatically: one where the grammar rules are simply read off the parse trees in the corpus, with each local subtree providing the left- and right-hand sides of a rule. Charniak (1996) reports precision and recall figures of around 80% for a parser employing such a grammar. In this paper we ...
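Reading a grammar off a treebank amounts to emitting one rule per local subtree, with the parent label as the left-hand side and the children's labels as the right-hand side. The Python sketch below is illustrative only and rests on our own assumptions: trees are encoded as nested tuples `(label, child, ...)` with words as plain strings, preterminal-to-word (lexical) rules are skipped, and PTB function tags and empty elements get no special treatment. The names `label`, `read_rules` and `treebank_grammar` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple, Union

# Assumed encoding: a node is (label, child, child, ...); a word is a plain str.
Tree = Union[str, tuple]
Rule = Tuple[str, Tuple[str, ...]]


def label(node: Tree) -> str:
    """Category of a subtree, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]


def read_rules(tree: Tree, counts: "Counter[Rule]") -> None:
    """Count one rule per local subtree: parent label -> children's labels."""
    if isinstance(tree, str):
        return  # a word: no local subtree below it
    children = tree[1:]
    if all(isinstance(c, str) for c in children):
        return  # preterminal (POS tag over a word): lexical rule skipped here
    counts[(tree[0], tuple(label(c) for c in children))] += 1
    for child in children:
        read_rules(child, counts)


def treebank_grammar(trees: Iterable[Tree]) -> "Counter[Rule]":
    """Rule counts over a whole collection of parse trees."""
    counts: "Counter[Rule]" = Counter()
    for tree in trees:
        read_rules(tree, counts)
    return counts


if __name__ == "__main__":
    tree = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked")))
    for (lhs, rhs), n in treebank_grammar([tree]).items():
        print(lhs, "->", " ".join(rhs), n)
    # S -> NP VP 1
    # NP -> DT NN 1
    # VP -> VBD 1
```

Keeping counts rather than a bare set makes it easy to derive relative rule frequencies later. For real PTB files, NLTK's `Tree.fromstring` and `Tree.productions()` offer a comparable starting point, although `productions()` also returns the lexical rules that this sketch skips.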
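Compaction, as summarized in the abstract, removes a rule when its right-hand side can be parsed, as its left-hand-side category, by the other rules of the grammar. The following is a minimal Python sketch of that redundancy test under our own assumptions: rules are `(lhs, rhs)` pairs over category labels, the test is a simple chart recognizer with closure over unary rules, and rules are visited in sorted order. The functions `derives` and `compact` are hypothetical names, and the sketch covers only plain compaction, not the probability-aware variant or the thresholding.

```python
from typing import Dict, Set, Tuple

# Assumed rule encoding: (lhs, rhs), e.g. ("NP", ("DT", "NN")).
Rule = Tuple[str, Tuple[str, ...]]


def derives(grammar: Set[Rule], symbols: Tuple[str, ...], goal: str) -> bool:
    """True if `goal` can be built over the category sequence `symbols`."""
    n = len(symbols)
    if n == 0:
        return False  # empty right-hand sides are not handled in this sketch
    # chart[(i, j)] holds the categories that derive symbols[i:j]
    chart: Dict[Tuple[int, int], Set[str]] = {}

    def covers(rhs: Tuple[str, ...], t: int, i: int, j: int) -> bool:
        """Can rhs[t:] be laid over the span [i, j), one non-empty piece per symbol?"""
        if t == len(rhs):
            return i == j
        remaining = len(rhs) - t - 1  # symbols that still need a piece after rhs[t]
        for k in range(i + 1, j - remaining + 1):
            if rhs[t] in chart.get((i, k), ()) and covers(rhs, t + 1, k, j):
                return True
        return False

    for length in range(1, n + 1):  # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            cell: Set[str] = {symbols[i]} if length == 1 else set()
            chart[(i, j)] = cell
            changed = True
            while changed:  # repeat until nothing new; closes chains of unary rules
                changed = False
                for lhs, rhs in grammar:
                    if lhs not in cell and covers(rhs, 0, i, j):
                        cell.add(lhs)
                        changed = True
    return goal in chart[(0, n)]


def compact(rules: Set[Rule]) -> Set[Rule]:
    """Drop each rule whose RHS the remaining rules can already parse as its LHS."""
    kept = set(rules)
    for rule in sorted(rules):  # fixed visiting order; the survivors depend on it
        lhs, rhs = rule
        others = kept - {rule}
        if derives(others, rhs, lhs):
            kept = others  # redundant: covered by the rules that remain
    return kept


if __name__ == "__main__":
    toy = {
        ("NP", ("DT", "NN")),
        ("NP", ("NP", "PP")),
        ("NP", ("NP", "PP", "PP")),
        ("PP", ("IN", "NP")),
    }
    for lhs, rhs in sorted(compact(toy)):
        print(lhs, "->", " ".join(rhs))
    # NP -> DT NN
    # NP -> NP PP
    # PP -> IN NP
```

In this toy grammar, `NP -> NP PP PP` is dropped because two applications of `NP -> NP PP` cover the same category sequence. Because the grammar only shrinks during the pass, a rule found non-redundant stays non-redundant, so one pass suffices in this sketch; which of several mutually derivable rules survives does, however, depend on the visiting order.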
