tailieunhanh - Scientific report: "The Surprising Variance in Shortest-Derivation Parsing"

Mohit Bansal and Dan Klein
Computer Science Division, University of California, Berkeley
mbansal klein @

Abstract

We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest training fragments. Shortest-derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest-derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87 F1 on the English WSJ task with minimal annotation, and 90 F1 with richer annotation.

1 Introduction

One guiding intuition in parsing, and in data-driven NLP more generally, is that, all else equal, it is advantageous to memorize large fragments of training examples. Taken to the extreme, this intuition suggests shortest-derivation parsing (SDP), wherein a test sentence is analyzed in a way that uses as few training fragments as possible (Bod, 2000; Goodman, 2003). SDP certainly has appealing properties: it is simple and parameter-free, and there need not even be an explicit lexicon. However, SDP may be too simple to be competitive.
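To make the shortest-derivation criterion concrete, the following is a minimal sketch that counts the fewest fragments needed to build one given tree. It is not the authors' parser: the tuple encoding of trees, the names `match` and `shortest_derivation`, and the toy fragment sets are all assumptions made for illustration. A full parser would also minimize over all candidate trees for the sentence.

```python
import math
from functools import lru_cache

# Hypothetical encoding: a tree is a nested tuple (label, child, ...);
# words are plain strings. Inside a fragment, a one-element tuple such
# as ("NP",) marks a substitution site (a frontier nonterminal).

def match(frag, tree):
    """Return the tree nodes under the fragment's substitution sites,
    or None if the fragment does not match at this node."""
    if isinstance(frag, str):
        return [] if frag == tree else None
    if isinstance(tree, str) or frag[0] != tree[0]:
        return None
    if len(frag) == 1:            # substitution site: defer to another fragment
        return [tree]
    if len(frag) != len(tree):
        return None
    sites = []
    for f_child, t_child in zip(frag[1:], tree[1:]):
        sub = match(f_child, t_child)
        if sub is None:
            return None
        sites.extend(sub)
    return sites

def shortest_derivation(tree, fragments):
    """Fewest fragments needed to build `tree`; math.inf if impossible."""
    @lru_cache(maxsize=None)
    def best(node):
        costs = []
        for frag in fragments:
            if len(frag) == 1:    # a bare site cannot root a derivation step
                continue
            sites = match(frag, node)
            if sites is not None:
                # One fragment here, plus the best cost at each open site.
                costs.append(1 + sum(best(s) for s in sites))
        return min(costs, default=math.inf)
    return best(tree)

# Toy data: depth-one rules versus larger memorized fragments.
CFG_RULES = [
    ("S", ("NP",), ("VP",)), ("VP", ("V",), ("NP",)), ("NP", ("DT",), ("N",)),
    ("V", "saw"), ("DT", "the"), ("N", "dog"), ("N", "cat"),
]
BIG_FRAGMENTS = [
    ("S", ("NP",), ("VP", ("V", "saw"), ("NP",))),
    ("NP", ("DT", "the"), ("N", "dog")),
    ("NP", ("DT", "the"), ("N", "cat")),
]
TREE = ("S",
        ("NP", ("DT", "the"), ("N", "dog")),
        ("VP", ("V", "saw"), ("NP", ("DT", "the"), ("N", "cat"))))

print(shortest_derivation(TREE, CFG_RULES))                  # 9
print(shortest_derivation(TREE, CFG_RULES + BIG_FRAGMENTS))  # 3
```

With only depth-one rules the tree needs nine fragments, while the larger memorized fragments cut this to three. Several distinct trees can share the same minimum count, so the count alone does not pick a unique analysis and some form of tie-breaking is needed.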
In this paper, we consider SDP in both its pure form and with several direct modifications, finding a range of behaviors. In its pure form, with no pruning or approximation, SDP is neither fast nor accurate, achieving less than 70 F1 on the English WSJ task. Moreover, basic tie-breaking variants and lexical augmentation are insufficient to achieve competitive accuracy. On the other hand, SDP is dramatically improved in both speed and accuracy when a simple unlexicalized PCFG is used for coarse-to-fine pruning and tie-breaking. On the English WSJ, the coarse PCFG
