Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Yoav Goldberg¹, Reut Tsarfaty², Meni Adler¹, Michael Elhadad¹
¹Department of Computer Science, Ben Gurion University of the Negev
yoavg, adlerm, elhadad @
²Institute for Logic, Language and Computation, University of Amsterdam

Supported by the Lynn and William Frankel Center for Computer Sciences, Ben Gurion University.
Funded by the Dutch Science Foundation (NWO), grant number .
Post-doctoral fellow, Deutsche Telekom Labs at Ben Gurion University.

Abstract
We present a framework for interfacing a PCFG parser with lexical information from an external resource that follows a different tagging scheme than the treebank. This is achieved by defining a stochastic mapping layer between the two resources. Lexical probabilities for rare events are estimated in a semi-supervised manner from a lexicon and large unannotated corpora. We show that this solution greatly enhances the performance of an unlexicalized Hebrew PCFG parser, resulting in state-of-the-art Hebrew parsing results both when a segmentation oracle is assumed and in a real-world scenario of parsing unsegmented tokens.

1 Introduction
The intuition behind unlexicalized parsers is that the lexicon is mostly separated from the syntax: specific lexical items are mostly irrelevant for accurate parsing and can be mediated through the use of POS tags and morphological hints. This same intuition also resonates in highly lexicalized formalisms such as CCG: while the lexicon categories are very fine-grained and syntactic in nature, once the lexical category for a lexical item is determined, the specific lexical form is not taken into any further consideration.

Despite this apparent separation between the lexical and the syntactic levels, both are usually estimated solely from a single treebank. Thus, while PCFGs can be accurate, they suffer from vocabulary coverage problems: treebanks are small, and lexicons induced from them are limited. The reason for this treebank-centric view in PCFG learning is threefold: (1) the English treebank is fairly large and English morphology is fairly simple, so in English the treebank does provide mostly adequate lexical coverage¹; (2) lexicons enumerate analyses but do not provide probabilities for them; and (3), most importantly, the treebank and the external lexicon are likely to follow different annotation schemas, reflecting different linguistic perspectives.
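The abstract describes a stochastic mapping layer that lets a parser trained on treebank tags consume analyses from an external lexicon with a different tag set. The following is a minimal Python sketch of one way such a layer could be composed, offered as an illustration rather than the paper's exact formulation. It assumes (1) that per-word probabilities over lexicon tags, p(t_L | w), are already available (for instance from an EM-trained HMM over raw text, as the title suggests), since a lexicon alone enumerates analyses without probabilities; (2) that the mapping is a conditional distribution p(t_TB | t_L); and (3) that PCFG emissions p(w | t_TB) are recovered with Bayes' rule using a unigram p(w). The function names, tag names, and toy numbers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict

# A probability distribution: outcome label -> probability.
Dist = Dict[str, float]


def treebank_tag_posterior(lexicon_posterior: Dist, mapping: Dict[str, Dist]) -> Dist:
    """p(t_TB | w) = sum over t_L of p(t_TB | t_L) * p(t_L | w).

    lexicon_posterior: p(t_L | w) for one word (hypothetical input).
    mapping: p(t_TB | t_L), the assumed stochastic ("fuzzy") tag-set mapping.
    """
    result: Dict[str, float] = defaultdict(float)
    for lex_tag, p_lex in lexicon_posterior.items():
        # A lexicon tag with no mapping entry contributes no mass.
        for tb_tag, p_map in mapping.get(lex_tag, {}).items():
            result[tb_tag] += p_map * p_lex
    return dict(result)


def emission_probabilities(posteriors: Dict[str, Dist], word_prob: Dist) -> Dict[str, Dist]:
    """Invert with Bayes' rule: p(w | t_TB) = p(t_TB | w) p(w) / p(t_TB).

    posteriors: word -> p(t_TB | w).
    word_prob: word -> p(w), e.g. relative frequency in raw text (assumption).
    """
    # p(t_TB) = sum over words of p(t_TB | w) * p(w)
    tag_prob: Dict[str, float] = defaultdict(float)
    for w, post in posteriors.items():
        for t, p in post.items():
            tag_prob[t] += p * word_prob[w]
    emissions: Dict[str, Dict[str, float]] = defaultdict(dict)
    for w, post in posteriors.items():
        for t, p in post.items():
            if tag_prob[t] > 0:  # skip tags that received no probability mass
                emissions[t][w] = p * word_prob[w] / tag_prob[t]
    return dict(emissions)


# Toy example with hypothetical lexicon tags, treebank tags, and words.
mapping = {"noun": {"NN": 1.0}, "participle": {"VB": 0.6, "JJ": 0.4}}
posteriors = {
    "w1": treebank_tag_posterior({"noun": 0.7, "participle": 0.3}, mapping),
    "w2": treebank_tag_posterior({"noun": 1.0}, mapping),
}
emissions = emission_probabilities(posteriors, {"w1": 0.5, "w2": 0.5})
print(posteriors["w1"])
print(emissions["NN"])
```

Marginalizing over lexicon tags is what makes the mapping "fuzzy" in this sketch: a single lexicon analysis (here the hypothetical "participle") can spread its mass over several treebank tags instead of being forced into one, and each resulting p(w | t_TB) still sums to one over words.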
