Fast Full Parsing by Linear-Chain Conditional Random Fields

Yoshimasa Tsuruoka, Jun'ichi Tsujii, Sophia Ananiadou

School of Computer Science, University of Manchester, UK
National Centre for Text Mining (NaCTeM), UK
Department of Computer Science, University of Tokyo, Japan

Abstract

This paper presents a chunking-based discriminative approach to full parsing. We convert the task of full parsing into a series of chunking tasks and apply a conditional random field (CRF) model to each level of chunking. The probability of an entire parse tree is computed as the product of the probabilities of the individual chunking results. Parsing is performed bottom-up, and the best derivation is obtained efficiently with a depth-first search algorithm. Experimental results demonstrate that this simple parsing framework produces a fast and reasonably accurate parser.

1 Introduction

Full parsing analyzes the phrase structure of a sentence and provides useful input for many kinds of high-level natural language processing, such as summarization (Knight and Marcu, 2000), pronoun resolution (Yang et al., 2006), and information extraction (Miyao et al., 2008). One of the major obstacles that discourage the use of full parsing in large-scale natural language processing applications is its computational cost.
For example, the MEDLINE corpus, a collection of abstracts of biomedical papers, consists of 70 million sentences and would require more than two years of processing time if the parser needed one second per sentence. Generative models based on lexicalized PCFGs enjoyed great success as the machine learning framework for full parsing (Collins, 1999; Charniak, 2000), but recently discriminative models have attracted more attention due to their superior accuracy (Charniak and Johnson, 2005; Huang, 2008) and their adaptability to new grammars and languages (Buchholz and Marsi, 2006). A traditional approach to discriminative full parsing is to …
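The MEDLINE processing-time estimate from the introduction is simple arithmetic. The sketch below reproduces it, assuming a 365-day year and a constant per-sentence cost on a single process; the helper name `processing_years` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; leap days ignored


def processing_years(n_sentences: int, seconds_per_sentence: float) -> float:
    """Total single-process parsing time, in years, at a constant per-sentence cost."""
    return n_sentences * seconds_per_sentence / SECONDS_PER_YEAR


# 70 million MEDLINE sentences at one second each: a little over two years.
years = processing_years(70_000_000, 1.0)
print(f"{years:.2f} years")
```

Because the cost scales linearly, a parser ten times faster would bring the same corpus down to roughly a fifth of a year.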
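The decomposition described in the abstract has three parts: one chunking step per tree level, a tree probability equal to the product of the per-level chunking probabilities, and a depth-first search for the best derivation. The following Python sketch illustrates these parts under stated assumptions and is not the authors' implementation. The CRF is replaced by a hypothetical `chunker` callable that returns candidate chunkings with probabilities. In the example this is a toy lookup table, `TOY_CANDIDATES`, with made-up probabilities. The names `apply_chunking` and `best_parse` are likewise illustrative. Because every factor is at most 1, a partial product can only shrink as the search goes deeper. The sketch therefore prunes any branch whose partial probability already falls to or below the best complete derivation found so far. This branch-and-bound pruning is an assumed design choice, as is the rule that each level must shrink the node sequence (which guarantees termination).

```python
from typing import Callable, Dict, List, Optional, Tuple

# One level of chunking: (start, end, label) spans over the current node
# sequence, with `end` exclusive. Nodes outside every span pass up unchanged.
Chunking = List[Tuple[int, int, str]]
Nodes = Tuple[str, ...]
# Hypothetical stand-in for a per-level CRF: maps the current node sequence
# to candidate chunkings, each paired with its probability.
Chunker = Callable[[Nodes], List[Tuple[Chunking, float]]]


def apply_chunking(nodes: Nodes, chunking: Chunking) -> Nodes:
    """Collapse each chunk into a single node labeled with its phrase type."""
    out: List[str] = []
    i = 0
    for start, end, label in sorted(chunking):
        out.extend(nodes[i:start])  # unchunked nodes before this span
        out.append(label)           # the new phrase node
        i = end
    out.extend(nodes[i:])
    return tuple(out)


def best_parse(nodes: Nodes, chunker: Chunker) -> Tuple[float, Optional[List[Chunking]]]:
    """Depth-first search for the derivation whose product of per-level
    chunking probabilities is highest (branch-and-bound pruning assumed)."""
    best_prob = 0.0
    best_derivation: Optional[List[Chunking]] = None

    def dfs(current: Nodes, prob: float, derivation: List[Chunking]) -> None:
        nonlocal best_prob, best_derivation
        if len(current) == 1:  # a single root node: the tree is complete
            if prob > best_prob:
                best_prob, best_derivation = prob, list(derivation)
            return
        # Try the most probable chunking first.
        for chunking, p in sorted(chunker(current), key=lambda c: c[1], reverse=True):
            new_prob = prob * p
            if new_prob <= best_prob:
                # Candidates are sorted, so every remaining one is no better,
                # and deeper levels can only multiply in factors <= 1.
                break
            nxt = apply_chunking(current, chunking)
            if len(nxt) >= len(current):
                continue  # assumption: each level must shrink the sequence
            derivation.append(chunking)
            dfs(nxt, new_prob, derivation)
            derivation.pop()

    dfs(tuple(nodes), 1.0, [])
    return best_prob, best_derivation


# Toy candidates with made-up probabilities (not real CRF output).
TOY_CANDIDATES: Dict[Nodes, List[Tuple[Chunking, float]]] = {
    ("DT", "NN", "VBD", "DT", "NN"): [
        ([(0, 2, "NP"), (3, 5, "NP")], 0.7),
        ([(0, 2, "NP")], 0.2),
    ],
    ("NP", "VBD", "DT", "NN"): [([(2, 4, "NP")], 0.5)],
    ("NP", "VBD", "NP"): [([(1, 3, "VP")], 0.9)],
    ("NP", "VP"): [([(0, 2, "S")], 0.95)],
}


def toy_chunker(nodes: Nodes) -> List[Tuple[Chunking, float]]:
    """Look up candidate chunkings; unknown sequences have none."""
    return TOY_CANDIDATES.get(nodes, [])


prob, derivation = best_parse(("DT", "NN", "VBD", "DT", "NN"), toy_chunker)
print(prob, derivation)
```

In this toy run, the first derivation reached (NP chunks, then VP, then S) scores 0.7 × 0.9 × 0.95. The competing first-level candidate (0.2) is pruned without being expanded. In a real parser the candidates at each level would presumably come from a trained linear-chain CRF. Trying the most probable ones first tends to find a strong derivation early, which tightens the pruning bound.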
