Scientific report: "Segment-based Hidden Markov Models for Information Extraction"

Zhenmei Gu
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
z2gu@

Nick Cercone
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1W5
nick@

Abstract

Hidden Markov models (HMMs) are powerful statistical models that have found successful applications in Information Extraction (IE). In current approaches to applying HMMs to IE, an HMM is used to model text at the document level. This modelling might cause undesired redundancy in extraction, in the sense that more than one filler is identified and extracted. We propose to use HMMs to model text at the segment level, in which the extraction process consists of two steps: a segment retrieval step followed by an extraction step. In order to retrieve extraction-relevant segments from documents, we introduce a method to use HMMs to model and retrieve segments. Our experimental results show that the resulting segment HMM IE system not only achieves near-zero extraction redundancy, but also has better overall extraction performance than traditional document HMM IE systems.

1 Introduction

A Hidden Markov Model (HMM) is a finite state automaton with stochastic state transitions and symbol emissions (Rabiner, 1989).
The automaton models a random process that can produce a sequence of symbols by starting from some state and transferring from one state to another, with a symbol being emitted at each state, until a final state is reached. Formally, a hidden Markov model (HMM) is specified by a five-tuple $(S, K, \Pi, A, B)$, where $S$ is a set of states, $K$ is the alphabet of observation symbols, $\Pi$ is the initial state distribution, $A$ is the probability distribution of state transitions, and $B$ is the probability distribution of symbol emissions. When the structure of an HMM is determined, the complete model parameters can be represented as $\lambda = \{A, B, \Pi\}$. HMMs are particularly useful in modelling sequential data.
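The generative process described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the state names, the alphabet, and all probabilities are invented for illustration, and the sketch assumes that the final state emits no symbol. The helper name sample_hmm and the max_len safety cap are likewise hypothetical.

```python
import random

# Toy HMM. Every state name, symbol, and probability here is a made-up
# illustration (an assumption), not a value from the paper.
STATES = ["background", "filler", "end"]                # S: set of states
PI = {"background": 1.0, "filler": 0.0, "end": 0.0}     # Pi: initial state distribution
A = {                                                   # A: state transition probabilities
    "background": {"background": 0.6, "filler": 0.3, "end": 0.1},
    "filler": {"background": 0.5, "filler": 0.3, "end": 0.2},
    "end": {"background": 0.0, "filler": 0.0, "end": 1.0},
}
B = {                                                   # B: symbol emission probabilities
    "background": {"the": 0.5, "talk": 0.3, "at": 0.2},
    "filler": {"3pm": 0.7, "noon": 0.3},
    "end": {},                                          # assumed non-emitting final state
}
# K, the alphabet of observation symbols, is the union of the keys in B.


def sample_hmm(states, pi, trans, emit, final_state, rng, max_len=50):
    """Generate (state, symbol) pairs until the final state is reached."""
    # Draw the start state from the initial distribution Pi.
    state = rng.choices(states, weights=[pi[s] for s in states])[0]
    path = []
    while state != final_state and len(path) < max_len:
        # Emit one symbol from the current state's emission distribution.
        symbols = list(emit[state])
        weights = [emit[state][k] for k in symbols]
        symbol = rng.choices(symbols, weights=weights)[0]
        path.append((state, symbol))
        # Move to the next state according to the transition distribution.
        state = rng.choices(states, weights=[trans[state][s] for s in states])[0]
    return path


# Print one sampled sequence of (state, symbol) pairs.
print(sample_hmm(STATES, PI, A, B, "end", random.Random(0)))
```

In an actual IE setting the states are hidden and only the symbols are observed. The sketch prints both so that the roles of the initial distribution, the transitions, and the emissions are visible.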
