Scientific report: "A Seed-driven Bottom-up Machine Learning Framework for Extracting Relations of Various Complexity"

Feiyu Xu, Hans Uszkoreit and Hong Li
Language Technology Lab, DFKI GmbH
Stuhlsatzenhausweg 3, D-66123 Saarbruecken
feiyu, uszkoreit, hongli @

Abstract

A minimally supervised machine learning framework is described for extracting relations of various complexity. Bootstrapping starts from a small set of n-ary relation instances as "seeds" in order to automatically learn pattern rules from parsed data, which can then extract new instances of the relation and its projections. We propose a novel rule representation enabling the composition of n-ary relation rules on top of the rules for projections of the relation. The compositional approach to rule construction is supported by a bottom-up pattern extraction method. In comparison to other automatic approaches, our rules can not only localize relation arguments but also assign their exact target argument roles. The method is evaluated in two tasks: the extraction of Nobel Prize awards and management succession events. Performance for the new Nobel Prize task is strong. For the management succession task, the results compare favorably with those of existing pattern acquisition approaches.

1 Introduction

Information extraction (IE) has the task of discovering n-tuples of relevant items (entities) belonging to an n-ary relation in natural language documents. One of the central goals of the ACE program is to develop a more systematically grounded approach to IE, starting from elementary entities and binary relations and moving up to n-ary relations such as events. Current semi- or unsupervised approaches to automatic pattern acquisition are either limited to a certain linguistic representation (e.g., subject-verb-object), or only deal with binary relations, or cannot assign slot filler roles to the extracted arguments, or do not have good selection and filtering methods to handle the large number of tree patterns (Riloff 1996).
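The seed-driven bootstrapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain surface strings instead of parsed data, a simplified three-role relation (`winner`, `area`, `year`), and regular expressions in place of the paper's rule representation. All function names, the `ROLES` tuple, and the example sentences are hypothetical. The sketch shows only the general loop: seeds yield patterns, patterns yield new instances with role labels, and complete instances become new seeds, while sentences covering only some arguments yield patterns for projections of the relation.

```python
import re

# Assumed, simplified argument roles of the target relation (hypothetical).
ROLES = ("winner", "area", "year")

def learn_patterns(sentences, seeds, min_args=2):
    """Turn sentences mentioning >= min_args arguments of a seed into patterns."""
    patterns = set()
    for sent in sentences:
        for seed in seeds:
            present = [(r, v) for r, v in zip(ROLES, seed) if v in sent]
            if len(present) < min_args:
                continue
            # Escape the sentence, then replace each argument with a role slot.
            pat = re.escape(sent)
            for role, val in present:
                pat = pat.replace(re.escape(val), f"(?P<{role}>.+?)", 1)
            patterns.add("^" + pat + "$")
    return patterns

def extract(sentences, patterns):
    """Apply patterns; each instance is a sorted tuple of (role, value) pairs."""
    found = set()
    for sent in sentences:
        for pat in patterns:
            m = re.match(pat, sent)
            if m:
                found.add(tuple(sorted(m.groupdict().items())))
    return found

def bootstrap(sentences, seeds, max_iter=5):
    """Alternate pattern learning and extraction until nothing new appears."""
    seeds = set(seeds)
    patterns, instances = set(), set()
    for _ in range(max_iter):
        patterns |= learn_patterns(sentences, seeds)
        new_instances = extract(sentences, patterns)
        # Only complete tuples (all roles filled) are promoted to seeds;
        # partial tuples are kept as instances of a projection.
        new_seeds = {
            tuple(dict(inst)[r] for r in ROLES)
            for inst in new_instances
            if len(inst) == len(ROLES)
        }
        if new_instances <= instances and new_seeds <= seeds:
            break
        instances |= new_instances
        seeds |= new_seeds
    return patterns, instances

# Toy corpus and a single seed instance (illustrative only).
SENTENCES = [
    "Ahmed Zewail won the Nobel Prize for Chemistry in 1999.",
    "Toni Morrison won the Nobel Prize for Literature in 1993.",
    "Toni Morrison received the prize in 1993.",
    "Wangari Maathai received the prize in 2004.",
]
SEED = ("Ahmed Zewail", "Chemistry", "1999")

if __name__ == "__main__":
    learned, extracted = bootstrap(SENTENCES, [SEED])
    for inst in sorted(extracted):
        print(dict(inst))
```

In this toy run the seed first yields a ternary pattern, which extracts a second complete instance. That instance then acts as a seed and yields a binary (winner, year) pattern from a sentence that mentions only two of its arguments, which in turn extracts a projection instance for a person never seen in the seeds.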
