An Integrated Term-Based Corpus Query System

Irena Spasic, Computer Science, University of Salford, I.Spasic@salford.ac.uk
Goran Nenadic, Dept of Computation, UMIST, G.Nenadic@umist.ac.uk
Kostas Manios, Computer Science, University of Salford, K.Manios@salford.ac.uk
Sophia Ananiadou, Computer Science, University of Salford, S.Ananiadou@salford.ac.uk

Abstract

In this paper we describe the X-TRACT workbench, which enables efficient term-based querying against a domain-specific literature corpus. Its main aim is to help domain specialists locate and extract new knowledge from scientific literature corpora. Before querying, the corpus is automatically analysed for terminology by the ATRACT system, which performs term recognition based on the C/NC-value method, enhanced with term-variation handling. The results of terminology processing are annotated in XML, and the resulting XML documents are stored in an XML-native database. All corpus retrieval operations are performed against this database using an XML query language. We illustrate how the X-TRACT workbench can be used for knowledge discovery, literature mining and conceptual information extraction.
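The abstract names the C/NC-value method as the basis of ATRACT's term recognition. As a rough illustration only, the sketch below computes the C-value part, following the commonly published formula: log2 of the candidate's length in words times its frequency, minus the average frequency of the longer candidates in which it is nested. This is not ATRACT's code. Candidate extraction, term-variation handling and the NC-value context weighting are all omitted, and the helper names (`is_nested`, `c_value`) and the toy frequencies are hypothetical.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate term as a tuple of (normalised) words


def is_nested(inner: Term, outer: Term) -> bool:
    """True if `inner` occurs as a contiguous, strictly shorter word sequence inside `outer`."""
    n, m = len(inner), len(outer)
    if n >= m:
        return False
    return any(outer[i:i + n] == inner for i in range(m - n + 1))


def c_value(freqs: Dict[Term, int]) -> Dict[Term, float]:
    """Compute the C-value of every candidate in `freqs` (candidate -> corpus frequency)."""
    scores: Dict[Term, float] = {}
    for a, f_a in freqs.items():
        # T_a: the longer candidates that contain `a` as a nested sub-term
        containers = [b for b in freqs if is_nested(a, b)]
        length_weight = math.log2(len(a))
        if not containers:
            # Candidate never appears nested: score is length weight times frequency
            scores[a] = length_weight * f_a
        else:
            # Discount occurrences explained by longer candidates (average over T_a)
            nested_freq = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = length_weight * (f_a - nested_freq)
    return scores


# Hypothetical toy frequencies, for illustration only
example: Dict[Term, int] = {
    ("nuclear", "receptor"): 10,
    ("orphan", "nuclear", "receptor"): 4,
    ("nuclear", "receptor", "family"): 2,
}
scores = c_value(example)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

Note that a single-word candidate always scores 0 here because log2(1) = 0, which reflects the method's focus on multi-word terms. Quadratic nesting checks are fine for a toy example, but a real corpus would need an index over candidates.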
1 Introduction

New scientific discoveries usually result in an abundance of publications verbalising these findings in an attempt to share new knowledge with other scientists. Electronically available texts are continually being created and updated, and thus the knowledge represented in such texts is more up-to-date than in any other medium. The sheer number of published papers¹ makes it difficult for a human to efficiently locate the information of interest, not only in a collection of documents but also within a single document. The growing number of electronically available …

¹ For example, the Medline database (www.ncbi.nlm.nih.gov/PubMed) currently contains over 12 million abstracts in the domains of molecular biology, biomedicine and medicine, growing by more than 40,000 abstracts each month.