Scientific report: "Dynamically Generating a Protein Entity Dictionary Using Online Resources"

Hongfang Liu
Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD 21250
hfliu@

Abstract

With the overwhelming amount of biological knowledge stored in free text, natural language processing (NLP) has received much attention recently as a way to make the task of managing information recorded in free text more feasible. One requirement for most NLP systems is the ability to accurately recognize biological entity terms in free text and to map these terms to corresponding records in databases. Such a task is called biological named entity tagging. In this paper, we present a system that automatically constructs a protein entity dictionary, which contains gene or protein names associated with UniProt identifiers, using online resources. The system can run periodically to stay up to date with these online resources. Using online resources that were available on Dec. 25, 2004, we obtained 4,046,733 terms for 1,640,082 entities. The dictionary can be accessed from the following website: http biothesauru s .
Contact: hfliu@

1 Introduction

With the use of computers in storing the explosive amount of biological information, natural language processing (NLP) approaches have been explored to make the task of managing information recorded in free text more feasible [1, 2]. One requirement for NLP is the ability to accurately recognize terms that represent biological entities in free text.
Another requirement is the ability to associate these terms with the corresponding biological entities (i.e., records in biological databases) so that they can be used by other automated systems for literature mining. Such a task is called biological entity tagging. Biological entity tagging is not a trivial task because of several characteristics associated with biological entity names, namely synonymy (i.e., different terms refer to the same entity), ambiguity (i.e., one ...
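
To make synonymy and ambiguity concrete, the following is a minimal sketch of a term-to-identifier dictionary. It is not the system described in this paper: the record layout, the `normalize`, `build_dictionary`, and `lookup` helpers, and the identifiers `ID_A` and `ID_B` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share one key."""
    return " ".join(term.lower().split())


def build_dictionary(records):
    """Map each normalized term to the set of entity identifiers it names.

    `records` is an iterable of (identifier, [names]) pairs. This layout is
    an assumption made for the sketch, not the format of any real resource.
    """
    term_to_ids = defaultdict(set)
    for identifier, names in records:
        for name in names:
            term_to_ids[normalize(name)].add(identifier)
    return dict(term_to_ids)


# Made-up identifiers and names, for illustration only.
RECORDS = [
    ("ID_A", ["protein alpha", "PA1"]),  # synonymy: two terms, one entity
    ("ID_B", ["protein beta", "PA1"]),   # ambiguity: "PA1" names two entities
]

DICTIONARY = build_dictionary(RECORDS)


def lookup(term: str):
    """Return the sorted identifiers for a term, or an empty list if unknown."""
    return sorted(DICTIONARY.get(normalize(term), set()))
```

In this sketch, `lookup("protein alpha")` and `lookup("PA1")` both reach `ID_A`, which is synonymy, while `lookup("PA1")` returns more than one identifier, which is ambiguity. A tagger built on such a dictionary would still need a separate step to choose among the candidates.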