Scientific report: "Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition"
Daisuke Okanohara†, Yusuke Miyao†, Yoshimasa Tsuruoka‡, Jun'ichi Tsujii†‡

†Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan
‡School of Informatics, University of Manchester, PO Box 88, Sackville St, Manchester M60 1QD, UK
SORST, Solution Oriented Research for Science and Technology, Honcho 4-1-8, Kawaguchi-shi, Saitama, Japan
{hillbig, yusuke, tsuruoka, tsujii}@

Abstract

This paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks with a tractable computational cost. Our framework can handle an NER task that has long named entities and many labels, both of which increase the computational cost. To reduce this cost, we propose two techniques: the first is the use of feature forests, which enables us to pack feature-equivalent states, and the second is the introduction of a filtering process, which significantly reduces the number of candidate states. This framework allows us to use a rich set of features extracted from the chunk-based representation that can capture informative characteristics of entities. We also introduce a simple trick to transfer information about distant entities by embedding label information into non-entity labels. Experimental results show that our model achieves an F-score of [...] on the JNLPBA 2004 shared task without using any external resources or post-processing techniques.
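To make the cost argument concrete, the following minimal sketch counts the candidate states of a semi-CRF on one sentence and shows how a filter can shrink that set. It is an illustration under stated assumptions, not the filtering process of the paper: the function names, the example sentence, the label set, the maximum segment length, and the stop-word test used as a filter are all hypothetical, and non-entity tokens are assumed to form segments of length one.

```python
from typing import Callable, List, Tuple

# A candidate state: (start index, exclusive end index, label).
State = Tuple[int, int, str]


def enumerate_candidates(n_tokens: int, labels: List[str], max_len: int) -> List[State]:
    """List every labeled segment a semi-CRF would have to score.

    Assumption: entity labels may cover 1..max_len tokens, while the
    non-entity label "O" only covers single tokens.
    """
    states: List[State] = []
    for start in range(n_tokens):
        for length in range(1, max_len + 1):
            end = start + length
            if end > n_tokens:
                break
            for label in labels:
                states.append((start, end, label))
        states.append((start, start + 1, "O"))
    return states


def filter_candidates(
    states: List[State],
    tokens: List[str],
    may_be_entity: Callable[[str], bool],
) -> List[State]:
    """Drop entity segments that contain a token rejected by the filter."""
    kept: List[State] = []
    for start, end, label in states:
        if label == "O" or all(may_be_entity(tokens[i]) for i in range(start, end)):
            kept.append((start, end, label))
    return kept


# Hypothetical example sentence and label set.
tokens = ["Activation", "of", "the", "IL-2", "gene", "promoter",
          "requires", "NF-kappa", "B", "."]
labels = ["protein", "DNA"]

# Stand-in filter (an assumption): a tiny stop list instead of a trained model.
STOP_TOKENS = {"of", "the", "requires", "."}


def may_be_entity(token: str) -> bool:
    return token not in STOP_TOKENS


all_states = enumerate_candidates(len(tokens), labels, max_len=3)
kept_states = filter_candidates(all_states, tokens, may_be_entity)
print(len(all_states), len(kept_states))
```

The unfiltered count grows with the sentence length, the maximum segment length, and the number of labels, which is why long entities and large label sets are costly; any filter that rules out tokens in advance removes every segment containing them.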
1 Introduction

The rapid increase of information in the biomedical domain has emphasized the need for automated information extraction techniques. In this paper, we focus on the Named Entity Recognition (NER) task, which is the first step in tackling more complex tasks such as relation extraction and knowledge mining. Biomedical NER (Bio-NER) tasks are, in general, more difficult than those in the news domain. For example, the best F-score in the shared task of Bio-NER in COLING 2004 JNLPBA (Kim et al., 2004) was [...] (Zhou ...).