Semantic enrichment of journal articles using chemical named entity recognition

Colin R. Batchelor
Royal Society of Chemistry, Thomas Graham House, Milton Road, Cambridge, UK, CB4 0WF
batchelorc@

Peter T. Corbett
Unilever Centre for Molecular Science Informatics, University Chemical Laboratory, Lensfield Road, Cambridge, UK, CB2 1EW
ptc24@

Abstract

We describe the semantic enrichment of journal articles with chemical structures and biomedical ontology terms using Oscar, a program for chemical named entity recognition (NER). We describe how Oscar works and how it can be adapted for general NER. We discuss its implementation in a real publishing workflow and possible applications for enriched articles.

1 Introduction

The volume of chemical literature published has exploded over the past few years. The crossover between chemistry and molecular biology, disciplines which often study similar systems with contrasting techniques and describe their results in different languages, has also increased. Readers need to be able to navigate the literature more effectively, and also to understand unfamiliar terminology and its context. One relatively unexplored method for this is semantic enrichment. Substructure and similarity searching for chemical compounds is a particularly exciting prospect.

Enrichment of the bibliographic data in an article with hyperlinked citations is now commonplace. However, the actual scientific content has remained largely unenhanced, this falling to secondary services and experimental websites such as GoPubMed (Delfs et al., 2005) or EBIMed (Rebholz-Schuhmann et al., 2007).
There are a few examples of semantic enrichment on small journals (a few dozen articles per year), Nature Chemical Biology being one, but for a larger journal it is impractical to do this entirely by hand. This paper concentrates on implementing semantic enrichment of journal articles as part of a publishing workflow, specifically enrichment with chemical structures and biomedical terms. In