Scientific report: "Using Non-lexical Features to Identify Effective Indexing Terms for Biomedical Illustrations"

Matthew Simpson, Dina Demner-Fushman, Charles Sneiderman, Sameer K. Antani, George R. Thoma
Lister Hill National Center for Biomedical Communications, National Library of Medicine, NIH, Bethesda, MD, USA
simpsonmatt, ddemner, csneiderman, santani, gthoma @

Abstract

Automatic image annotation is an attractive approach for enabling convenient access to images found in a variety of documents. Since image captions and relevant discussions found in the text can be useful for summarizing the content of images, it is also possible that this text can be used to generate salient indexing terms. Unfortunately, this problem is generally domain-specific because indexing terms that are useful in one domain can be ineffective in others. Thus, we present a supervised machine learning approach to image annotation utilizing non-lexical features¹ extracted from image-related text to select useful terms. We apply this approach to several subdomains of the biomedical sciences and show that we are able to reduce the number of ineffective indexing terms.

1 Introduction

Authors of biomedical publications often utilize images and other illustrations to convey information essential to the article and to support and reinforce textual content.
These images are useful in support of clinical decisions, in rich document summaries, and for instructional purposes. The task of delivering these images, and the publications in which they are contained, to biomedical clinicians and researchers in an accessible way is an information retrieval problem. Current research in the biomedical domain (Antani et al., 2008; Florea et al., 2007) has investigated hybrid approaches to image retrieval, combining elements of content-based image retrieval (CBIR) and annotation-based image retrieval (ABIR). ABIR compared to the image-

¹ Non-lexical features describe attributes of image-related text but not the text itself, unlike a ...
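
The abstract's idea of selecting indexing terms with a supervised classifier over non-lexical features can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the four features (caption membership, frequency, relative position, token count), the function names, the toy training data, and the choice of a hand-rolled logistic regression are all hypothetical.

```python
import math

def non_lexical_features(term, caption, mention):
    """Hypothetical non-lexical features: they describe where and how often
    a candidate term occurs, never which word the term actually is."""
    t = term.lower()
    cap = caption.lower()
    text = cap + " " + mention.lower()
    in_caption = 1.0 if t in cap else 0.0   # does the term occur in the caption?
    freq = float(text.count(t))             # occurrences in caption + mention
    pos = cap.find(t)
    # Relative position of the first occurrence in the caption (1.0 if absent).
    rel_pos = pos / max(len(cap), 1) if pos >= 0 else 1.0
    n_tokens = float(len(t.split()))        # number of words in the term
    return [in_caption, freq, rel_pos, n_tokens]

def train_logistic(X, y, epochs=500, lr=0.1):
    """Plain stochastic gradient descent for logistic regression.
    The last entry of the weight vector is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of the log loss w.r.t. z
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w

def predict(w, x):
    """Return True if the term is predicted to be a useful indexing term."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5

# Toy, made-up training set: feature vectors with useful (1) / not useful (0) labels.
X_toy = [[1.0, 2.0, 0.5, 1.0], [1.0, 1.0, 0.1, 2.0],
         [0.0, 1.0, 1.0, 1.0], [0.0, 2.0, 1.0, 1.0]]
y_toy = [1, 1, 0, 0]
weights = train_logistic(X_toy, y_toy)
```

Because the features never encode the term's identity, a model of this kind could in principle be retrained per subdomain without building a domain-specific vocabulary, which is the motivation the abstract gives for avoiding lexical features.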
