NLP for Indexing and Retrieval of Captioned Photographs
Katerina Pastra, Horacio Saggion, Yorick Wilks
Department of Computer Science
University of Sheffield
England - UK
Tel: +44-114-222-1800
Fax: +44-114-222-1810
{katerina, saggion, yorick}@

Abstract
We present a text-based approach for the automatic indexing and retrieval of digital photographs taken at crime scenes. Our research prototype, SOCIS, goes beyond keyword-based approaches and methods that extract syntactic relations from captions; it relies on advanced Natural Language Processing techniques to extract relational facts. These relational facts consist of a "pragmatic relation" and the entities this relation connects (triples of the form ARG1-REL-ARG2). In SOCIS, the triples are used as complex image indexing terms; however, the extraction mechanism is used not only for indexing but also for image retrieval with free-text queries. The retrieval mechanism computes similarity scores between query triples and indexing triples, making use of a domain-specific ontology.

1 Indexing and Retrieval of Photographs
The normal practice in human indexing or cataloguing of photographs is to use a text-based representation of the pictorial record, having recourse either to a controlled vocabulary or to free text.
On the one hand, an index built from authoritative sources (e.g., thesauri) ensures consistency across human indexers, but it also makes the indexing task difficult because of the size of the keyword list involved, not to mention the cumbersome and unintuitive requirement it imposes on users to become familiar with specific wording for the subsequent retrieval of the images. On the other hand, free-text association, while natural, makes the index representation subjective and error-prone. Content-based image processing methods are used as an alternative that avoids the manual-annotation bottleneck (Veltkamp and Tanase, 2000). Content-based indexing and retrieval of images is based on features.
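The abstract states that SOCIS scores query triples against indexing triples with the help of a domain-specific ontology, but this excerpt does not give the scoring function. The following is a minimal sketch under stated assumptions, not the SOCIS method: the toy is-a ontology and its concept names, the Wu-Palmer-style concept similarity, the slot-by-slot averaging in `triple_similarity`, and the max-then-average aggregation in `score_image` are all hypothetical choices. It illustrates how an ontology can relax the exact-wording problem described above: a query mentioning a "weapon" still matches a caption indexed with "knife".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy ontology (child -> parent "is-a" links). The concept names
# and hierarchy are illustrative only, not the SOCIS domain ontology.
ONTOLOGY: dict[str, Optional[str]] = {
    "entity": None,
    "physical_object": "entity",
    "weapon": "physical_object",
    "knife": "weapon",
    "gun": "weapon",
    "furniture": "physical_object",
    "table": "furniture",
    "chair": "furniture",
    "relation": None,
    "location": "relation",
    "on": "location",
    "near": "location",
}


@dataclass(frozen=True)
class Triple:
    """A relational fact of the form ARG1-REL-ARG2."""
    arg1: str
    rel: str
    arg2: str


def ancestors(concept: str) -> list[str]:
    """Path from a concept up to its root, concept first (unknown terms are their own root)."""
    path: list[str] = []
    current: Optional[str] = concept
    while current is not None:
        path.append(current)
        current = ONTOLOGY.get(current)
    return path


def concept_similarity(a: str, b: str) -> float:
    """Wu-Palmer-style score: 2 * depth(LCS) / (depth(a) + depth(b)), in [0, 1]."""
    path_a, path_b = ancestors(a), ancestors(b)
    shared = [c for c in path_a if c in path_b]
    if not shared:
        return 0.0  # no common ancestor, e.g. a weapon vs. a spatial relation
    lcs = shared[0]  # first shared node walking up from a = lowest common subsumer
    return 2 * len(ancestors(lcs)) / (len(path_a) + len(path_b))


def triple_similarity(q: Triple, t: Triple) -> float:
    """Assumed scheme: average the similarities of the three aligned slots."""
    return (concept_similarity(q.arg1, t.arg1)
            + concept_similarity(q.rel, t.rel)
            + concept_similarity(q.arg2, t.arg2)) / 3


def score_image(query: list[Triple], index: list[Triple]) -> float:
    """Best-matching indexing triple for each query triple, averaged over the query."""
    if not query or not index:
        return 0.0
    return sum(max(triple_similarity(q, t) for t in index) for q in query) / len(query)


def rank(query: list[Triple], images: dict[str, list[Triple]]) -> list[tuple[str, float]]:
    """Return (image_id, score) pairs sorted best first."""
    scored = [(image_id, score_image(query, triples)) for image_id, triples in images.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical indexing triples, as if extracted from two photo captions.
    images = {
        "photo_001": [Triple("knife", "on", "table")],
        "photo_002": [Triple("gun", "near", "chair")],
    }
    # A free-text query such as "weapon on a table", assumed already parsed into a triple.
    for image_id, score in rank([Triple("weapon", "on", "table")], images):
        print(image_id, f"{score:.3f}")
    # photo_001 0.952
    # photo_002 0.758
```

Taking the maximum per query triple means an image needs only one indexed fact close to each part of the query; whether relations should weigh more or less than their arguments is left open in this sketch.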