Midge: Generating Image Descriptions From Computer Vision Detections

Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Xufeng Han, Alyssa Mensch, Alex Berg, Tamara Berg, Hal Daumé III

University of Aberdeen and Oregon Health and Science University; Stony Brook University; University of Maryland; Columbia University; University of Washington; MIT

Abstract

This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.

1 Introduction

It is becoming a real possibility for intelligent systems to talk about the visual world. New ways of mapping computer vision to generated language have emerged in the past few years, with a focus on pairing detections in an image with words (Farhadi et al., 2010; Li et al., 2011; Kulkarni et al., 2011; Yang et al., 2011). The goal in connecting vision to language has varied: systems have started producing descriptive and poetic language (Li et al., 2011), summaries that add content where the computer vision system does not (Yang et al., 2011), and captions copied directly from other images that are globally (Farhadi et al., 2010) and locally similar (Ordonez et al., 2011). A commonality among all of these approaches is that they aim to produce natural-sounding descriptions from computer vision detections. This commonality is our starting point: we aim to design a system capable of producing natural-sounding descriptions from computer vision detections that are ...
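The abstract says the generator uses syntactically informed word co-occurrence statistics to filter and constrain noisy vision detections. This excerpt does not give the actual procedure. The following is therefore a minimal, hypothetical Python sketch of the general filtering idea, not Midge's algorithm. It rests on several assumptions: the adjective-noun counts in `PAIR_COUNTS` are invented, `min_score` is an assumed detector-confidence cutoff, and `min_pair_count` is an assumed co-occurrence cutoff. It also skips syntactic tree construction entirely and renders flat noun phrases instead.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object detection: a noun label, a detector confidence, candidate adjectives."""
    noun: str
    score: float
    attributes: tuple  # adjectives proposed by (hypothetical) attribute classifiers


# Hypothetical counts of (adjective, noun) pairs. Here we assume they would come from
# adjectives modifying a head noun in parsed caption text; these numbers are invented.
PAIR_COUNTS = Counter({
    ("black", "sheep"): 40,
    ("furry", "dog"): 25,
    ("wooden", "chair"): 30,
})


def filter_detections(detections, min_score=0.5, min_pair_count=5):
    """Drop low-confidence detections, then keep only the attributes that co-occur
    with the detected noun at least `min_pair_count` times (unseen pairs count as 0)."""
    kept = []
    for det in detections:
        if det.score < min_score:
            continue  # treat as a likely false positive from the detector
        plausible = tuple(
            adj for adj in det.attributes
            if PAIR_COUNTS[(adj, det.noun)] >= min_pair_count
        )
        kept.append(Detection(det.noun, det.score, plausible))
    return kept


def describe(detections):
    """Render surviving detections as flat noun phrases using at most one adjective each.
    (Simplified: always uses the article "a" and builds no syntactic tree.)"""
    phrases = []
    for det in detections:
        words = list(det.attributes[:1]) + [det.noun]
        phrases.append("a " + " ".join(words))
    return " and ".join(phrases)


if __name__ == "__main__":
    dets = [
        Detection("dog", 0.9, ("furry", "wooden")),
        Detection("chair", 0.7, ("wooden", "furry")),
        Detection("sheep", 0.2, ("black",)),
    ]
    print(describe(filter_detections(dets)))  # a furry dog and a wooden chair
```

In this toy run, the low-confidence "sheep" detection is discarded. Implausible pairings such as "wooden dog" and "furry chair" are pruned because they have no corpus support. The point is only that language statistics can act as a constraint on what the vision output is allowed to say.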
