Integrating image features with convolutional sequence to sequence network for multilingual visual question answering

Journal of Computer Science and Cybernetics, 2024, 1-. DOI no. 1813-9663 18155

Triet M. Thai, Son T. Luu

University of Information Technology, Ho Chi Minh City, Viet Nam
Vietnam National University, Ho Chi Minh City, Viet Nam

Abstract. Visual question answering (VQA) is a task that requires computers to answer input questions correctly based on images. Humans solve this task with ease, but it remains a challenge for computers. The VLSP2022-EVJVQA shared task poses visual question answering in a multilingual setting on the newly released UIT-EVJVQA dataset, in which the questions and answers are written in three languages: English, Vietnamese, and Japanese. We approached the challenge as a sequence-to-sequence learning task, integrating hints from pre-trained state-of-the-art VQA models and image features with a convolutional sequence-to-sequence network to generate the desired answers. Our results obtained up to [...] by F1 score on the public test set and [...] on the private test set.

Keywords. Visual question answering; Sequence-to-sequence learning; Multilingual; Multimodal.
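The abstract reports results by F1 score without stating how the score is computed. As a rough illustration only, the sketch below assumes the token-level F1 that is common in question answering evaluation: the harmonic mean of precision and recall over the tokens shared by a predicted answer and a reference answer. The function names token_f1 and mean_f1, the lowercasing, and the whitespace tokenization are assumptions made for this example and are not taken from the paper or from the shared task's evaluation script.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Assumption: tokens are lowercased and split on whitespace. Japanese
    text has no spaces, so it would first need a word segmenter or
    per-character tokens.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; exactly one empty is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: a shared token is counted at most as many
    # times as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, references):
    """Average token-level F1 over paired predictions and references."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, token_f1("a red car", "the red car") shares two of three tokens on each side, so precision and recall are both 2/3 and the score is 2/3.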
Abbreviations
QA: Question answering
VQA: Visual question answering
VLSP: Association for Vietnamese language and speech processing
Seq2Seq: Sequence-to-sequence
ViT: Vision transformer
SOTA: State-of-the-art
GRU: Gated recurrent unit
GLU: Gated linear unit
LSTM: Long short-term memory
RNN: Recurrent neural network
API: Application programming interface
ConvS2S: Convolutional sequence-to-sequence network
Bi-RNN: Bi-directional recurrent neural network
BERT: Bidirectional encoder representations from transformers

Corresponding author. E-mail addresses: 19522397@ sonlt@ . Luu .
© 2024 Vietnam Academy of Science & Technology
