Research Article: On the Soft Fusion of Probability Mass Functions for Multimodal Speech Processing

Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2011, Article ID 294010, 14 pages
doi 2011 294010

Research Article
On the Soft Fusion of Probability Mass Functions for Multimodal Speech Processing

D. Kumar, P. Vimal, and Rajesh M. Hegde
Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India

Correspondence should be addressed to Rajesh M. Hegde, rhegde@

Received 25 July 2010; Revised 8 February 2011; Accepted 2 March 2011

Academic Editor: Jar Ferr Yang

Copyright © 2011 D. Kumar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Multimodal speech processing has been investigated as a way to increase the robustness of unimodal speech processing systems. Hard fusion of acoustic and visual speech is generally used to improve the accuracy of such systems. In this paper, we discuss the significance of two soft belief functions developed for multimodal speech processing. These soft belief functions are formulated from a confusion matrix of probability mass functions obtained jointly from acoustic and visual speech features. The first soft belief function, BHT-SB, is formulated for binary hypothesis testing (BHT)-like problems in speech processing. This approach is then extended to multiple hypothesis testing (MHT)-like problems to formulate the second belief function, MHT-SB. BHT-SB and MHT-SB are applied to speaker diarization and audio-visual speech recognition, respectively. Speaker diarization experiments are conducted on meeting speech data collected in a lab environment and on the AMI meeting database. Audio-visual speech recognition experiments are conducted on the GRID audio-visual corpus. Experimental results are obtained for both multimodal speech processing tasks.
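The abstract does not give the BHT-SB or MHT-SB equations, so the following is only a generic sketch of how a confusion matrix can turn a classifier's output PMF into a soft belief. It is not the authors' formulation. It assumes each modality (acoustic, visual) has its own classifier with a confusion matrix `C[true, predicted]` estimated on held-out data. Each output PMF is mapped to a belief over true classes via b(i) = Σ_j P(true = i | pred = j) · pmf(j). The two beliefs are then combined with a product rule, which assumes the modalities are conditionally independent and the priors are equal. The function names `confusion_to_posterior`, `soft_belief`, and `fuse` are hypothetical.

```python
import numpy as np

def confusion_to_posterior(confusion):
    """Column-normalize a confusion matrix C[true, predicted] so that entry
    [i, j] estimates P(true = i | predicted = j). (Illustrative assumption.)"""
    confusion = np.asarray(confusion, dtype=float)
    col_sums = confusion.sum(axis=0, keepdims=True)
    # Avoid division by zero for labels the classifier never predicted.
    return confusion / np.where(col_sums == 0, 1.0, col_sums)

def soft_belief(pmf, confusion):
    """Soft mapping of an output PMF over predicted labels to a belief over
    true labels: b(i) = sum_j P(true = i | pred = j) * pmf(j)."""
    return confusion_to_posterior(confusion) @ np.asarray(pmf, dtype=float)

def fuse(pmf_audio, conf_audio, pmf_video, conf_video):
    """Hypothetical product-rule fusion of acoustic and visual beliefs,
    renormalized so the result is again a PMF."""
    b = soft_belief(pmf_audio, conf_audio) * soft_belief(pmf_video, conf_video)
    total = b.sum()
    if total == 0:
        # Fallback (assumption): return a uniform PMF if beliefs fully conflict.
        return np.full_like(b, 1.0 / b.size)
    return b / total

if __name__ == "__main__":
    # Binary (BHT-like) toy example, e.g. speaker A vs. speaker B.
    conf_audio = [[80, 20], [10, 90]]   # rows: true label, columns: predicted
    conf_video = [[70, 30], [30, 70]]
    fused = fuse([0.6, 0.4], conf_audio, [0.3, 0.7], conf_video)
    print(fused)  # approximately [0.527 0.473]
```

In this toy case, the acoustic classifier is more reliable according to its confusion matrix. Its mild preference for the first hypothesis therefore outweighs the visual classifier's opposite preference. A hard (argmax) fusion would discard this reliability information.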
