Scientific report: "Information Extraction From Voicemail"

Jing Huang, Geoffrey Zweig, and Mukund Padmanabhan
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598, USA
jhuang, gzweig, mukund@

Abstract

In this paper we address the problem of extracting key pieces of information from voicemail messages, such as the identity and phone number of the caller. This task differs from the named entity task in that the information we are interested in is a subset of the named entities in the message, and consequently, the need to pick the correct subset makes the problem more difficult. Also, the caller's identity may include information that is not typically associated with a named entity. In this work, we present three information extraction methods: one based on hand-crafted rules, one based on maximum entropy tagging, and one based on probabilistic transducer induction. We evaluate their performance on both manually transcribed messages and on the output of a speech recognition system.

1 Introduction

In recent years, the task of automatically extracting information from data has grown in importance, as a result of an increase in the number of publicly available archives and a realization of the commercial value of the available data. One aspect of information extraction (IE) is the retrieval of documents. Another aspect is that of identifying words from a stream of text that belong in predefined categories, for instance, named entities such as proper names, organizations, or numerics.
Though most of the earlier IE work was done in the context of text sources, recently a great deal of work has also focused on extracting information from speech sources. Examples of this are the Spoken Document Retrieval (SDR) task (NIST, 1999) and named entity (NE) extraction (DARPA, 1999; Miller et al., 2000; Kim and Woodland, 2000). The SDR task focused on Broadcast News, and the NE task focused on both Broadcast News and telephone conversations. In this paper we focus on a source of conversational speech data, voicemail.
