Scientific report: "SPEECH OGLE: Indexing Uncertainty for Spoken Document Search"
Ciprian Chelba and Alex Acero
Microsoft Research, Microsoft Corporation
One Microsoft Way, Redmond, WA 98052
chelba, alexac @

Abstract

The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. In experiments performed on a collection of lecture recordings (MIT iCampus data), the spoken document ranking accuracy was improved by 20% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer. The inverted index built from PSPL lattices is compact (about 20% of the size of 3-gram ASR lattices and 3% of the size of the uncompressed speech), and it allows for extremely fast retrieval. Furthermore, little degradation in performance is observed when pruning PSPL lattices, resulting in even smaller indexes (5% of the size of 3-gram ASR lattices).

1 Introduction

Ever increasing computing power and connectivity bandwidth, together with falling storage costs, result in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, search has emerged as a key application as more and more data is being saved (Church, 2003).
Text search in particular is the most active area, with applications that range from web and private network search to searching for private information residing on one's hard drive. Speech search has not received much attention because large collections of untranscribed spoken material have not been available, mostly due to storage constraints. As storage becomes cheaper, the availability and usefulness of large collections of spoken documents is limited strictly by the lack of adequate technology to exploit them. Manually transcribing speech is expensive and sometimes outright impossible due to privacy concerns. This leads us to .