Scientific report: "Unsupervised Language Model Adaptation Incorporating Named Entity Information"

Feifan Liu and Yang Liu
Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
ffliu yangl @

Abstract

Language model (LM) adaptation is important for both speech and language processing. It is often achieved by combining a generic LM with a topic-specific model that is more relevant to the target document. Unlike previous work on unsupervised LM adaptation, this paper investigates how effectively using named entity (NE) information, instead of considering all the words, helps LM adaptation. We evaluate two latent topic analysis approaches in this paper, namely, clustering and Latent Dirichlet Allocation (LDA). In addition, a new dynamically adapted weighting scheme for topic mixture models is proposed based on LDA topic analysis. Our experimental results show that the NE-driven LM adaptation framework outperforms the baseline generic LM. The best result is obtained using the LDA-based approach by expanding the named entities with syntactically filtered words, together with using a large number of topics, which yields a perplexity reduction compared to the baseline generic LM.

1 Introduction

Language model (LM) adaptation plays an important role in speech recognition and many natural language processing tasks, such as machine translation and information retrieval. Statistical N-gram LMs have been widely used; however, they capture only local contextual information. In addition, even with the increasing amount of LM training data, there is often a mismatch problem because of differences in domain, topics, or styles. Adaptation of LM, therefore, is very important in order to better deal with a variety of topics and styles.

Many studies have been conducted for LM adaptation. One method is supervised LM adaptation, where topic information is typically available and a topic-specific LM is interpolated with the generic LM (Kneser and Steinbiss, 1993; Suzuki and Gao, 2005).
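The interpolation of a generic LM with a topic-specific LM can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes unigram models with add-alpha smoothing instead of N-gram models, a fixed interpolation weight `lam` instead of the dynamically adapted weights the abstract mentions, and tiny made-up corpora. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary (illustrative only)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(generic, topic, lam):
    """Linear interpolation: P(w) = lam * P_topic(w) + (1 - lam) * P_generic(w)."""
    return {w: lam * topic[w] + (1.0 - lam) * generic[w] for w in generic}


def perplexity(lm, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy data, invented for this sketch (not from the paper).
generic_text = "the market rose today the team won the game the court ruled".split()
topic_text = "the team won the game the team scored".split()
test_text = "the team scored the game".split()

vocab = set(generic_text) | set(topic_text) | set(test_text)
generic = unigram_lm(generic_text, vocab)
topic = unigram_lm(topic_text, vocab)

# lam = 0.5 is an arbitrary assumption; in practice the weight is tuned or adapted.
adapted = interpolate(generic, topic, lam=0.5)

print("generic perplexity:", perplexity(generic, test_text))
print("adapted perplexity:", perplexity(adapted, test_text))
```

Because both component models are proper distributions over the same vocabulary, the interpolated model is one as well. On this toy test text, which resembles the topic corpus, the adapted model has lower perplexity than the generic one.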
