Approximation Lasso Methods for Language Modeling

Jianfeng Gao
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
jfgao@

Hisami Suzuki
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
hisamis@

Abstract

Lasso is a regularization method for parameter estimation in linear models. It optimizes the model parameters with respect to a loss function subject to a penalty on model complexity. This paper explores the use of lasso for statistical language modeling for text input. Owing to the very large number of parameters, directly optimizing the penalized lasso loss function is impossible. Therefore, we investigate two approximation methods: the boosted lasso (BLasso) and forward stagewise linear regression (FSLR). Both methods, when used with the exponential loss function, bear a strong resemblance to the boosting algorithm, which has been used as a discriminative training method for language modeling. Evaluations on the task of Japanese text input show that BLasso produces the best approximation to the lasso solution and leads to a significant improvement in character error rate over both boosting and traditional maximum likelihood estimation.

1 Introduction

Language modeling (LM) is fundamental to a wide range of applications. Recently, it has been shown that a linear model estimated using discriminative training methods, such as the boosting and perceptron algorithms, significantly outperforms a traditional word trigram model trained using maximum likelihood estimation (MLE) on several tasks, such as speech recognition and Asian language text input (Bacchiani et al., 2004; Roark et al., 2004; Gao et al., 2005; Suzuki and Gao, 2005). The success of discriminative training methods is largely due to the fact that, unlike the traditional approach (e.g., MLE), which maximizes a function (e.g., the likelihood of training data) that is only loosely associated with error rate, discriminative training methods aim to directly minimize the error rate on training data even if