Scientific report: "Independence Assumptions Considered Harmful"
Independence Assumptions Considered Harmful

Alexander Franz
Sony Computer Science Laboratory
D21 Laboratory, Sony Corporation
6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo 141, Japan
ami

Abstract

Many current approaches to statistical language modeling rely on independence assumptions between the different explanatory variables. This results in models which are computationally simple, but which only model the main effects of the explanatory variables on the response variable. This paper presents an argument in favor of a statistical approach that also models the interactions between the explanatory variables. The argument rests on empirical evidence from two series of experiments concerning automatic ambiguity resolution.

1 Introduction

In this paper, we present an empirical argument in favor of a certain approach to statistical natural language modeling: we advocate statistical natural language models that account for the interactions between the explanatory statistical variables, rather than relying on independence assumptions. Such models are able to perform prediction on the basis of estimated probability distributions that are properly conditioned on the combinations of the individual values of the explanatory variables.
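The contrast between the two kinds of model can be illustrated with a small sketch. It is not taken from the paper: the toy data set, the function names, and the use of plain relative-frequency estimates are all assumptions made for illustration. The response y is constructed to depend only on the combination of two explanatory variables, so a model that assumes independence cannot recover it, while a model conditioned on the full combination can.

```python
from collections import Counter

# Hypothetical toy data: (x1, x2, y). The response y is 1 exactly when x1 != x2,
# so y depends only on the interaction of x1 and x2, not on either one alone.
DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 25


def independence_posterior(data, x1, x2):
    """Estimate P(y | x1, x2) assuming x1 and x2 are independent given y."""
    n = len(data)
    y_counts = Counter(y for _, _, y in data)
    x1_counts = Counter((a, y) for a, _, y in data)
    x2_counts = Counter((b, y) for _, b, y in data)
    scores = {}
    for y, ny in y_counts.items():
        # P(y) * P(x1 | y) * P(x2 | y): main effects only, no interaction term.
        scores[y] = (ny / n) * (x1_counts[(x1, y)] / ny) * (x2_counts[(x2, y)] / ny)
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


def joint_posterior(data, x1, x2):
    """Estimate P(y | x1, x2) from counts of the full (x1, x2) combination.

    Assumes the combination occurs at least once in the data.
    """
    combo = Counter(y for a, b, y in data if (a, b) == (x1, x2))
    total = sum(combo.values())
    return {y: c / total for y, c in combo.items()}


if __name__ == "__main__":
    print("independence:", independence_posterior(DATA, 0, 1))
    print("joint:       ", joint_posterior(DATA, 0, 1))
```

On this data the independence-based estimate is uninformative for every input, because each variable taken alone says nothing about y, whereas the estimate conditioned on the combination identifies y exactly. Real language data is sparse, so direct counting of combinations as done here would need smoothing or a more structured model.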
After describing one type of statistical model that is particularly well suited to modeling natural language data, called a loglinear model, we present empirical evidence from a series of experiments on different ambiguity resolution tasks. This evidence shows that the loglinear models outperform other models described in the literature that assume independence between the explanatory variables.

2 Statistical Language Modeling

By "statistical language model", we refer to a mathematical object that imitates the properties of some aspects of natural language and in turn makes predictions that are useful from a scientific or engineering point of view. Much recent work in this framework has used written and spoken .