Conditions on Consistency of Probabilistic Tree Adjoining Grammars*

Anoop Sarkar
Dept. of Computer and Information Science
University of Pennsylvania
200 South 33rd Street, Philadelphia, PA 19104-6389, USA

Abstract

Much of the power of probabilistic methods in modelling language comes from their ability to compare several derivations for the same string in the language. An important starting point for the study of such cross-derivational properties is the notion of consistency. The probability model defined by a probabilistic grammar is said to be consistent if the probabilities assigned to all the strings in the language sum to one. From the literature on probabilistic context-free grammars (CFGs) we know precisely the conditions which ensure that consistency is true for a given CFG. This paper derives the conditions under which a given probabilistic Tree Adjoining Grammar (TAG) can be shown to be consistent. It gives a simple algorithm for checking consistency and gives the formal justification for its correctness. The conditions derived here can be used to ensure that probability models that use TAGs can be checked for deficiency, i.e. whether any probability mass is assigned to strings that cannot be generated.

1 Introduction

Much of the power of probabilistic methods in modelling language comes from their ability to compare several derivations for the same string in the language.
This cross-derivational power arises naturally from comparison of various derivational paths, each of which is a product of the probabilities associated with each step in each derivation. A common approach used to assign structure to language is to use a probabilistic grammar where each elementary rule or production is ...

* This research was partially supported by NSF grant SBR8920230 and ARO grant DAAH0404-94-G-0426. The author would like to thank Aravind Joshi, Jeff Reynar, Giorgio Satta, B. Srinivas, Fei Xia and the two anonymous reviewers for their valuable comments.
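To make the notion of consistency defined in the abstract concrete, the following minimal sketch uses a toy probabilistic CFG rather than a TAG. The grammar (S -> S S with probability p, S -> a with probability 1 - p) and the function name `total_string_mass` are assumptions chosen for illustration and do not come from the paper; the sketch is not the checking algorithm the paper derives. It estimates the total probability assigned to finite strings as the least fixed point of q = p*q^2 + (1 - p).

```python
def total_string_mass(p: float, iterations: int = 500) -> float:
    """Approximate the probability mass assigned to finite strings by the
    toy grammar  S -> S S  (prob. p),  S -> a  (prob. 1 - p).

    The mass q is the least solution of q = p*q**2 + (1 - p); iterating
    from q = 0 converges to that least solution from below.
    """
    q = 0.0
    for _ in range(iterations):
        q = p * q * q + (1.0 - p)
    return q


if __name__ == "__main__":
    # p = 0.4: the mass approaches 1, so the grammar is consistent.
    print(total_string_mass(0.4))
    # p = 0.6: the mass approaches (1 - p) / p = 2/3, so the grammar is
    # deficient: one third of the mass goes to derivations that never end.
    print(total_string_mass(0.6))
```

In this toy case the grammar is consistent exactly when p <= 1/2. For p > 1/2 the recursive rule is applied so often that some probability mass is lost to non-terminating derivations, which is the kind of deficiency the abstract refers to.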