Scientific report: "Japanese Dependency Structure Analysis Based on Maximum Entropy Models"

This paper describes a dependency structure analysis of Japanese sentences based on maximum entropy models. Our model is created by learning the weights of some features from a training corpus to predict the dependency between bunsetsus, or phrasal units. The dependency accuracy of our system is using the Kyoto University corpus. We discuss the contribution of each feature set and the relationship between the amount of training data and the accuracy.

Proceedings of EACL 99

Japanese Dependency Structure Analysis Based on Maximum Entropy Models

Kiyotaka Uchimoto, Satoshi Sekine, Hitoshi Isahara

Communications Research Laboratory, Ministry of Posts and Telecommunications, 588-2 Iwaoka, Iwaoka-cho, Nishi-ku, Kobe, Hyogo 651-2401, Japan

New York University, 715 Broadway, 7th floor, New York, NY 10003, USA

1 Introduction

Dependency structure analysis is one of the basic techniques in Japanese sentence analysis. The Japanese dependency structure is usually represented by the relationships between phrasal units called bunsetsus. The analysis has two conceptual steps. In the first step, a dependency matrix is prepared, in which each element represents how likely one bunsetsu is to depend on another. In the second step, an optimal set of dependencies for the entire sentence is found. In this paper, we mainly discuss the first step: a model for estimating dependency likelihood.

So far, there have been two different approaches to estimating the dependency likelihood.
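The two conceptual steps above can be illustrated with a small sketch. It assumes a two-outcome maximum entropy model (depends / does not depend) whose feature weights are hypothetical hand-set values standing in for weights learned from a corpus; the feature names, weights, example sentence, and function names are all illustrative and not taken from the paper. For the second step it assumes, for illustration only, that a structure is scored by the product of its dependency probabilities and that every bunsetsu except the last depends on exactly one bunsetsu to its right with no crossing arcs. The exhaustive search used here is only practical for short sentences and is not necessarily the search method the authors use.

```python
import itertools
import math

# Hypothetical weights (lambda_i) for features of the outcome "depends".
# A real model would learn these from a training corpus; these are made up.
WEIGHTS = {
    "distance=1": 1.2,
    "distance>1": -0.8,
    "particle=wa&head_is_last": 2.5,
}


def features(i, j, particles):
    """Hypothetical binary features active for 'bunsetsu i depends on bunsetsu j'."""
    active = ["distance=1" if j - i == 1 else "distance>1"]
    if particles[i] == "wa" and j == len(particles) - 1:
        active.append("particle=wa&head_is_last")
    return active


def me_probability(active):
    """Two-outcome ME model: p(depends) = exp(s) / (exp(s) + exp(0)).

    Only the "depends" outcome has active features here, so its score s is the
    sum of their weights, and the "does not depend" outcome scores 0.
    """
    s = sum(WEIGHTS.get(f, 0.0) for f in active)
    return math.exp(s) / (math.exp(s) + 1.0)


def dependency_matrix(particles):
    """Step 1: matrix[i][j] = likelihood that bunsetsu i depends on j (j > i)."""
    n = len(particles)
    return [[me_probability(features(i, j, particles)) if j > i else 0.0
             for j in range(n)] for i in range(n)]


def non_crossing(heads):
    """True if no arcs cross: any bunsetsu k strictly between i and heads[i]
    must have its own head at or before heads[i]."""
    return all(heads[k] <= heads[i]
               for i in range(len(heads))
               for k in range(i + 1, heads[i]))


def best_structure(matrix):
    """Step 2: exhaustive search over head-final, non-crossing structures,
    scoring each by the sum of log dependency probabilities."""
    n = len(matrix)
    best, best_logp = None, -math.inf
    for heads in itertools.product(*(range(i + 1, n) for i in range(n - 1))):
        if not non_crossing(heads):
            continue
        logp = sum(math.log(matrix[i][h]) for i, h in enumerate(heads))
        if logp > best_logp:
            best, best_logp = heads, logp
    return best, best_logp


# "Taro-wa / Hanako-no / hon-o / yonda" ("Taro read Hanako's book").
particles = ["wa", "no", "o", ""]
heads, logp = best_structure(dependency_matrix(particles))
print(heads)  # (3, 2, 3): Taro-wa -> yonda, Hanako-no -> hon-o, hon-o -> yonda
```

Summing log probabilities instead of multiplying probabilities avoids underflow on longer sentences; a practical analyzer would replace the exhaustive loop with a more efficient search such as dynamic programming or beam search.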
One is the rule-based approach, in which rules are created by experts and likelihoods are calculated not only by semi-automatic corpus-based methods but also by manual assignment of scores to rules. However, hand-crafted rules have the following problems. First, their coverage is limited: because many features are needed to find correct dependencies, it is difficult to find them all manually. They also have a problem with consistency.
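In contrast to manually assigned rule scores, the abstract describes learning feature weights from a training corpus. The sketch below shows one way such weights could be estimated for the same kind of two-outcome model, assuming plain gradient ascent on the conditional log-likelihood with a small L2 penalty that keeps weights finite. The training pairs, feature names, learning rate, and penalty are hypothetical. Maximum entropy models are more traditionally fitted with iterative scaling (GIS or IIS) or quasi-Newton methods, and this section does not say which estimator the authors used.

```python
import math

# Hypothetical training pairs: (active features, 1 if the bunsetsu pair is a
# true dependency in the annotated corpus, else 0).
TRAINING = [
    (["distance=1"], 1),
    (["distance=1"], 1),
    (["distance=1"], 0),
    (["distance>1"], 0),
    (["distance>1"], 0),
    (["distance>1", "particle=wa&head_is_last"], 1),
]


def prob_depends(weights, active):
    """Two-outcome ME probability of 'depends' (equivalent to a logistic form)."""
    s = sum(weights.get(f, 0.0) for f in active)
    return 1.0 / (1.0 + math.exp(-s))


def train(data, epochs=2000, rate=0.5, l2=0.01):
    """Gradient ascent on average conditional log-likelihood minus an L2 penalty."""
    weights = {f: 0.0 for active, _ in data for f in active}
    for _ in range(epochs):
        grad = {f: 0.0 for f in weights}
        for active, label in data:
            # Observed minus expected count of each active feature.
            error = label - prob_depends(weights, active)
            for f in active:
                grad[f] += error
        for f in weights:
            weights[f] += rate * (grad[f] / len(data) - l2 * weights[f])
    return weights


weights = train(TRAINING)
for active in (["distance=1"], ["distance>1"],
               ["distance>1", "particle=wa&head_is_last"]):
    print(active, round(prob_depends(weights, active), 3))
```

Each update moves a weight by the gap between how often its feature fires on true dependencies and how often the current model expects it to; without the penalty, training drives these two counts to match, which is the defining constraint of a maximum entropy model. Here the penalty pulls the "distance=1" probability slightly below its empirical rate of 2/3.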
