tailieunhanh - Scientific report: "A Multi-Neuro Tagger Using Variable Lengths of Contexts"

Qing Ma and Hitoshi Isahara
Communications Research Laboratory, Ministry of Posts and Telecommunications
588-2 Iwaoka, Nishi-ku, Kobe 651-2401, Japan
{qma, isahara}@

Abstract

This paper presents a multi-neuro tagger that uses variable lengths of contexts and weighted inputs (with information gains) for part-of-speech tagging. Computer experiments show that it has a correct rate of over 94% for tagging ambiguous words when a small Thai corpus with 22,311 ambiguous words is used for training. This result is better than any of the results obtained using the single-neuro taggers with fixed but different lengths of contexts, which indicates that the multi-neuro tagger can dynamically find a suitable length of contexts in tagging.

1 Introduction

Words are often ambiguous in terms of their part of speech (POS). POS tagging disambiguates them, i.e., it assigns to each word the correct POS in the context of the sentence. Several kinds of POS taggers using rule-based (e.g., Brill et al., 1990), statistical (e.g., Merialdo, 1994), memory-based (e.g., Daelemans, 1996), and neural network (e.g., Schmid, 1994) models have been proposed for some languages. The correct rate of tagging of these models has reached 95%, in part by using a very large amount of training data (e.g., 1,000,000 words in Schmid, 1994). For many other languages (e.g., Thai, which we deal with in this paper), however, the corpora have not been prepared and there is not a large amount of training data available. It is therefore important to construct a practical tagger using as little training data as possible.

In most of the statistical and neural network models proposed so far, the length of the contexts used for tagging is fixed and has to be selected empirically. In addition, all words in the input are regarded as having the same relevance in tagging. An ideal model would be one in which the length of the contexts can be automatically selected as needed in tagging and the words used in ...
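The abstract mentions weighting inputs by information gain. As a rough illustration of that quantity only, the sketch below computes the standard information gain of each context position with respect to the target tag, IG = H(tag) - sum over values v of P(v) * H(tag | v). It is not the authors' implementation: the function names, the tuple encoding of contexts, and the toy tag labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, position):
    """Information gain of the context feature at `position` about the tag.

    `samples` is a list of (context, tag) pairs; `context` is a tuple holding
    the POS of neighbouring words (a hypothetical encoding for this sketch).
    """
    tags = [tag for _, tag in samples]
    # Group the target tags by the value observed at this context position.
    by_value = defaultdict(list)
    for context, tag in samples:
        by_value[context[position]].append(tag)
    # Expected entropy of the tag once the feature value is known.
    remainder = sum(
        len(group) / len(samples) * entropy(group) for group in by_value.values()
    )
    return entropy(tags) - remainder


# Toy data (invented): context = (POS of left word, POS of right word).
samples = [
    (("DET", "ADJ"), "NOUN"),
    (("DET", "ADV"), "NOUN"),
    (("PRON", "ADJ"), "VERB"),
    (("PRON", "ADV"), "VERB"),
]

# One weight per context position; here the left neighbour fully determines
# the tag and the right neighbour carries no information.
weights = [information_gain(samples, p) for p in range(2)]
print(weights)  # [1.0, 0.0]
```

In this toy setting the left position would receive full weight and the right position none, which is the intuition behind not treating all input words as equally relevant.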
