Scientific report: "Beyond N in N-gram Tagging"

Robbert Prins
Alfa-Informatica, University of Groningen
Box 716, NL-9700 AS Groningen, The Netherlands

Abstract

The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such, it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models for n > 3 in order to incorporate global context is problematic, as the tag sequences corresponding to higher-order models become increasingly rare in training data, leading to incorrect estimates of their probabilities. The trigram HMM can be extended with global contextual information, without making the model infeasible, by incorporating the context separately from the POS tags. The new information incorporated in the model is acquired through the use of a wide-coverage parser. The model is trained and tested on Dutch text from two different sources, showing an increase in tagging accuracy compared to tagging with the standard model.

1 Introduction

The Hidden Markov Model (HMM) used for part-of-speech (POS) tagging is usually a second-order model using tag trigrams, implementing the idea that a limited number of preceding tags provides a considerable amount of information on the identity of the current tag. This approach leads to good results. For example, the TnT trigram HMM tagger achieves state-of-the-art tagging accuracies on English and German (Brants, 2000). In general, however, because the model does not consider global context, mistakes are made that concern long-distance syntactic relations.

2 A restriction of HMM tagging

HMM tagging rests on the simplifying assumption that the context of a given tag can be fully represented by just the previous two tags. This assumption leads to tagging errors whenever a syntactic feature needed to determine the tag at hand falls outside that range and is therefore ignored. One such error in tagging Dutch is related to finiteness of ...
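The sparsity argument made in the abstract can be illustrated with a minimal sketch. It estimates tag n-gram transition probabilities by relative frequency and measures how many distinct tag n-grams occur only once as n grows. The toy corpus, the tag names, and the helper names (`tag_ngrams`, `transition_probs`, `singleton_fraction`) are assumptions made for illustration. They are not taken from the paper, which uses Dutch data and a more elaborate model.

```python
from collections import Counter

def tag_ngrams(tags, n):
    """Return one n-gram per tag, padding the start with n-1 '<s>' markers."""
    padded = ["<s>"] * (n - 1) + list(tags)
    return [tuple(padded[i:i + n]) for i in range(len(tags))]

def transition_probs(tagged_sentences, n=3):
    """Relative-frequency estimate of P(t_i | t_{i-n+1} .. t_{i-1})."""
    ngram_counts = Counter()
    context_counts = Counter()
    for tags in tagged_sentences:
        for gram in tag_ngrams(tags, n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return {g: c / context_counts[g[:-1]] for g, c in ngram_counts.items()}

def singleton_fraction(tagged_sentences, n):
    """Fraction of distinct tag n-grams that were observed exactly once."""
    counts = Counter(g for tags in tagged_sentences for g in tag_ngrams(tags, n))
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Hypothetical toy corpus of tag sequences (illustrative only).
corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB", "ADV"],
    ["PRON", "VERB", "DET", "NOUN"],
]

probs = transition_probs(corpus, n=3)
print(probs[("NOUN", "VERB", "DET")])  # P(DET | NOUN, VERB)
for n in (2, 3, 4):
    print(n, round(singleton_fraction(corpus, n), 3))
```

On this toy data the share of n-grams seen only once rises with n. That is the effect the abstract describes: higher-order tag sequences are observed too rarely for their probabilities to be estimated well.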
