tailieunhanh - Báo cáo khoa học: "Iterative Viterbi A* Algorithm for K-Best Sequential Decoding"

Sequential modeling has been widely used in a variety of important applications including named entity recognition and shallow parsing. However, as more and more real time large-scale tagging applications arise, decoding speed has become a bottleneck for existing sequential tagging algorithms. In this paper we propose 1-best A*, 1-best iterative A*, k-best A* and k-best iterative Viterbi A* algorithms for sequential decoding. | Iterative Viterbi A Algorithm for K-Best Sequential Decoding Zhiheng Huang Yi Chang Bo Long Jean-Francois Crespol Anlei Dong Sathiya Keerthi and Su-Lin Wu Yahoo Labs 701 First Avenue Sunnyvale CA 94089 USA zhiheng_huang jfcrespo @ yichang bolong anlei selvarak sulin @ Abstract Sequential modeling has been widely used in a variety of important applications including named entity recognition and shallow parsing. However as more and more real time large-scale tagging applications arise decoding speed has become a bottleneck for existing sequential tagging algorithms. In this paper we propose 1-best A 1-best iterative A k-best A and k-best iterative Viterbi A algorithms for sequential decoding. We show the efficiency of these proposed algorithms for five NLP tagging tasks. In particular we show that iterative Viterbi A decoding can be several times or orders of magnitude faster than the state-of-the-art algorithm for tagging tasks with a large number of labels. This algorithm makes real-time large-scale tagging applications with thousands of labels feasible. 1 Introduction Sequence tagging algorithms including HMMs Ra-biner 1989 CRFs Lafferty et al. 2001 and Collins s perceptron Collins 2002 have been widely employed in NLP applications. Sequential decoding which finds the best tag sequences for given inputs is an important part of the sequential tagging framework. Traditionally the Viterbi algorithm Viterbi 1967 is used. This algorithm is quite efficient when the label size of problem modeled is low. Unfortunately due to its O TL2 time complexity where T is the input token size and L is the label size the Viterbi decoding can become prohibitively slow when the label size is large say larger than 200 . It is not uncommon that the problem modeled consists of more than 200 labels. The Viterbi algorithm cannot find the best sequences in tolerable 611 response time. To resolve this Esposito and Radi-cioni 2009 have proposed a Carpediem algorithm .

TỪ KHÓA LIÊN QUAN