Scientific report: "Conditional Random Fields for Word Hyphenation"


Conditional Random Fields for Word Hyphenation

Nikolaos Trogkanis
Computer Science and Engineering
University of California, San Diego
La Jolla, California 92093-0404
tronikos@gmail.com

Charles Elkan
Computer Science and Engineering
University of California, San Diego
La Jolla, California 92093-0404
elkan@cs.ucsd.edu

Abstract

Finding allowable places in words to insert hyphens is an important practical problem. The algorithm that is used most often nowadays has remained essentially unchanged for 25 years. This method is the TeX hyphenation algorithm of Knuth and Liang. We present here a hyphenation method that is clearly more accurate. The new method is an application of conditional random fields. We create new training sets for English and Dutch from the CELEX European lexical resource, and achieve error rates for English of less than 0.1% for correctly allowed hyphens, and less than 0.01% for Dutch. Experiments show that both the Knuth-Liang method and a leading current commercial alternative have error rates several times higher for both languages.

1 Introduction

The task that we investigate is learning to split words into parts that are conventionally agreed to be individual written units. In many languages, it is acceptable to separate these units with hyphens, but it is not acceptable to split words arbitrarily.
Another way of stating the task is that we want to learn to predict, for each letter in a word, whether or not it is permissible for the letter to be followed by a hyphen. This means that we tag each letter with either 1, for "hyphen allowed following this letter", or 0, for "hyphen not allowed after this letter". The hyphenation task is also called orthographic syllabification (Bartlett et al., 2008). It is an important issue in real-world text processing, as described further in Section 2 below. It is also useful as a preprocessing step to improve letter-to-phoneme conversion, and more generally for text-to-speech conversion. In the well-known NETtalk system, for example ...
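The per-letter tagging formulation described above can be illustrated with a small Python sketch. It only converts between a hyphenated spelling and the 0/1 tag sequence; it is not the authors' implementation and involves no learning. The function names and the sample word "hy-phen-ate" are assumptions chosen for illustration.

```python
from typing import List, Tuple


def hyphenation_to_tags(hyphenated: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Turn a hyphenated spelling into (plain word, per-letter tags).

    A letter is tagged 1 if a hyphen is allowed directly after it, else 0.
    (Hypothetical helper, named here for illustration only.)
    """
    letters: List[str] = []
    tags: List[int] = []
    for ch in hyphenated:
        if ch == sep:
            if tags:  # ignore a separator that precedes any letter
                tags[-1] = 1
        else:
            letters.append(ch)
            tags.append(0)
    return "".join(letters), tags


def tags_to_hyphenation(word: str, tags: List[int], sep: str = "-") -> str:
    """Inverse mapping: insert a separator after every letter tagged 1."""
    if len(word) != len(tags):
        raise ValueError("word and tags must have the same length")
    out: List[str] = []
    for i, (ch, tag) in enumerate(zip(word, tags)):
        out.append(ch)
        # A hyphen after the final letter would be meaningless, so skip it.
        if tag == 1 and i < len(word) - 1:
            out.append(sep)
    return "".join(out)


word, tags = hyphenation_to_tags("hy-phen-ate")
print(word, tags)
print(tags_to_hyphenation(word, tags))
```

In this representation a sequence model such as a conditional random field would take the plain word as input and predict the tag list; the sketch covers only the encoding and decoding of that target.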
