Scientific report: "Lexicon and grammar in probabilistic tagging of written English"


Andrew David Beale
Unit for Computer Research on the English Language, University of Lancaster, Bailrigg, Lancaster, England LA1 4YT
enbO25@ulucJancs.vaxl

Abstract

The paper describes the development of software for automatic grammatical analysis of unrestricted, unedited English text at the Unit for Computer Research on the English Language (UCREL) at the University of Lancaster. The work is currently funded by IBM and carried out in collaboration with colleagues at IBM UK (Winchester) and IBM Yorktown Heights. The paper will focus on the lexicon component of the word tagging system, the UCREL grammar, the databanks of parsed sentences, and the tools that have been written to support development of these components. This work has applications to speech technology, spelling correction, and other areas of natural language processing. Currently, our goal is to provide a language model using transition statistics to disambiguate alternative parses for a speech recognition device.

1. Text Corpora

Historically, the use of text corpora to provide empirical data for testing grammatical theories has been regarded as important to varying degrees by philologists and linguists of differing persuasions. The use of corpus citations in grammars and dictionaries pre-dates electronic data processing (Brown, 1984: 34).

While most of the generative grammarians of the 60s and 70s ignored corpus data, the increased power of the new technology nevertheless points the way to new applications of computerized text corpora in dictionary making, style checking, and speech recognition. Computer corpora present the computational linguist with the diversity and complexity of real language, which is more challenging for testing language models than intuitively derived examples. Ultimately, grammars must be judged by their ability to contend with the real facts of language, and not just basic constructs extrapolated by grammarians.

2. Word Tagging

The ...
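The abstract's idea of using a lexicon together with transition statistics to choose among competing tags can be sketched as follows. This is only a minimal illustration under stated assumptions, not the UCREL system itself: the tag names, the toy lexicon, the transition probabilities, the `FLOOR` fallback, and the `tag` function are all hypothetical, and the search used here is a standard Viterbi pass over tag bigrams.

```python
import math

# Hypothetical toy lexicon: each word lists its candidate tags with
# made-up relative likelihoods. Tag names are placeholders, not the
# UCREL tagset.
LEXICON = {
    "the": {"DET": 1.0},
    "can": {"MODAL": 0.7, "NOUN": 0.2, "VERB": 0.1},
    "rusts": {"VERB": 0.8, "NOUN": 0.2},
}

# Hypothetical tag-to-tag transition probabilities P(next tag | previous tag).
TRANSITIONS = {
    ("START", "DET"): 0.6, ("START", "NOUN"): 0.2,
    ("START", "MODAL"): 0.1, ("START", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "MODAL"): 0.05,
    ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.2, ("NOUN", "MODAL"): 0.2,
    ("MODAL", "VERB"): 0.9, ("MODAL", "NOUN"): 0.1,
    ("VERB", "NOUN"): 0.5, ("VERB", "DET"): 0.5,
}

FLOOR = 1e-6  # assumed probability for transitions missing from the table


def tag(words):
    """Return the most probable tag sequence for a list of words."""
    # best maps a tag to (log score, tag path) of the best sequence ending there.
    best = {"START": (0.0, [])}
    for word in words:
        # Assumption: words missing from the lexicon default to NOUN.
        candidates = LEXICON.get(word.lower(), {"NOUN": 1.0})
        new_best = {}
        for t, lexical_p in candidates.items():
            # Pick the previous tag that maximises score plus transition.
            prev = max(
                best,
                key=lambda p: best[p][0] + math.log(TRANSITIONS.get((p, t), FLOOR)),
            )
            score = (
                best[prev][0]
                + math.log(TRANSITIONS.get((prev, t), FLOOR))
                + math.log(lexical_p)
            )
            new_best[t] = (score, best[prev][1] + [t])
        best = new_best
    # The highest-scoring final state carries the winning path.
    return max(best.values(), key=lambda entry: entry[0])[1]


print(tag(["the", "can", "rusts"]))
```

With these toy numbers, "can" is tagged as a modal when it stands alone but as a noun after "the", because the determiner-to-noun transition outweighs the word's lexical preference for the modal reading.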

