Universal Grammar and Lexis for Quick Ramp-Up of MT Systems

Sergei Nirenburg and Victor Raskin
Computing Research Laboratory
New Mexico State University
Las Cruces, N.M. 88003, U.S.A.
{sergei,raskin}@crl.nmsu.edu

Abstract

This paper introduces Boas, a semi-automatic knowledge elicitation system that guides a team of two people through the process of developing the static knowledge sources for a moderate-quality, broad-coverage MT system from any "low-density" language into English in about six months. The paper focuses on some issues in the elicitation of descriptive knowledge in Boas, and also on the principled reuse of pre-existing resources, such as a lexicon, an ontology, and an English generation module, among others, made possible by the fact that the client MT system is developed for a single target language.

1. Introduction: The Boas Project

This paper presents Boas, a semi-automatic knowledge elicitation system that guides a team of two people through the process of developing static knowledge sources for a moderate-quality, broad-coverage MT system from any "low-density"¹ language into English in about six months.

Boas contains knowledge about human language and the means of realization of its phenomena in a number of specific languages, and is thus a kind of "linguist in the box" that helps non-professional acquirers with a task whose complexity is legendary.² The knowledge about language elicited by Boas from the acquirers aims to support MT output quality roughly commensurate with the output of the better commercial systems, such as Systran. These relatively modest expectations are dictated by the amount of language work which can be carried out, given the resources available. The rules of the game specifically exclude linguists and MT developers from the acquisition team. Under such conditions, the only sensible course of action is to attempt to collect as much knowledge about as many languages as possible in advance and include

¹ Density refers, roughly, to the amount of effort previously expended in the field on computational descriptions of particular languages, resulting in the creation of a variety of machine-tractable resources: text corpora, grammars, lexicons, analyzers, etc. Thus, Spanish will most probably count as high-density, while, say, Tagalog will not.