Scientific report: "Encoding a Parallel Corpus for Automatic Terminology Extraction"

We present a status report about an ongoing research project in the field of (semi-)automatic terminology acquisition at the European Academy Bolzano. The main focus will be on encoding a text corpus, which serves as a basis for applying term extraction programs. The CATEx (Computer Assisted Terminology Extraction) project emerged from the need to support and improve, both qualitatively and quantitatively, the manual acquisition of terminological data. Thus, the main objective of CATEx is the development of a computational framework for (semi-)automatic terminology acquisition, which consists of four modules: a parallel text corpus, term-extraction programs, a term bank linked.

Proceedings of EACL 99

Encoding a Parallel Corpus for Automatic Terminology Extraction

Johann Gamper
European Academy Bolzano/Bozen
Weggensteinstr. 12 A, 39100 Bolzano/Bozen, Italy
jgamper

Abstract

We present a status report about an ongoing research project in the field of (semi-)automatic terminology acquisition at the European Academy Bolzano. The main focus will be on encoding a text corpus, which serves as a basis for applying term extraction programs.

1 Introduction

Text corpora are valuable resources in all areas dealing with natural language processing in one form or another. Terminology is one of these fields, where researchers explore domain-specific language material to investigate terminological issues. The manual acquisition of terminological data from text material is a very work-intensive and error-prone task. Recent advances in automatic corpus analysis have favored a modern form of terminology acquisition: (1) a corpus is a collection of language material in machine-readable form, and (2) computer programs scan the corpus for terminologically relevant information and generate lists of term candidates, which have to be post-edited by humans. The CATEx project, described in the following, adopts this approach.
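The two-step approach described above (a machine-readable corpus, then a program that proposes term candidates for human post-editing) can be illustrated with a minimal sketch. This is not the CATEx term-extraction method, which is not specified here; it is a generic frequency-based heuristic, and the function name `term_candidates`, the stopword list, and the two-sentence sample corpus are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; a real system would use
# a fuller list or part-of-speech patterns instead.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for", "by", "on"}


def term_candidates(corpus, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first.

    A candidate is any word sequence of up to max_len words that neither
    starts nor ends with a stopword and occurs at least min_freq times.
    """
    counts = Counter()
    for text in corpus:
        # Lowercase and keep alphabetic tokens, including common German
        # and Italian accented letters.
        tokens = re.findall(r"[a-zäöüßàèéìòù]+", text.lower())
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    frequent = [(term, c) for term, c in counts.items() if c >= min_freq]
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(frequent, key=lambda tc: (-tc[1], tc[0]))


# Toy corpus (invented sentences, not taken from the CATEx corpus).
corpus = [
    "The term bank stores legal terminology.",
    "Legal terminology is extracted by the term bank editors.",
]
candidates = term_candidates(corpus)
for term, freq in candidates:
    print(freq, term)
```

The resulting list is deliberately noisy: it mixes genuine multi-word terms with their individual words, which is why the candidates still have to be post-edited by humans.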
2 The CATEx Project

Due to the equal status of the Italian and the German language in South Tyrol, legal and administrative documents have to be written in both languages. A prerequisite for high-quality translations is a consistent and comprehensive bilingual terminology, which also forms the basis for an independent German legal language that reflects the Italian legislation. The first systematic effort in this direction was initiated a few years ago at the European Academy Bolzano/Bozen with the goal of compiling an Italian/German legal and administrative terminology for South Tyrol. The CATEx (Computer Assisted Terminology Extraction) project emerged from the need to support and improve, both qualitatively and quantitatively, the manual acquisition of terminological data.
