Scientific report: "Outilex, a Linguistic Platform for Text Processing"

Olivier Blanc, IGM, University of Marne-la-Vallee, 5 bd Descartes - Champs Marne, 77454 Marne-la-Vallee, France, oblanc@
Matthieu Constant, IGM, University of Marne-la-Vallee, 5 bd Descartes - Champs Marne, 77454 Marne-la-Vallee, France, mconstan@

Abstract

We present Outilex, a generalist linguistic platform for text processing. The platform includes several modules implementing the main operations for text processing and is designed to use large-coverage Language Resources. These resources (dictionaries, grammars, annotated texts) are formatted into XML, in accordance with current standards. Evaluations on efficiency are given.

1 Credits

This project has been supported by the French Ministry of Industry and the CNRS. Thanks to Sky and Francesca Sigal for their linguistic expertise.

2 Introduction

The Outilex Project (Blanc et al., 2006) aims to develop an open-linguistic platform, including tools, electronic dictionaries and grammars, dedicated to text processing. It is the result of the collaboration of ten French partners, composed of 4 universities and 6 industrial organizations. The project started in 2002 and will end in 2006. The platform, which will be made freely available to research, development and industry in April 2007, comprises software components implementing all the fundamental operations of written text processing: text segmentation, morphosyntactic tagging, parsing with grammars, and language resource management.
All Language Resources are structured in XML formats, as well as in binary formats better suited to efficient processing; the required format converters are included in the platform. The grammar formalism allows for the combination of statistical approaches with resource-based approaches. Manually constructed lexicons of substantial coverage for French and English, originating from the former LADL, will be distributed with the platform under the LGPL-LR license. The platform aims to be a generalist base .
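
To make the pipeline described above more concrete, here is a minimal sketch of text segmentation followed by dictionary-based tagging, with the result serialized to XML. It is not Outilex code and does not use the Outilex formats: the toy lexicon, the function names (`segment`, `tag`, `to_xml`) and the XML element and attribute names (`text`, `token`, `lemma`, `pos`) are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon: lowercase surface form -> (lemma, part of speech).
# These entries and tag names are illustrative, not Outilex resources.
LEXICON = {
    "the": ("the", "det"),
    "platform": ("platform", "noun"),
    "includes": ("include", "verb"),
    "modules": ("module", "noun"),
}


def segment(text):
    """Split raw text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Look each token up in the lexicon; unknown forms get pos='unknown'."""
    tagged = []
    for form in tokens:
        lemma, pos = LEXICON.get(form.lower(), (form, "unknown"))
        tagged.append({"form": form, "lemma": lemma, "pos": pos})
    return tagged


def to_xml(tagged):
    """Serialize tagged tokens into a small XML document (assumed schema)."""
    root = ET.Element("text")
    for entry in tagged:
        tok = ET.SubElement(root, "token", lemma=entry["lemma"], pos=entry["pos"])
        tok.text = entry["form"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(tag(segment("The platform includes modules."))))
```

A real platform of this kind would presumably replace the in-memory dictionary with large-coverage lexicons and a compiled binary representation; the sketch only shows how the stages chain together.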
