Scientific report: "Enhancing a large scale dictionary with a two-level system"

David Clemenceau, Emmanuel Roche
LADL (Laboratoire d'Automatique Documentaire et Linguistique), Université Paris 7
2 place Jussieu, 75251 Paris cedex 05, France
e-mail: roche@

1 Introduction

We present in this paper a morphological analyzer and generator for French that contains a dictionary of 700,000 inflected words, called DELAF, and a full two-level system aimed at the analysis of new derivatives. Hence, this tool recognizes and generates both correct inflected forms of French simple words (DELAF lookup procedure) and new derivatives and their inflected forms (two-level analysis). Moreover, a clear distinction is made between dictionary look-up processes and new-word analyses, in order to clearly identify the analyses that involve heuristic rules. We tested this tool on a French corpus of 1,300,000 words with significant results (Clemenceau D. 1992). With regard to efficiency, since this tool is compiled into a single transducer, it provides a very fast look-up procedure (1,100 words per second) at a low memory cost (around Mb in RAM).

2 Description of the analyzer

We first built the transducer representing all the entries of DELAF along with their inflectional codes.
Each entry defines a partial function, as in inculpons -> inculper V P1p, which corresponds to the first person plural in the present tense of the verb inculper (to charge someone). The union of these 700,000 partial functions leads to the transducer DELAF, stored in 1 Mb with a look-up procedure of 1,100 words per second. The 70 two-level rules that describe the way characters are changed when prefixes or suffixes are added to words are themselves transducers (Karttunen et al. 1992). The two following two-level rules generate the two surface forms coïnculper and co-inculper when adding the prefix co- to the verb inculper.

i i c o - 0 i i co - -

These 70 transducers have been merged into the transducer Rules by performing an intersection.
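The two ideas in this section, a dictionary built as a union of partial functions and a set of rules combined by intersection, can be illustrated with a minimal Python sketch. It is not the authors' implementation: a plain trie stands in for the compiled DELAF transducer, each rule is a predicate over lexical:surface character pairs instead of a finite-state transducer, and the intersection is simply the requirement that every rule accepts. The names TrieNode, add_entry, lookup, rule_i, rule_default and generate are hypothetical, and so are the two toy rules, which only mimic the co- + inculper example.

```python
from itertools import product
from typing import Dict, List, Tuple

Analysis = Tuple[str, str]  # (lemma, inflectional code)
Pair = Tuple[str, str]      # (lexical character, surface character or "")


class TrieNode:
    """One state of a toy dictionary automaton (stand-in for DELAF)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.outputs: List[Analysis] = []


def add_entry(root: TrieNode, form: str, lemma: str, code: str) -> None:
    """Add one partial function form -> (lemma, code); the trie is their union."""
    node = root
    for ch in form:
        node = node.children.setdefault(ch, TrieNode())
    node.outputs.append((lemma, code))


def lookup(root: TrieNode, form: str) -> List[Analysis]:
    """Return every analysis stored for form, or [] if it is not an entry."""
    node = root
    for ch in form:
        nxt = node.children.get(ch)
        if nxt is None:
            return []
        node = nxt
    return list(node.outputs)


def rule_i(pairs: List[Pair]) -> bool:
    """Toy rule (assumption): i surfaces as ï exactly when it follows a deleted hyphen."""
    for k, (lex, surf) in enumerate(pairs):
        if lex == "i":
            after_deleted_hyphen = k > 0 and pairs[k - 1] == ("-", "")
            if after_deleted_hyphen != (surf == "ï"):
                return False
    return True


def rule_default(pairs: List[Pair]) -> bool:
    """Toy rule (assumption): all other characters surface unchanged."""
    allowed = {("-", ""), ("-", "-"), ("i", "i"), ("i", "ï")}
    return all(lex == surf or (lex, surf) in allowed for lex, surf in pairs)


RULES = [rule_i, rule_default]
CANDIDATES = {"-": ["", "-"], "i": ["i", "ï"]}  # possible surface realizations


def generate(lexical: str) -> List[str]:
    """Keep the surface forms accepted by every rule (the intersection)."""
    options = [CANDIDATES.get(ch, [ch]) for ch in lexical]
    results = []
    for surfaces in product(*options):
        pairs = list(zip(lexical, surfaces))
        if all(rule(pairs) for rule in RULES):
            results.append("".join(surfaces))
    return results


root = TrieNode()
add_entry(root, "inculpons", "inculper", "V P1p")  # illustrative code string
print(lookup(root, "inculpons"))
print(generate("co-inculper"))
```

In this sketch, a surface form survives only if all rules accept its alignment with the lexical form, which is what intersecting the rule transducers achieves in a single machine. A real system would compile both the dictionary and the rules into finite-state transducers rather than enumerate candidates.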
