Czech-English Dependency-based Machine Translation

Martin Cmejrek, Jan Curin, and Jiri Havelka
Institute of Formal and Applied Linguistics and Center for Computational Linguistics
Charles University in Prague
{cmejrek,curin,havelka}@

Abstract

We present some preliminary results of a Czech-English translation system based on dependency trees. The fully automated process includes: morphological tagging, analytical and tectogrammatical parsing of Czech, tectogrammatical transfer based on lexical substitution using word-to-word translation dictionaries enhanced by the information from the English-Czech parallel corpus of WSJ, and a simple rule-based system for generation from English tectogrammatical representation. In the evaluation part, we compare results of the fully automated and the manually annotated processes of building the tectogrammatical representation.

1 Introduction

The experiment described in this paper is an attempt to develop a full MT system based on dependency trees (DBMT). Dependency trees represent the sentence structure as concentrated around the verb and its valency. We use tectogrammatical dependency trees capturing the linguistic meaning of the sentence. In a tectogrammatical dependency tree, only autosemantic (lexical) words are represented as nodes; dependencies (edges) are labeled by tectogrammatical functors denoting the semantic roles; the information conveyed by auxiliary words is stored in attributes of the nodes. For details about the tectogrammatical representation see Hajicova et al. (2000); an example of a tectogrammatical tree can be found in Figure 3.

1 This research was supported by the following grants: MSMT CR Grant No. LN00A063 and NSF Grant No. IIS-0121285.

MAGENTA (Hajic et al., 2002) is an experimental framework for machine translation implemented during the 2002 NLP Workshop at CLSP, Johns Hopkins University in Baltimore. Modules for parsing of Czech, lexical transfer, a prototype of a statistical tree-to-tree transducer for .
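To make the tree description in the Introduction concrete, the following is a minimal sketch of a tectogrammatical node and of transfer by word-to-word lexical substitution. It is an illustration under stated assumptions, not the system's actual implementation: the class name `TNode`, the helper names `transfer` and `lemmas`, the attribute keys, and the two dictionary entries are all hypothetical, and the functor labels (`PRED`, `ACT`, `PAT`) are used only as example values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TNode:
    """One autosemantic (lexical) word in a tectogrammatical tree (hypothetical layout)."""
    lemma: str                    # lemma of the lexical word
    functor: str                  # label of the edge to the parent (semantic role)
    attrs: Dict[str, str] = field(default_factory=dict)    # information from auxiliary words
    children: List["TNode"] = field(default_factory=list)  # dependent nodes


def transfer(node: TNode, dictionary: Dict[str, str]) -> TNode:
    """Lexical substitution: translate each lemma, keep tree shape, functors, and attributes."""
    return TNode(
        lemma=dictionary.get(node.lemma, node.lemma),  # lemmas missing from the dictionary pass through
        functor=node.functor,
        attrs=dict(node.attrs),
        children=[transfer(child, dictionary) for child in node.children],
    )


def lemmas(node: TNode) -> List[str]:
    """Return the lemmas of the tree in pre-order, for inspection."""
    result = [node.lemma]
    for child in node.children:
        result.extend(lemmas(child))
    return result


# Toy Czech tree for "Petr koupil knihu" ("Petr bought a book").
czech = TNode("koupit", "PRED", {"tense": "past"}, [
    TNode("Petr", "ACT"),
    TNode("kniha", "PAT", {"number": "sg"}),
])
toy_dictionary = {"koupit": "buy", "kniha": "book"}  # hypothetical entries
english = transfer(czech, toy_dictionary)
```

In this sketch the transfer step changes only the lemmas, so any structural differences between Czech and English trees would have to be handled by a separate component.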
