Scientific report: "Improving Machine Translation of Null Subjects in Italian and Spanish"

Lorenza Russo, Sharid Loaiciga, Asheesh Gulati
Language Technology Laboratory (LATL), Department of Linguistics, University of Geneva
2 rue de Candolle, CH-1211 Geneva 4, Switzerland

Abstract

Null subjects are subject pronouns that are not overtly expressed, found in pro-drop languages such as Italian and Spanish. In this study we quantify and compare the occurrence of this phenomenon in these two languages. Next, we evaluate the translation of null subjects into French, a non-pro-drop language. We use the Europarl corpus to evaluate two MT systems on their performance regarding null subject translation: Its-2, a rule-based system developed at LATL, and a statistical system built using the Moses toolkit. Then we add a rule-based preprocessor and a statistical post-editor to the Its-2 translation pipeline. A second evaluation of the improved Its-2 system shows an average increase in correct pro-drop translations for Italian-French and for Spanish-French.

1 Introduction

Romance languages are characterized by some morphological and syntactic similarities. Italian and Spanish, the two languages we are interested in here, share the null subject parameter, also called the pro-drop parameter, among other characteristics. The null subject parameter refers to whether the subject of a sentence is overtly expressed or not (Haegeman, 1994).
In other words, due to their rich morphology, Italian and Spanish allow non-lexically-realized subject pronouns, also called null subjects, zero pronouns, or pro-drop (henceforth these terms will be used interchangeably). From a monolingual point of view, regarding Spanish, previous work by Ferrandez and Peral (2000) has shown that 46% of verbs in their test corpus had their subjects omitted. Continuation of this work by Rello and Ilisei (2009) has found that in a corpus of 2,606 sentences there were 1,042 sentences without overtly expressed
