tailieunhanh - Scientific report: "COMPUTATIONAL LINGUISTICS IN INDIA"

... with some special notation). For example, adjectival participial phrases in the south Indian languages are mapped to relative clauses in Hindi with the ’*’ notation (Bharati, 2000). Similarly, existing words in the target language may be given wider or narrower meaning (Narayana, 1994). Anusaarakas are available for use as email servers (anusaaraka, URL).

COMPUTATIONAL LINGUISTICS IN INDIA: AN OVERVIEW

Akshar Bharati, Vineet Chaitanya, Rajeev Sangal
Language Technologies Research Centre
Indian Institute of Information Technology, Hyderabad
sangal vc @

1. Introduction

Computational linguistics activities in India are being carried out at many institutions. The activities are centred around the development of machine translation systems and lexical resources.

2. Machine Translation

Four major efforts on machine translation in India are presented below. The first one is from one Indian language to another; the next three are from English to Hindi.

2.1 Anusaaraka Systems among Indian languages

In the anusaaraka systems, the load between the human reader and the machine is divided as follows: language-based analysis of the text is carried out by the machine, and knowledge-based analysis or interpretation is left to the reader. The machine uses a dictionary and grammar rules to produce the output. Most importantly, it does not use world knowledge to interpret or disambiguate, as that is an error-prone task and involves guessing or inferring based on knowledge other than the text. Anusaaraka aims for perfect information preservation. We relax the requirement that the output be grammatical. In fact, anusaaraka output follows the grammar of the source language where the grammar rules differ and cannot be applied with 100 percent confidence.
This requires that the reader undergo a short training to read and understand the output. Among Indian languages, which share vocabulary, grammar, pragmatics, etc., the task and the training are easier. For example, words in a language are ambiguous, but if the two languages are close, one is likely to find a one-to-one correspondence between words such that the meaning is carried across from the source language to the target language. For example, for 80 percent of the Kannada words in the anusaaraka dictionary of 30,000 root words, there is a single equivalent Hindi word which covers the senses of the ...
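
The division of labour described above can be illustrated with a small Python sketch. It is not the anusaaraka implementation: the dictionary entries are invented placeholders (not real Kannada or Hindi words), the names `TOY_DICTIONARY`, `gloss` and `one_to_one_share` are hypothetical, and showing every candidate separated by "/" is an assumption made here to represent "no guessing by the machine". The last lines work out the dictionary figure quoted in the text.

```python
from typing import Dict, List

# Hypothetical toy dictionary: source root word -> candidate target words.
# The entries are placeholders, not real Kannada or Hindi vocabulary.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "src_house": ["tgt_house"],
    "src_go": ["tgt_go"],
    "src_bank": ["tgt_riverbank", "tgt_moneybank"],  # ambiguous entry
}


def gloss(tokens: List[str], dictionary: Dict[str, List[str]]) -> List[str]:
    """Substitute word by word, keeping the source word order."""
    output = []
    for token in tokens:
        candidates = dictionary.get(token)
        if candidates is None:
            # Unknown word: pass it through, marked, rather than guess.
            output.append(f"<{token}>")
        else:
            # Assumption: show every candidate and leave the choice
            # (the knowledge-based step) to the human reader.
            output.append("/".join(candidates))
    return output


def one_to_one_share(dictionary: Dict[str, List[str]]) -> float:
    """Fraction of entries that have exactly one target equivalent."""
    single = sum(1 for cands in dictionary.values() if len(cands) == 1)
    return single / len(dictionary)


# Figure quoted in the text: 80 percent of 30,000 root words.
ROOT_WORDS = 30_000
SINGLE_EQUIVALENT_PERCENT = 80
single_equivalent_words = ROOT_WORDS * SINGLE_EQUIVALENT_PERCENT // 100

print(gloss(["src_go", "src_bank", "src_unknown"], TOY_DICTIONARY))
print(single_equivalent_words)
```

Under these assumptions, the reported 80 percent corresponds to about 24,000 of the 30,000 root words having a single equivalent, which is why a word-for-word gloss between closely related languages can carry most of the meaning across without disambiguation.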