Scientific report: "A New Approach to Improving Multilingual Summarization using a Genetic Algorithm"

Marina Litvak, Ben-Gurion University of the Negev, Beer Sheva, Israel, litvakm@
Mark Last, Ben-Gurion University of the Negev, Beer Sheva, Israel, mlast@
Menahem Friedman, Ben-Gurion University of the Negev, Beer Sheva, Israel, fmenahem@

Abstract

Automated summarization methods can be defined as “language-independent” if they are not based on any language-specific knowledge. Such methods can be used for multilingual summarization, defined by Mani (2001) as “processing several languages, with summary in the same language as input.” In this paper, we introduce MUSE, a language-independent approach for extractive summarization based on the linear optimization of several sentence ranking measures using a genetic algorithm. We tested our methodology on two languages, English and Hebrew, and evaluated its performance with ROUGE-1 Recall vs. state-of-the-art extractive summarization approaches. Our results show that MUSE performs better than the best known multilingual approach (TextRank[1]) in both languages. Moreover, our experimental results on a bilingual (English and Hebrew) document collection suggest that MUSE does not need to be retrained on each language and the same model can be used across at least two different languages.

[1] We evaluated several summarizers (SUMMA, MEAD, Microsoft Word Autosummarize, and TextRank) on the DUC 2002 corpus. Our results show that TextRank performed best. In addition, TextRank can be considered language-independent as long as it does not perform any morphological analysis.

1 Introduction

Document summaries should use a minimum number of words to express a document's main ideas. As such, high quality summaries can significantly reduce the information overload many professionals in a variety of fields must contend with on a daily basis (Filippova et al., 2009), assist in the automated classification and filtering of documents, and increase search engines' precision. Automated
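The idea stated in the abstract (a weighted linear combination of sentence ranking measures, with the weights tuned by a genetic algorithm against ROUGE-1 Recall) can be illustrated with a minimal sketch. This excerpt does not specify MUSE's actual ranking measures, genetic operators, or parameters, so everything below is an assumption made for illustration: the function names, the two made-up feature columns, the truncation selection, the uniform crossover, and the Gaussian mutation are all hypothetical and are not the authors' implementation.

```python
import random
from collections import Counter


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


def extract(sentences, features, weights, k):
    """Return the k sentences with the highest weighted feature sum, in document order."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return " ".join(sentences[i] for i in sorted(top))


def evolve_weights(fitness, n_weights, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm (assumed operators, not those of the paper)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_weights)
            child[i] += rng.gauss(0, 0.1)  # mutate a single weight
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy data: the sentences, feature values, and reference summary are invented.
sentences = ["Cats sleep a lot.", "The weather was mild.", "Cats hunt mice at night."]
features = [[1.0, 0.2], [0.1, 0.9], [0.8, 0.3]]  # one row of hypothetical measures per sentence
reference = "Cats sleep a lot. Cats hunt mice at night."


def fitness(weights):
    """ROUGE-1 Recall of the two-sentence extract produced by these weights."""
    return rouge1_recall(extract(sentences, features, weights, 2), reference)


best_weights = evolve_weights(fitness, n_weights=2)
```

In this sketch the fitness of a weight vector is the ROUGE-1 Recall of the extract it produces on a single toy document. A realistic setup would average the fitness over a training corpus of documents with gold-standard summaries and compute real ranking measures instead of hand-written numbers.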
