Multiple sequence alignment on the grid computing using cache technique

International Journal of Computer Science and Telecommunications [Volume 3, Issue 7, July 2012], ISSN 2047-3338

Le Van Vinh¹, Tran Van Lang², Nguyen Thi Thu Du², and Vo Hong Bao Chau³

Abstract—Multiple sequence alignment is an important and popular problem in molecular biology. It is a fundamental problem whose solution can be used to confirm and discover the similarity between a new sequence and existing sequences, to determine the evolutionary history of a family of sequences, and to support protein structure prediction, among other applications. In this paper, we consider the global progressive alignment algorithm and improve it with a cache storage technique that reuses previous alignment results. The algorithm and the cache technique were implemented on a distributed system and a Grid computing environment in order to decrease the algorithm's execution time and to increase the number and size of the input sequences it can handle.

Index Terms—Biological Sequences, DNA, Protein, Grid Computing and Distributed Computing

I. INTRODUCTION

Multiple Sequence Alignment (MSA) is the alignment of three or more biological sequences, such as DNA, RNA, or protein sequences. The resulting alignment can be used to infer sequence homology and to conduct phylogenetic analyses that assess the sequences' shared evolutionary origins [2], [3]. Accuracy and execution time are the major factors that researchers must take into account.
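The text does not spell out how the cache is organized, so the following is only a minimal sketch under stated assumptions. It assumes that the first stage of a global progressive alignment computes a global (Needleman-Wunsch) score for every pair of sequences, and that these scores are kept in an in-memory dictionary keyed by the sequence pair, so a pair that has already been aligned is never recomputed. The class `PairwiseScoreCache`, the helper `score_matrix`, and the scoring parameters (match 1, mismatch -1, linear gap -2) are hypothetical; a real Grid deployment would presumably need a persistent, shared store rather than a Python dict.

```python
from typing import Dict, List, Tuple


class PairwiseScoreCache:
    """Hypothetical cache of global pairwise alignment scores.

    Scores are keyed by the unordered pair of sequences, so a pair that was
    already aligned (in either order, or in an earlier run using the same
    cache object) is looked up instead of recomputed.
    """

    def __init__(self, match: int = 1, mismatch: int = -1, gap: int = -2) -> None:
        # Assumed scoring scheme: fixed match/mismatch scores, linear gap penalty.
        self.match = match
        self.mismatch = mismatch
        self.gap = gap
        self._store: Dict[Tuple[str, str], int] = {}
        self.hits = 0
        self.misses = 0

    def _global_score(self, a: str, b: str) -> int:
        # Needleman-Wunsch score, keeping only two rows of the DP table.
        prev = [j * self.gap for j in range(len(b) + 1)]
        for i in range(1, len(a) + 1):
            cur = [i * self.gap] + [0] * len(b)
            for j in range(1, len(b) + 1):
                sub = self.match if a[i - 1] == b[j - 1] else self.mismatch
                cur[j] = max(prev[j - 1] + sub,        # align a[i-1] with b[j-1]
                             prev[j] + self.gap,       # a[i-1] against a gap
                             cur[j - 1] + self.gap)    # b[j-1] against a gap
            prev = cur
        return prev[len(b)]

    def score(self, a: str, b: str) -> int:
        # The scoring is symmetric, so order the pair to share one cache entry.
        key = (a, b) if a <= b else (b, a)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._global_score(*key)
        return self._store[key]


def score_matrix(seqs: List[str], cache: PairwiseScoreCache) -> List[List[int]]:
    """All-pairs score matrix (diagonal left as 0), served through the cache."""
    n = len(seqs)
    return [[0 if i == j else cache.score(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    cache = PairwiseScoreCache()
    print(score_matrix(["GATTACA", "GCATGCA", "GATTGCA"], cache))
    print("hits:", cache.hits, "misses:", cache.misses)
```

With three distinct sequences, each of the three pairs is aligned once and the mirrored entries come from the cache, so the script reports 3 hits and 3 misses. Reusing the same cache object for a later sequence set that overlaps this one would likewise compute only the new pairs.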
Popular approaches to MSA include exact methods, progressive methods, iterative methods, and methods based on Hidden Markov Models. Each approach has its own advantages and disadvantages, and it is up to biologists to choose the method that suits their data. Multiple sequence alignment is a problem of exponential complexity. Over the years, researchers have sought different algorithms and mathematical models that require low computational cost as well as ensure .
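The exponential complexity can be made concrete by looking at the standard exact approach, which generalizes pairwise dynamic programming to a k-dimensional table. The sketch below only counts the work involved; it assumes, for illustration, that all k sequences share the same length n, and the function name `exact_msa_cost` is hypothetical.

```python
from typing import Tuple


def exact_msa_cost(n: int, k: int) -> Tuple[int, int]:
    """Size of the exact k-dimensional DP for k sequences of length n.

    Returns (cells, transitions): the table has (n + 1) ** k cells, and each
    cell is reached from at most 2 ** k - 1 predecessors (every non-empty
    subset of the sequences advances by one residue, the rest take a gap),
    so the transition count is an upper bound.
    """
    if n < 0 or k < 1:
        raise ValueError("need n >= 0 and k >= 1")
    cells = (n + 1) ** k
    transitions = cells * (2 ** k - 1)
    return cells, transitions


if __name__ == "__main__":
    # Assumed sequence length n = 100; compare pairwise (k = 2) with larger k.
    for k in range(2, 7):
        cells, transitions = exact_msa_cost(100, k)
        print(f"k={k}: {cells:,} cells, {transitions:,} transitions")
```

Each additional sequence multiplies the table size by roughly n, which is one reason progressive methods, which only ever align two sequences or profiles at a time, are preferred for larger inputs.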