Scopus Indexed Publications

Paper Details


Title
Cloud-POA: A cloud-based map only implementation of PO-MSA on Amazon multi-node EC2 Hadoop Cluster
Author
Nafis Neehal, Ashraful B. M. Alim Al Islam, Dewan Ziaul Karim
Email
nafis.cse@diu.edu.bd
Abstract
Sequence alignment in bioinformatics and computational biology has always been a challenging task. With Next Generation Sequencing (NGS) techniques in hand, researchers are now capable of studying biological systems at a level never possible before. Scientists now have billions of bytes of biological data to work with and trillions of sequences to align. But this comes at the cost of requiring computing machines with a tremendous amount of computational and analytical power. Purchasing this much hardware and setting up a standalone infrastructure would not only cost an unnecessarily massive amount of money and labor but would also be troublesome to maintain. Moreover, aligning a huge number of DNA or protein sequences requires a scalable multiple sequence alignment (MSA) algorithm with decent accuracy. In this context, this paper presents a novel implementation of the Partial Order Alignment (POA) algorithm on a multi-node Hadoop cluster running the MapReduce framework. The implementation was done on the Amazon AWS platform with multiple EC2 instances. It is a map-only implementation using Hadoop Streaming. The results of this implementation show a drastic reduction in runtime with no accuracy degradation.

Keywords
NGS, POA, PO-MSA, Hadoop, MapReduce, Hadoop Streaming
Journal or Conference Name
20th International Conference of Computer and Information Technology, ICCIT 2017
Publication Year
2018
Indexing
Scopus