Scopus Indexed Paper

Paper Details


Title
Cloud-POA: A cloud-based map only implementation of PO-MSA on Amazon multi-node EC2 Hadoop Cluster
Abstract
Sequence alignment in bioinformatics and computational biology has always been a challenging task. With Next Generation Sequencing (NGS) techniques in hand, researchers are now capable of studying biological systems at a level never possible before. Scientists now have billions of bytes of biological data to work with and trillions of sequences to align. But this comes at the cost of requiring machines with a tremendous amount of computational and analytical power. Purchasing this much hardware and setting up a standalone infrastructure would not only cost an unnecessarily large amount of money and labor but would also be troublesome to maintain. Moreover, aligning a huge number of DNA or protein sequences requires a scalable multiple sequence alignment (MSA) algorithm with decent accuracy. In this context, this paper presents a novel implementation of the Partial Order Alignment (POA) algorithm on a multi-node Hadoop cluster running the MapReduce framework. The implementation was done on the Amazon AWS platform with multiple EC2 instances. It is a map-only implementation using Hadoop Streaming. The results of this implementation show a drastic reduction in runtime with no accuracy degradation.
Keywords
Cloud computing, Clustering algorithms, Runtime, Proteins, Software algorithms, Distributed databases, Bioinformatics
Authors
Nafis Neehal ; Dewan Ziaul Karim ; Ashraful Islam
Journal or Conference Name
20th International Conference of Computer and Information Technology, ICCIT 2017
Publish Year
2018
Indexing
Scopus