Kafka interfaces for composable streaming genomics pipelines

2017 ◽  
Author(s):  
Francesco Versaci ◽  
Luca Pireddu ◽  
Gianluigi Zanetti

Modern sequencing machines produce on the order of a terabyte of data per day, which must subsequently go through a complex processing pipeline. The standard workflow begins with a few independent, shared-memory tools, which communicate by means of intermediate files. Given the constant increase in the amount of data produced, this approach is proving increasingly unmanageable because of its lack of robustness and scalability. In this work we propose the adoption of stream computing to simplify the genomic pipeline, boost its performance and improve its fault tolerance. We decompose the first steps of the genomic processing into two distinct and specialized modules (preprocessing and alignment) and loosely compose them via communication through Kafka streams, to allow for easy composability and integration into existing Hadoop-based pipelines. The proposed solution is then experimentally validated on real data and shown to scale almost linearly.

Author(s):  
Rizwan Patan ◽  
Rajasekhara Babu M ◽  
Suresh Kallam

A Big Data Stream Computing (BDSC) platform handles real-time data from applications such as risk management, marketing management and business intelligence. Internet of Things (IoT) deployment is now increasing massively in all areas, and IoT devices generate real-time data for analysis. Existing BDSC platforms handle real-time data streams from IoT devices inefficiently because these streams are unstructured and have inconstant velocity, which makes them challenging to process. This work proposes a framework that handles real-time data streams through device control techniques to improve performance. The framework includes three layers. The first layer deals with Big Data platforms that handle real-time data streams based on area of importance. The second layer is the performance layer, which deals with performance issues such as low response time and energy efficiency. The third layer applies the developed method to an existing BDSC platform. The experimental results show a performance improvement of 20%-30% for real-time data streams from an IoT application.


2014 ◽  
Vol 2014 ◽  
pp. 1-9
Author(s):  
Minakshi Tripathy ◽  
C. R. Tripathy

In the recent past, many types of shared memory cluster computing interconnection systems have been proposed. Each of these systems has its own advantages and limitations. As the system size of cluster interconnection systems increases, a comparative analysis of their various performance measures becomes necessary. The cluster architecture, load balancing, and fault tolerance are some of the important aspects that need to be addressed. The comparison is needed in order to choose the best system for a particular application. In this paper, a detailed comparative study of four important and different classes of shared memory cluster architectures is presented. The systems studied are shared memory clusters, hierarchical shared memory clusters, distributed shared memory clusters, and virtual distributed shared memory clusters. These clusters are analyzed and compared on the basis of their architecture, load balancing, and fault tolerance. The results of the comparison are reported.


Sensors ◽  
2018 ◽  
Vol 18 (3) ◽  
pp. 907 ◽  
Author(s):  
Gustavo Furquim ◽  
Geraldo Filho ◽  
Roozbeh Jalali ◽  
Gustavo Pessin ◽  
Richard Pazzi ◽  
...  

Author(s):  
M. Dal Cin ◽  
A. Grygier ◽  
H. Hessenauer ◽  
U. Hildebrand ◽  
J. Hönig ◽  
...  
