Big Data in Bioinformatics

Author(s):  
N.N. Nazipova

Sequencing of the human genome began in 1994. It took 10 years of collaborative work by many research groups from different countries to produce a draft of the human genome. Modern technologies allow a whole genome to be sequenced in a few days. We discuss here the advances in modern bioinformatics related to the emergence of high-performance sequencing platforms, which not only expanded the capabilities of biology and related sciences, but also gave rise to the phenomenon of Big Data in biology. The need to develop new technologies and methods for organizing the storage, management, analysis and visualization of big data is substantiated. Modern bioinformatics faces not only the problem of processing enormous volumes of heterogeneous data, but also a diversity of methods for interpreting and presenting results, and the simultaneous existence of various software tools and data formats. Ways of meeting these challenges are discussed, in particular by drawing on experience from other areas of modern life, such as web intelligence and business intelligence. The former is the area of scientific research and development that explores the role of, and makes use of, artificial intelligence and information technology (IT) for new products, services and frameworks empowered by the World Wide Web; the latter is the domain of IT that addresses decision-making. New database management systems, other than relational ones, will help solve the problem of storing huge volumes of data while keeping search queries within an acceptable timescale. New programming technologies, such as generic programming and visual programming, are designed to address the diversity of genomic data formats and to make it possible to quickly create one's own scripts for data processing.
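As an illustration of the kind of quick, self-written data-processing script the abstract refers to, here is a minimal Python sketch that streams records from a FASTA file and reports the length and GC content of each sequence; the file name `genome.fa` and the chosen statistics are assumptions made purely for the example.

```python
from typing import Iterator, Tuple

def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line.upper())
        if header is not None:
            yield header, "".join(chunks)

if __name__ == "__main__":
    # "genome.fa" is a placeholder input file for this sketch.
    for name, seq in read_fasta("genome.fa"):
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        print(f"{name}\tlength={len(seq)}\tGC={gc:.3f}")
```

Because the file is read record by record, the same pattern scales from small test files to whole-genome inputs without holding everything in memory.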

2019 ◽  
Vol 16 (8) ◽  
pp. 3419-3427
Author(s):  
Shishir K. Shandilya ◽  
S. Sountharrajan ◽  
Smita Shandilya ◽  
E. Suganya

Big data technologies have become well accepted in recent years in biomedical and genome informatics. They are capable of processing gigantic, heterogeneous genome information with good precision and recall. With rapid advances in computation and storage technologies, the cost of acquiring and processing genomic data has decreased significantly. Upcoming sequencing platforms will produce vast amounts of data, which will require high-performance systems for on-demand analysis with time-bound efficiency. Recent bioinformatics tools are able to exploit the novel features of Hadoop in a flexible way. In particular, big data technologies such as MapReduce and Hive provide a high-speed computational environment for the analysis of petabyte-scale datasets. This has attracted bio-scientists to use big data applications to automate entire genome analyses. The proposed framework is designed over MapReduce and Java on an extended Hadoop platform to achieve parallelism in big data analysis. It will assist the bioinformatics community by providing a comprehensive solution for descriptive, comparative, exploratory, inferential, predictive and causal analysis of genome data. The proposed framework is user-friendly, fully customizable, scalable and fit for comprehensive real-time genome analysis from data acquisition to predictive sequence analysis.
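The abstract does not give implementation details of the framework, but the MapReduce pattern it builds on can be sketched with Hadoop Streaming, which lets the mapper and reducer be ordinary scripts. The sketch below is a hypothetical example only: the input layout (one read per line), the k-mer counting task and the file names are assumptions, not the authors' actual pipeline.

```python
#!/usr/bin/env python3
"""Hadoop Streaming sketch: count k-mers in sequencing reads in parallel.

Example invocation (paths and jar location are placeholders):
  hadoop jar hadoop-streaming.jar \
      -input reads.txt -output kmer_counts \
      -mapper "python3 kmer_count.py map" \
      -reducer "python3 kmer_count.py reduce" \
      -file kmer_count.py
"""
import sys

K = 8  # k-mer length, chosen arbitrarily for the example

def mapper():
    # Each input line is assumed to hold one read (plain text, one sequence per line).
    for line in sys.stdin:
        seq = line.strip().upper()
        for i in range(len(seq) - K + 1):
            print(f"{seq[i:i + K]}\t1")

def reducer():
    # Hadoop delivers mapper output sorted by key, so counts can be summed per k-mer.
    current, total = None, 0
    for line in sys.stdin:
        kmer, count = line.rstrip("\n").split("\t")
        if kmer != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = kmer, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

The parallelism comes entirely from Hadoop splitting the input across mapper tasks and partitioning k-mers across reducer tasks; the script itself stays sequential and simple.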


2017 ◽  
Vol 10 (3) ◽  
pp. 597-602
Author(s):  
Jyotindra Tiwari ◽  
Dr. Mahesh Pawar ◽  
Dr. Anjajana Pandey

Big data is defined by the 3Vs: volume, variety and velocity. The volume of data is huge, the data come in a variety of file types, and they grow very rapidly. Storing and processing big data has always been a major issue, and it has become even more challenging in recent years. To handle big data, high-performance techniques have been introduced, and several frameworks such as Apache Hadoop have been proposed to process it. Apache Hadoop provides MapReduce to process big data, but MapReduce can be accelerated further. In this paper, a survey of techniques for MapReduce acceleration and time- and energy-efficient computation is presented.
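One widely known way to accelerate a MapReduce job is to add a combiner that pre-aggregates mapper output locally and so shrinks the data shuffled across the network. The minimal sketch below uses the mrjob Python library for a word-count style job; the library choice and the task are assumptions for illustration, not something prescribed by the surveyed work.

```python
from mrjob.job import MRJob

class WordCountWithCombiner(MRJob):
    """Word count where a combiner pre-sums counts on each mapper node,
    reducing the volume of intermediate data sent over the network."""

    def mapper(self, _, line):
        for word in line.split():
            yield word.lower(), 1

    def combiner(self, word, counts):
        # Local pre-aggregation: runs on the mapper side before the shuffle.
        yield word, sum(counts)

    def reducer(self, word, counts):
        yield word, sum(counts)

if __name__ == "__main__":
    WordCountWithCombiner.run()
```

Run locally with `python wordcount.py input.txt`, or on a Hadoop cluster with mrjob's `-r hadoop` runner; the combiner typically cuts shuffle traffic substantially for skewed keys.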


2018 ◽  
Vol 60 (5-6) ◽  
pp. 327-333 ◽  
Author(s):  
René Jäkel ◽  
Eric Peukert ◽  
Wolfgang E. Nagel ◽  
Erhard Rahm

Abstract. The efficient and intelligent handling of large, often distributed and heterogeneous data sets increasingly determines scientific and economic competitiveness in most application areas. Mobile applications, social networks, multimedia collections, sensor networks, data-intensive scientific experiments and complex simulations nowadays generate a huge data deluge. Processing and analyzing these data sets with innovative methods opens up new opportunities for their exploitation and for new insights. Nevertheless, the resulting resource requirements usually exceed the possibilities of state-of-the-art methods for the acquisition, integration, analysis and visualization of data; these challenges are summarized under the term big data. ScaDS Dresden/Leipzig, a Germany-wide competence center for collaborative big data research, bundles efforts to realize data-intensive applications for a wide range of problems in science and industry. In this article, we present the basic concept of the competence center and give insights into some of its research topics.


2020 ◽  
Vol 3 (2) ◽  
pp. 134-164
Author(s):  
Erick Giovani Sperandio Nascimento ◽  
Adhvan Novais Furtado ◽  
Roberto Badaró ◽  
Luciana Knop

The pandemic of the new coronavirus affected people's lives on an unprecedented scale. Because of the need for isolation and for treatments, drugs and vaccines, the pandemic pushed digital health technologies, such as Artificial Intelligence (AI), Big Data Analytics (BDA), Blockchain, Telecommunication Technology (TT), High-Performance Computing (HPC) and others, to historic levels. These technologies are being used to mitigate the pandemic, facilitate response strategies, and find treatments and vaccines. This paper aims to survey articles about new technologies applied to COVID-19 published in the main databases (PubMed/Medline, Elsevier Science Direct, Scopus, ISI Web of Science, Embase, Excerpta Medica, UpToDate, Lilacs, Novel Coronavirus Resource Directory from Elsevier), in high-impact international scientific journals (Scimago Journal and Country Rank - SJR - and Journal Citation Reports - JCR), such as The Lancet, Science, Nature, The New England Journal of Medicine, Physiological Reviews, Journal of the American Medical Association, Plos One, Journal of Clinical Investigation, and in data from the Centers for Disease Control and Prevention (CDC), National Institutes of Health (NIH), National Institute of Allergy and Infectious Diseases (NIAID) and World Health Organization (WHO). We gave priority to meta-analyses, systematic reviews, review articles and original articles, in that order. We reviewed 252 articles and used 140, covering March to June 2020, using the terms coronavirus, SARS-CoV-2, novel coronavirus, Wuhan coronavirus, severe acute respiratory syndrome, 2019-nCoV, 2019 novel coronavirus, n-CoV-2, covid, n-SARS-2, COVID-19, corona virus, coronaviruses, New Technologies, Artificial Intelligence, Telemedicine, Telecommunication Technologies, AI, Big Data, BDA, TT, High-Performance Computing, Deep Learning, Neural Network, Blockchain, with MeSH (Medical Subject Headings) terms, the Boolean operators AND and OR, and the characters [,",; /., to ensure the best review topics. We conclude that this pandemic finally consolidates the era of new technologies and will change the whole social life of human beings. A big jump will also happen in medicine, in procedures, protocols, drug design and patient care, encompassing all health areas, as well as in social and business behavior.
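As an illustration of how such a keyword search can be automated, the following sketch queries PubMed through NCBI's E-utilities using Biopython. The query string is a heavily simplified stand-in for the review's full search strategy, and the e-mail address is a placeholder; this is not the authors' actual protocol.

```python
from Bio import Entrez

# NCBI requires a contact e-mail; this one is a placeholder.
Entrez.email = "researcher@example.org"

# Simplified Boolean query combining a few of the terms listed in the review.
query = (
    '("COVID-19" OR "SARS-CoV-2" OR "2019-nCoV" OR "novel coronavirus") '
    'AND ("artificial intelligence" OR "big data" OR telemedicine OR '
    '"high-performance computing" OR blockchain)'
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=20,
                        mindate="2020/03/01", maxdate="2020/06/30",
                        datetype="pdat")
record = Entrez.read(handle)
handle.close()

print(f"Matching articles: {record['Count']}")
print("First PubMed IDs:", record["IdList"])
```

The returned PubMed IDs can then be passed to `Entrez.efetch` to retrieve abstracts for screening, mirroring the manual selection step described in the abstract.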


Author(s):  
A. K. Tripathi ◽  
S. Agrawal ◽  
R. D. Gupta

Abstract. The emergence of new tools and technologies to gather information has created the problem of processing spatial big data. Solving this problem requires new research, techniques, innovation and development. Spatial big data is characterized by the five V's: volume, velocity, veracity, variety and value. Hadoop is the most widely used framework addressing these problems, but it requires high-performance computing resources to store and process such huge data. The emergence of cloud computing has provided on-demand, elastic, scalable and pay-per-use computing resources that let users build their own computing environments. The main objective of this paper is to develop a cloud-enabled Hadoop framework that combines cloud technology and high-performance computing resources with the conventional Hadoop framework to support spatial big data solutions. The paper also compares the conventional Hadoop framework with the proposed cloud-enabled Hadoop framework. It is observed that the proposed cloud-enabled Hadoop framework is much more efficient for spatial big data processing than the currently available solutions.
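The paper's own code is not reproduced here, but the core idea of parallel spatial processing on Hadoop can be sketched with a Streaming mapper that bins point records into grid cells so that a reducer can aggregate each cell independently. The cell size, CSV field order and input format below are assumptions made for this sketch.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: assign point records to grid cells.

Input lines are assumed to be CSV: id,longitude,latitude
The emitted key is the grid cell, so all points of a cell meet at one reducer.
"""
import sys

CELL_SIZE = 0.5  # grid resolution in degrees, arbitrary for the example

for line in sys.stdin:
    parts = line.strip().split(",")
    if len(parts) < 3:
        continue  # skip malformed records
    point_id, lon, lat = parts[0], float(parts[1]), float(parts[2])
    cell_x = int(lon // CELL_SIZE)
    cell_y = int(lat // CELL_SIZE)
    # Key = grid cell, value = point id; a reducer can then count or
    # otherwise aggregate the points that fall into each cell.
    print(f"{cell_x}:{cell_y}\t{point_id}")
```

Running such a job on cloud-provisioned Hadoop nodes rather than a fixed cluster is what the proposed framework's elasticity argument rests on: the same mapper scales out simply by adding worker instances.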


2020 ◽  
Vol 3 (2) ◽  
pp. 245-254
Author(s):  
Firuza Tahmazli-Khaligova

In a traditional High Performance Computing system, it is possible to process a huge data volume, and the nature of events in classic High Performance Computing is static. A distributed exascale system has a different nature: processing big data in such a system poses a new challenge, because its dynamic and interactive character changes the status of processes and system elements. This paper discusses how the big data attributes volume, velocity and variety influence the dynamic and interactive nature of a distributed exascale system. To investigate the effect of this dynamic and interactive nature on computing big data, this work proposes a Markov chain model. The model is based on a transition matrix that identifies system status and memory sharing, and it lets us analyze the convergence of the two systems. As a result, the mutual influence of the two systems on each other is explored.
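The Markov chain idea can be made concrete with a small numerical sketch: a transition matrix over hypothetical system states is iterated until the state distribution stops changing, which is the kind of convergence analysis the abstract refers to. The states and probabilities below are invented purely for illustration.

```python
import numpy as np

# Hypothetical states of the system: 0 = idle, 1 = computing, 2 = sharing memory.
# Each row of P gives the transition probabilities out of one state.
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
])

dist = np.array([1.0, 0.0, 0.0])  # start in the idle state

# Iterate the chain until the state distribution converges.
for step in range(1, 1000):
    new_dist = dist @ P
    if np.allclose(new_dist, dist, atol=1e-10):
        print(f"Converged after {step} steps to {np.round(new_dist, 4)}")
        break
    dist = new_dist
```

The fixed point reached here is the stationary distribution of the chain; comparing such distributions for two system models is one simple way to study how strongly they influence each other.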


2016 ◽  
Vol 8 (4) ◽  
pp. 34-49 ◽  
Author(s):  
Amine Rahmani ◽  
Abdelmalek Amine ◽  
Reda Mohamed Hamou ◽  
Mohamed Amine Boudia ◽  
Hadj Ahmed Bouarara

The development of new technologies has brought the world to a tipping point. One of these technologies is big data, which has revolutionized computer science. Big data has come with new challenges, which can be summed up as the aim of creating scalable and efficient services that can process huge amounts of heterogeneous data in a short time while preserving users' privacy. Textual data occupy a large share of the internet, and these data may contain information that can identify users. For this reason, developing approaches that detect and remove identifiable information has become a critical research area known as de-identification. This paper tackles the problem of privacy in textual data. The authors' proposed approach uses artificial immune systems and MapReduce to detect and hide identifiable words, regardless of their variants, using the personal information from the user's profile. After many experiments, the system shows high efficiency in terms of the number of detected words, the way they are hidden, and execution time.
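The artificial-immune-system detectors themselves are not specified in the abstract, but the masking step can be sketched: given the personal terms from a user's profile, a scanner replaces any matching word, including simple suffix variants, with a placeholder. The profile fields, the variant rule and the mask token below are all assumptions for this sketch, not the authors' actual method.

```python
import re

# Hypothetical profile terms; the real system draws these from the user's stored profile.
profile_terms = ["Alice", "Martin", "Oran", "0555-123-456"]

def build_pattern(terms):
    """Match each profile term as a whole word, tolerating simple suffix variants."""
    parts = [re.escape(t) + r"\w*" for t in terms]
    return re.compile(r"\b(" + "|".join(parts) + r")\b", re.IGNORECASE)

def deidentify(text: str, pattern: re.Pattern) -> str:
    """Replace every detected identifiable word with a mask token."""
    return pattern.sub("[REDACTED]", text)

if __name__ == "__main__":
    pattern = build_pattern(profile_terms)
    sample = "Alice's report was sent to Martins office in Oran, call 0555-123-456."
    print(deidentify(sample, pattern))
```

In a MapReduce setting, each mapper would apply this masking to its own split of the text collection, which is how the approach scales to large volumes of documents.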


MRS Bulletin ◽  
1997 ◽  
Vol 22 (10) ◽  
pp. 5-6
Author(s):  
Horst D. Simon

Recent events in the high-performance computing industry have concerned scientists and the general public regarding a crisis or a lack of leadership in the field. That concern is understandable considering the industry's history from 1993 to 1996. Cray Research, the historic leader in supercomputing technology, was unable to survive financially as an independent company and was acquired by Silicon Graphics. Two ambitious new companies that introduced new technologies in the late 1980s and early 1990s—Thinking Machines and Kendall Square Research—were commercial failures and went out of business. And Intel, which introduced its Paragon supercomputer in 1994, discontinued production only two years later.

During the same time frame, scientists who had finished the laborious task of writing scientific codes to run on vector parallel supercomputers learned that those codes would have to be rewritten if they were to run on the next-generation, highly parallel architecture. Scientists who are not yet involved in high-performance computing are understandably hesitant about committing their time and energy to such an apparently unstable enterprise.

However, beneath the commercial chaos of the last several years, a technological revolution has been occurring. The good news is that the revolution is over, leading to five to ten years of predictable stability, steady improvements in system performance, and increased productivity for scientific applications. It is time for scientists who were sitting on the fence to jump in and reap the benefits of the new technology.

