Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services

Computational and Mathematical Methods in Medicine ◽

10.1155/2017/6120820 ◽

2017 ◽

Vol 2017 ◽

pp. 1-16 ◽

Cited By ~ 5

Author(s):

Dillon Chrimes ◽

Hamid Zamani

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Patient Data ◽

Clinical Event ◽

Distributed Data ◽

Patient Records ◽

Clinical Services ◽

Hospital System ◽

Study Objective

Big data analytics (BDA) is important for reducing healthcare costs. However, it poses many challenges in data aggregation, maintenance, integration, translation, analysis, and security/privacy. The study objective, to establish an interactive BDA platform with simulated patient data using open-source software technologies, was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files revealed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week for 10 TB and a month for three billion (30 TB) indexed patient records. Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance with high usability for technical support but poor usability for clinical services. Building a hospital system based on patient-centric data was challenging with HBase, in that not all data profiles were fully integrated with the complex patient-to-hospital relationships. However, we recommend using HBase to achieve secured patient data while querying entire hospital volumes in a simplified clinical event model across clinical services.
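To put the reported ingestion times in perspective, the sustained throughput they imply can be worked out with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the abstract does not define its units, so decimal terabytes (1 TB = 10^12 bytes), a 7-day week, and a 30-day month are assumptions, and the helper name `sustained_mb_per_s` is hypothetical.

```python
# Assumptions (not stated in the abstract): decimal terabytes,
# a 7-day week, and a 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 1e12


def sustained_mb_per_s(terabytes: float, days: float) -> float:
    """Average throughput in MB/s needed to move `terabytes` in `days`."""
    total_bytes = terabytes * BYTES_PER_TB
    total_seconds = days * SECONDS_PER_DAY
    return total_bytes / total_seconds / 1e6


week_rate = sustained_mb_per_s(10, 7)    # 10 TB in one week
month_rate = sustained_mb_per_s(30, 30)  # 30 TB in one month

print(f"10 TB in a week:  {week_rate:.1f} MB/s")   # about 16.5 MB/s
print(f"30 TB in a month: {month_rate:.1f} MB/s")  # about 11.6 MB/s
```

Under these assumptions, both runs average well below 20 MB/s, which is consistent with the abstract's remark that MapReduce limited the capacity to generate and replicate data efficiently.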


Operational Efficiencies and Simulated Performance of Big Data Analytics Platform over Billions of Patient Records of a Hospital System

Advances in Science Technology and Engineering Systems Journal ◽

10.25046/aj020104 ◽

2017 ◽

Vol 2 (1) ◽

pp. 23-41 ◽

Cited By ~ 1

Author(s):

Dillon Chrimes ◽

Belaid Moa ◽

Mu-Hsing (Alex) Kuo ◽

Andre Kushniruk

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Patient Records ◽

Hospital System ◽

Simulated Performance


Leveraging Distributed Data Over Big Data Analytics Platform for Healthcare Services

2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI) ◽

10.1109/icoei.2018.8553827 ◽

2018 ◽

Cited By ~ 4

Author(s):

Ramesh Mande ◽

G. JayaLakshmi ◽

Kalyan Chakravarti Yelavarti

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Healthcare Services ◽

Distributed Data


Impact of Big Data Analytics on People’s Health: Overview of Systematic Reviews and Recommendations for Future Studies (Preprint)

10.2196/preprints.27275 ◽

2021 ◽

Author(s):

Israel Júnior Borges do Nascimento ◽

Milena Soriano Marcolino ◽

Hebatullah Mohamed Abdulazeem ◽

Ishanka Weerasekara ◽

Natasha Azzopardi-Muscat ◽

...

Keyword(s):

Big Data ◽

Chronic Diseases ◽

Systematic Reviews ◽

Data Analytics ◽

Big Data Analytics ◽

Health Indicators ◽

High Accuracy ◽

Patient Data ◽

Suicide Mortality ◽

Cochrane Library

Background: Although the potential of big data analytics for health care is well recognized, evidence is lacking on its effects on public health. Objective: The aim of this study was to assess the impact of the use of big data analytics on people’s health based on the health indicators and core priorities in the World Health Organization (WHO) General Programme of Work 2019/2023 and the European Programme of Work (EPW), approved and adopted by its Member States, in addition to SARS-CoV-2–related studies. Furthermore, we sought to identify the most relevant challenges and opportunities of these tools with respect to people’s health. Methods: Six databases (MEDLINE, Embase, Cochrane Database of Systematic Reviews via Cochrane Library, Web of Science, Scopus, and Epistemonikos) were searched from the inception date to September 21, 2020. Systematic reviews assessing the effects of big data analytics on health indicators were included. Two authors independently performed screening, selection, data extraction, and quality assessment using the AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews 2) checklist. Results: The literature search initially yielded 185 records, 35 of which met the inclusion criteria, involving more than 5,000,000 patients. Most of the included studies used patient data collected from electronic health records, hospital information systems, private patient databases, and imaging datasets, and involved the use of big data analytics for noncommunicable diseases. “Probability of dying from any of cardiovascular, cancer, diabetes or chronic renal disease” and “suicide mortality rate” were the most commonly assessed health indicators and core priorities within the WHO General Programme of Work 2019/2023 and the EPW 2020/2025. Big data analytics have shown moderate to high accuracy for the diagnosis and prediction of complications of diabetes mellitus as well as for the diagnosis and classification of mental disorders; prediction of suicide attempts and behaviors; and the diagnosis, treatment, and prediction of important clinical outcomes of several chronic diseases. Confidence in the results was rated as “critically low” for 25 reviews, as “low” for 7 reviews, and as “moderate” for 3 reviews. The most frequently identified challenges were establishment of a well-designed and structured data source, and a secure, transparent, and standardized database for patient data. Conclusions: Although the overall quality of included studies was limited, big data analytics has shown moderate to high accuracy for the diagnosis of certain diseases, improvement in managing chronic diseases, and support for prompt and real-time analyses of large sets of varied input data to diagnose and predict disease outcomes. Trial Registration: International Prospective Register of Systematic Reviews (PROSPERO) CRD42020214048; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214048


Interactive Big Data Analytics Platform for Healthcare and Clinical Services

Global Journal of Engineering Sciences ◽

10.33552/gjes.2018.01.000502 ◽

2018 ◽

Vol 1 (1) ◽

Author(s):

Dillon Chrimes

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Clinical Services


4. Big data analytics

Big Data: A Very Short Introduction ◽

10.1093/actrade/9780198779575.003.0004 ◽

2017 ◽

pp. 44-58

Author(s):

Dawn E. Holmes

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Processing System ◽

Distributed Data ◽

New Paradigm ◽

Distributed Data Processing ◽

Customer Preferences ◽

Classical Statistics ◽

Core Functionality

‘Big data analytics’ argues that big data is only useful if we can extract useful information from it. It looks at some of the techniques used to discover useful information from big data, such as customer preferences or how fast an epidemic is spreading. Big data analytics is changing rapidly as the size of the datasets increases and classical statistics makes room for this new paradigm. An example of big data analytics is the algorithmic method called MapReduce, a distributed data processing system that forms part of the core functionality of the Hadoop Ecosystem. Amazon, Google, Facebook, and many others use Hadoop to store and process their data.
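The MapReduce pattern mentioned above can be illustrated in miniature. The following is a single-process sketch of the map, shuffle, and reduce phases, not Hadoop's actual API; the function names and the toy symptom reports are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str) -> list[tuple[str, int]]:
    """Map: emit a (key, 1) pair for every term in one input record."""
    return [(term.lower(), 1) for term in record.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def map_reduce(records: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: one free-text symptom report per record.
reports = ["flu cough fever", "Fever flu", "flu"]
counts = map_reduce(reports)
print(counts)  # {'flu': 3, 'cough': 1, 'fever': 2}
```

In a real Hadoop job the map and reduce functions run in parallel on many machines and the framework performs the shuffle over the network; here all three phases run sequentially in one process to show the data flow only.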


Impact of Big Data Analytics on People’s Health: Overview of Systematic Reviews and Recommendations for Future Studies

Journal of Medical Internet Research ◽

10.2196/27275 ◽

2021 ◽

Vol 23 (4) ◽

pp. e27275

Author(s):

Israel Júnior Borges do Nascimento ◽

Milena Soriano Marcolino ◽

Hebatullah Mohamed Abdulazeem ◽

Ishanka Weerasekara ◽

Natasha Azzopardi-Muscat ◽

...

Keyword(s):

Big Data ◽

Chronic Diseases ◽

Systematic Reviews ◽

Data Analytics ◽

Big Data Analytics ◽

Health Indicators ◽

High Accuracy ◽

Patient Data ◽

Suicide Mortality ◽

Cochrane Library

Background: Although the potential of big data analytics for health care is well recognized, evidence is lacking on its effects on public health. Objective: The aim of this study was to assess the impact of the use of big data analytics on people’s health based on the health indicators and core priorities in the World Health Organization (WHO) General Programme of Work 2019/2023 and the European Programme of Work (EPW), approved and adopted by its Member States, in addition to SARS-CoV-2–related studies. Furthermore, we sought to identify the most relevant challenges and opportunities of these tools with respect to people’s health. Methods: Six databases (MEDLINE, Embase, Cochrane Database of Systematic Reviews via Cochrane Library, Web of Science, Scopus, and Epistemonikos) were searched from the inception date to September 21, 2020. Systematic reviews assessing the effects of big data analytics on health indicators were included. Two authors independently performed screening, selection, data extraction, and quality assessment using the AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews 2) checklist. Results: The literature search initially yielded 185 records, 35 of which met the inclusion criteria, involving more than 5,000,000 patients. Most of the included studies used patient data collected from electronic health records, hospital information systems, private patient databases, and imaging datasets, and involved the use of big data analytics for noncommunicable diseases. “Probability of dying from any of cardiovascular, cancer, diabetes or chronic renal disease” and “suicide mortality rate” were the most commonly assessed health indicators and core priorities within the WHO General Programme of Work 2019/2023 and the EPW 2020/2025. Big data analytics have shown moderate to high accuracy for the diagnosis and prediction of complications of diabetes mellitus as well as for the diagnosis and classification of mental disorders; prediction of suicide attempts and behaviors; and the diagnosis, treatment, and prediction of important clinical outcomes of several chronic diseases. Confidence in the results was rated as “critically low” for 25 reviews, as “low” for 7 reviews, and as “moderate” for 3 reviews. The most frequently identified challenges were establishment of a well-designed and structured data source, and a secure, transparent, and standardized database for patient data. Conclusions: Although the overall quality of included studies was limited, big data analytics has shown moderate to high accuracy for the diagnosis of certain diseases, improvement in managing chronic diseases, and support for prompt and real-time analyses of large sets of varied input data to diagnose and predict disease outcomes. Trial Registration: International Prospective Register of Systematic Reviews (PROSPERO) CRD42020214048; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214048


QoSComm: A Data Flow Allocation Strategy among SDN-Based Data Centers for IoT Big Data Analytics

Applied Sciences ◽

10.3390/app10217586 ◽

2020 ◽

Vol 10 (21) ◽

pp. 7586

Author(s):

Jose E. Lozano-Rizk ◽

Juan I. Nieto-Hipolito ◽

Raul Rivera-Rodriguez ◽

Maria A. Cosio-Leon ◽

Mabel Vazquez-Briseño ◽

...

Keyword(s):

Communication Network ◽

Big Data ◽

Data Analytics ◽

Completion Time ◽

Data Centers ◽

Data Flow ◽

Data Transfer ◽

Big Data Analytics ◽

Optimization Method ◽

Distributed Data

When Internet of Things (IoT) big data analytics (BDA) applications need to transfer data streams among software-defined network (SDN)-based distributed data centers, data flow forwarding in the communication network is typically done by an SDN controller using a traditional shortest-path algorithm or by considering only the bandwidth requirements of the applications. In BDA, this scheme can degrade performance and lengthen job completion time because additional metrics, such as end-to-end delay, jitter, and packet loss rate in the data transfer path, are not considered. These metrics are quality of service (QoS) parameters in the communication network. This research proposes a solution called QoSComm, an SDN strategy to allocate QoS-based data flows for BDA running across distributed data centers to minimize their job completion time. QoSComm operates in two phases: (i) based on the current communication network conditions, it calculates the feasible paths for each data center using a multi-objective optimization method; (ii) it distributes the resultant paths among data centers, configuring their OpenFlow switches (OFS) dynamically. Simulation results show that QoSComm can improve BDA job completion time by an average of 18%.
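The idea of choosing paths by QoS metrics rather than by hop count or bandwidth alone can be sketched as follows. This is not the QoSComm algorithm: the abstract does not specify its multi-objective optimization method, so the sketch swaps in a simple weighted sum of normalized metrics as a stand-in. All names, limits, weights, and path measurements below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    name: str
    delay_ms: float   # end-to-end delay
    jitter_ms: float  # delay variation
    loss_rate: float  # packet loss rate as a fraction (0.01 = 1%)


def is_feasible(path: PathMetrics, limits: PathMetrics) -> bool:
    """A path is feasible if every QoS metric is within its limit."""
    return (path.delay_ms <= limits.delay_ms
            and path.jitter_ms <= limits.jitter_ms
            and path.loss_rate <= limits.loss_rate)


def cost(path: PathMetrics, limits: PathMetrics,
         weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of metrics, each normalized by its limit (assumed scheme)."""
    w_delay, w_jitter, w_loss = weights
    return (w_delay * path.delay_ms / limits.delay_ms
            + w_jitter * path.jitter_ms / limits.jitter_ms
            + w_loss * path.loss_rate / limits.loss_rate)


def rank_paths(paths, limits, weights=(1.0, 1.0, 1.0)):
    """Drop infeasible paths, then order the rest from lowest to highest cost."""
    candidates = [p for p in paths if is_feasible(p, limits)]
    return sorted(candidates, key=lambda p: cost(p, limits, weights))


# Hypothetical QoS limits and measured candidate paths between two data centers.
limits = PathMetrics("limits", delay_ms=100.0, jitter_ms=5.0, loss_rate=0.01)
paths = [
    PathMetrics("A", delay_ms=20.0, jitter_ms=2.0, loss_rate=0.001),
    PathMetrics("B", delay_ms=35.0, jitter_ms=1.0, loss_rate=0.0005),
    PathMetrics("C", delay_ms=120.0, jitter_ms=8.0, loss_rate=0.02),
]

ranked = rank_paths(paths, limits)
print([p.name for p in ranked])  # ['B', 'A']; C exceeds every limit
```

Path A has the lowest delay, yet path B ranks first once jitter and loss are weighed in, which is the kind of outcome a shortest-path or bandwidth-only controller would miss. A weighted sum is only one way to combine objectives; a Pareto-based method would be an alternative.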


A survey on bandwidth-aware geo-distributed frameworks for big-data analytics

Journal Of Big Data ◽

10.1186/s40537-021-00427-9 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Mohammed Bergui ◽

Said Najah ◽

Nikola S. Nikolov

Keyword(s):

Big Data ◽

Data Analytics ◽

Cluster Computing ◽

Big Data Analytics ◽

Global Scale ◽

Distributed Data ◽

Multiple Data ◽

Geographical Distances ◽

Commercial Applications ◽

Data Centres

In the era of global-scale services, organisations produce huge volumes of data, often distributed across multiple data centres, separated by vast geographical distances. While cluster computing applications, such as MapReduce and Spark, have been widely deployed in data centres to support commercial applications and scientific research, they are not designed for running jobs across geo-distributed data centres. The necessity to utilise such infrastructure introduces new challenges in the data analytics process due to bandwidth limitations of the inter-data-centre communication. In this article, we discuss challenges and survey the latest geo-distributed big-data analytics frameworks and schedulers (based on MapReduce and Spark) with WAN-bandwidth awareness.


Renewable Energy-Aware Big Data Analytics in Geo-Distributed Data Centers with Reinforcement Learning

IEEE Transactions on Network Science and Engineering ◽

10.1109/tnse.2018.2813333 ◽

2020 ◽

Vol 7 (1) ◽

pp. 205-215 ◽

Cited By ~ 25

Author(s):

Chenhan Xu ◽

Kun Wang ◽

Peng Li ◽

Rui Xia ◽

Song Guo ◽

...

Keyword(s):

Big Data ◽

Renewable Energy ◽

Reinforcement Learning ◽

Data Analytics ◽

Data Centers ◽

Big Data Analytics ◽

Distributed Data ◽

Energy Aware


Big Data Analytics in Medicine and Healthcare

Journal of Integrative Bioinformatics ◽

10.1515/jib-2017-0030 ◽

2018 ◽

Vol 15 (3) ◽

Cited By ~ 38

Author(s):

Blagoj Ristevski ◽

Ming Chen

Keyword(s):

Big Data ◽

Data Analytics ◽

Data Privacy ◽

Big Data Analytics ◽

Heterogeneous Data ◽

Distributed Data ◽

Biomedical Data ◽

Privacy And Security ◽

Distributed Data Processing ◽

Data Processing Software

This paper surveys big data, highlighting big data analytics in medicine and healthcare. The big data characteristics of value, volume, velocity, variety, veracity, and variability are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data, such as various -omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data, and electronic health records data. We underline the challenging issues of big data privacy and security. With regard to the big data characteristics, some directions for using suitable and promising open-source distributed data processing software platforms are given.
