Efficient Data Deduplication Mechanism for Genomic Data

Author(s):  
Tin Thein Thwel
G R Sinha

In the data science age, many people access health-related information and diagnosis services through information technology, including telemedicine. Many researchers are therefore collaborating with medical experts and working in the bioinformatics area. In bioinformatics, handling human genomic data, including its collection, storage, and processing, has become essential. Genomic data refers to the genome and DNA data of an organism. Unavoidably, genomic data require huge amounts of storage for the customized software that analyzes them, and genome researchers have recently been raising alarms over big data. This paper attempts to reduce data storage significantly by applying a data deduplication process to genomic data sets. Data deduplication ('dedupe' for short) can reduce the amount of storage required because of its single-instance storage nature, so it is one solution for optimizing the huge storage space needed for genomes. We implemented a data deduplication method and applied it to genomic data; deduplication was performed successfully using a secure hash algorithm, a B+ tree, and a sub-file-level chunking algorithm, combined in an integrated approach. Files are separated into chunks with the Two Thresholds Two Divisors (TTTD) algorithm, and a hash function is used to compute chunk identifiers. Indexing keys are constructed from the identifiers in a B+-tree-like index structure. The system can reduce storage space significantly when duplicated data exist. Preliminary testing was done using NCBI datasets.
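The chunk-and-hash pipeline described above can be sketched as follows. This is a heavily simplified illustration, not the paper's implementation: the rolling value is a weak byte-mixing stand-in for TTTD's real sliding-window fingerprint, the thresholds and divisors are arbitrary, and a plain dictionary stands in for the B+ tree index.

```python
import hashlib

def chunk_tttd(data: bytes, t_min=64, t_max=256, d_main=48, d_backup=24):
    """Simplified Two Thresholds Two Divisors (TTTD) style chunking.

    Cuts at positions where a rolling value divides d_main; if no such
    position appears before t_max bytes, falls back to the most recent
    position matched by the easier backup divisor."""
    chunks, start, backup, h = [], 0, -1, 0
    for i, b in enumerate(data):
        h = (h * 31 + b) & 0xFFFFFFFF
        size = i - start + 1
        if size < t_min:                      # enforce the minimum chunk size
            continue
        if h % d_backup == 0:
            backup = i                        # remember a backup breakpoint
        if h % d_main == 0 or size >= t_max:
            cut = i
            if size >= t_max and h % d_main != 0 and backup > start:
                cut = backup                  # forced cut: use the backup breakpoint
            chunks.append(data[start:cut + 1])
            start, backup, h = cut + 1, -1, 0
    if start < len(data):
        chunks.append(data[start:])           # trailing remainder
    return chunks

def dedupe(files):
    """Store each unique chunk exactly once, keyed by its SHA-256 digest."""
    store = {}    # digest -> chunk bytes (a dict stands in for the B+ tree index)
    recipes = {}  # filename -> ordered list of chunk digests
    for name, data in files.items():
        digests = []
        for chunk in chunk_tttd(data):
            d = hashlib.sha256(chunk).hexdigest()
            store.setdefault(d, chunk)        # single-instance storage
            digests.append(d)
        recipes[name] = digests
    return store, recipes
```

Because chunking is content-defined and deterministic, two files sharing long runs of identical bytes map to mostly identical chunk digests, so only one copy of each shared chunk is stored, and any file can be restored by concatenating the chunks named in its recipe.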

2021
Vol 11 (2)
pp. 807
Author(s):  
Llanos Tobarra
Alejandro Utrilla
Antonio Robles-Gómez
Rafael Pastor-Vargas
Roberto Hernández

The employment of modern technologies is widespread in our society, so the inclusion of practical activities in education has become both essential and useful. These activities are most noticeable in Engineering, in areas such as cybersecurity, data science, and artificial intelligence. Additionally, they acquire even more relevance with a distance education methodology, as in our case. The inclusion of these practical activities has clear advantages, such as (1) promoting critical thinking and (2) improving students' abilities and skills for their professional careers. There are several options, such as the use of remote and virtual laboratories, virtual reality, and game-based platforms, among others. This work addresses the development of a new cloud game-based educational platform, which defines a modular and flexible architecture (using light containers). This architecture provides interactive and monitoring services and data storage in a transparent way. The platform uses gamification to integrate the game into the instructional process. The CyberScratch project is a particular implementation of this architecture focused on cybersecurity game-based activities. Data privacy management is a critical issue for these kinds of platforms, so the architecture is designed with this feature integrated into the platform components. To achieve this goal, we first address all the privacy aspects of the data generated by our cloud game-based platform, considering the European legal context for data privacy under the GDPR and the ISO/IEC TR 20748-1:2016 recommendations for Learning Analytics (LA). Our second objective is to provide implementation guidelines for efficient data privacy management in our cloud game-based educational platform. These contributions are not found in current related works.
The CyberScratch project, which was approved by UNED for the year 2020, considers using the xAPI standard for data handling and services in the game editor, game engine, and game monitor modules of CyberScratch. Therefore, apart from considering GDPR privacy and LA recommendations, our cloud game-based architecture covers all phases from game creation to the final users' interactions with the game.
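As a sketch of how such a platform might emit GDPR-conscious learning records, the snippet below builds a minimal xAPI statement (actor / verb / object) with a pseudonymized actor. The activity URL, salt, and student identifier are hypothetical placeholders, and a real deployment would follow the full xAPI specification and send statements to a Learning Record Store.

```python
import hashlib
import json

def pseudonymize(student_id: str, salt: str) -> str:
    """Replace a direct identifier with a salted hash, one common
    GDPR-friendly pseudonymization option for analytics data."""
    return hashlib.sha256((salt + student_id).encode()).hexdigest()

def make_statement(student_id: str, salt: str) -> dict:
    """Build a minimal xAPI statement for a completed game activity."""
    return {
        "actor": {
            "objectType": "Agent",
            # account-based actor, so no e-mail address leaves the platform
            "account": {"homePage": "https://example.org/cyberscratch",
                        "name": pseudonymize(student_id, salt)},
        },
        "verb": {"id": "http://adlnet.gov/expapi/verbs/completed",
                 "display": {"en-US": "completed"}},
        "object": {"objectType": "Activity",
                   "id": "https://example.org/activities/xss-challenge-1"},
    }

stmt = make_statement("student-42", salt="per-course-secret")
payload = json.dumps(stmt)  # ready to POST to an LRS statements endpoint
```

Keeping the salt per course (and deletable) means pseudonyms can be unlinked later, which helps with GDPR erasure requests while still allowing per-learner analytics during the course.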


Cloud storage is one of the key features of cloud computing, helping cloud users outsource large volumes of data without upgrading their devices. However, Cloud Service Provider (CSP) data storage faces problems with data redundancy. The data deduplication technique aims at eliminating redundant data segments and maintains a single instance of a data set, even when any number of users own a similar data set. Because blocks of data are spread across many servers, every block of a file has to be downloaded before the file can be restored, which decreases system throughput. We suggest a data recovery module for cloud storage servers to improve file access efficiency and reduce the time spent on network bandwidth. In the suggested method, device coding is used to store blocks in distributed cloud storage, and MD5 (Message Digest 5) is used for data integrity. The recovery algorithm lets the user retrieve a file directly from the cloud servers without downloading every block. The proposed scheme improves system time efficiency and the ability to access stored data quickly, reducing bandwidth consumption and user processing overhead while downloading data files.
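The MD5 integrity check mentioned above can be sketched as follows; the in-memory block store and its method names are illustrative, not the paper's implementation.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """MD5 digest as a hex string."""
    return hashlib.md5(data).hexdigest()

class BlockStore:
    """Toy block store that records an MD5 digest alongside each block
    at upload time and re-verifies it on every download."""
    def __init__(self):
        self._blocks = {}

    def put(self, block_id: str, data: bytes) -> None:
        self._blocks[block_id] = (data, md5_hex(data))

    def get(self, block_id: str) -> bytes:
        data, expected = self._blocks[block_id]
        if md5_hex(data) != expected:      # detects silent corruption
            raise IOError(f"integrity check failed for block {block_id}")
        return data
```

Note that MD5 is adequate for detecting accidental corruption but is not collision-resistant, so a system defending against deliberate tampering would prefer SHA-256 or similar.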


2021
Author(s):  
Michael C. Schatz
Anthony A. Philippakis
Enis Afgan
Eric Banks
Vincent J. Carey
...  

Abstract: The traditional model of genomic data analysis - downloading data from centralized warehouses for analysis with local computing resources - is increasingly unsustainable. Not only are transfers slow and cost prohibitive, but this approach also leads to redundant and siloed compute infrastructure that makes it difficult to ensure security and compliance of protected data. The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) inverts this model, providing a unified cloud computing environment for data storage, management, and analysis. AnVIL eliminates the need for data movement, allows for active threat detection and monitoring, and provides scalable, shared computing resources that can be acquired by researchers as needed. This presents many new opportunities for collaboration and data sharing that will ultimately lead to scientific discoveries at scales not previously possible.


Author(s):  
Kanika Gupta ◽  
Aatif Jamshed

Some unknown cases of pneumonia were reported to the World Health Organization (WHO) on 31 December 2019 in Wuhan, China. Chinese authorities identified a novel coronavirus as the root cause and labelled it "2019-nCoV". The virus belongs to a family of viruses that cause illnesses ranging from the common cold to lung infections and more serious diseases. It had not been detected in human beings earlier, as it is a newly emerged pathogen. Many countries have increased their surveillance efforts around the globe to detect new coronavirus cases. Blockchain, an efficient and safe network for secure data storage, is used in several applications such as the food market, healthcare, finance, operations management, and the Internet of Things (IoT). In this paper, with the use of this emerging technology, we are able to track useful information and accelerate the treatment process of patients while preserving each person's identity. Correct implementation of a blockchain model has the potential to restrict coronavirus transmission and its related mortality rate where testing facilities are inadequate. Other infectious diseases could also be curbed by this model. The advantages of this model can reach the various stakeholders involved in the healthcare field, helping to restrict the transmission of various diseases.
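A tamper-evident, hash-chained ledger of the kind the abstract relies on can be sketched in a few lines. The record fields below are hypothetical placeholders, and a real deployment would add consensus, access control, and pseudonymous patient identifiers rather than this single-node toy.

```python
import hashlib
import json
import time

def _body_hash(block: dict) -> str:
    """Hash the block's contents (everything except its own hash field)."""
    body = {k: block[k] for k in ("timestamp", "records", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(records: list, prev_hash: str) -> dict:
    """Build one block that commits to its records and links to the
    previous block's hash, forming the chain."""
    block = {"timestamp": time.time(), "records": records, "prev_hash": prev_hash}
    block["hash"] = _body_hash(block)
    return block

def verify_chain(chain: list) -> bool:
    """A chain is valid when every block's stored hash matches its contents
    and its prev_hash links to the block before it."""
    prev = "0" * 64  # genesis marker
    for block in chain:
        if block["prev_hash"] != prev or _body_hash(block) != block["hash"]:
            return False
        prev = block["hash"]
    return True
```

Because each block's hash covers the previous block's hash, altering any stored record invalidates every later block, which is what makes the shared case history trustworthy across healthcare stakeholders.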


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between business and data science. With the involvement of data science, the business goal is to derive valuable insights from the available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, applying machine learning techniques for classification and prediction. To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated from this training make up the model and are used to predict future trends in different types of organizations. Methods: Classification and prediction techniques, namely Support Vector Machine (SVM), Random Forest, Decision Tree, Naive Bayes, Logistic Regression, AdaBoost, and KNN, are applied to find efficient and effective results. All these functionalities can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The trained models generate classification rules from the training data, which are then used to predict the success category of upcoming releases. Conclusion: This paper performs a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets and the relationships among them, solving business problems and predicting forthcoming trends. This prediction can help production houses with advertisement propaganda; they can also plan their costs and, by assuring these factors, make the movie more profitable.
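The comparative analysis described in the Methods and Conclusion sections might look like the sketch below. The Bollywood feature set is not public, so a synthetic five-class stand-in dataset is used; model hyperparameters are scikit-learn defaults, not the paper's settings.

```python
# Sketch only: synthetic data stands in for the (unavailable) Bollywood dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Five classes mirror Blockbuster / Superhit / Hit / Average / Flop.
X, y = make_classification(n_samples=500, n_features=10, n_classes=5,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Fit each model and compare held-out accuracy, as the comparative analysis does.
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
```

A fuller comparison would also inspect each model's confusion matrix (`sklearn.metrics.confusion_matrix`) to see which success categories are confused with one another.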


Author(s):  
Muhammad Waqar Khan
Muhammad Asghar Khan
Muhammad Alam
Wajahat Ali

During the past few years, data has been growing exponentially, attracting researchers to a popular term: Big Data. Big Data is observed in various fields, such as information technology, telecommunication, theoretical computing, mathematics, data mining, and data warehousing. Data science is frequently mentioned together with Big Data, as it uses methods to scale Big Data down. Currently, more than 3.2 billion people worldwide are connected to the internet, out of which 46% connect via smartphones, and over 5.5 billion people use cell phones. As technology rapidly shifts from ordinary cell phones towards smartphones, the proportion of internet use is also growing. One forecast holds that by 2020 around 7 billion people across the globe will be using the internet, out of which 52% will connect through their smartphones, and that by 2050 the figure will touch 95% of the world population. Every device connected to the internet generates data. As the majority of this data is generated from smartphones through applications such as Instagram, WhatsApp, Apple, Google, Google+, Twitter, and Flickr, this huge amount of data is becoming a big threat for the telecom sector. This paper compares the amounts of Big Data generated by the telecom industry. Based on the collected data, we use forecasting tools to predict the amount of Big Data that will be generated in the future and also identify the threats that the telecom industry will face from that huge amount of Big Data.
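Trend-based forecasting of the kind the paper applies can be illustrated with a simple least-squares fit. The traffic figures below are hypothetical placeholders, not the paper's data, and real telecom forecasts would use richer models (seasonality, saturation curves) rather than a straight line.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical historical mobile-data volumes (exabytes per month).
years = [2014, 2015, 2016, 2017, 2018]
exabytes_per_month = [2.5, 3.7, 4.9, 6.1, 7.3]

a, b = linear_fit(years, exabytes_per_month)
forecast_2020 = a + b * 2020  # extrapolate the fitted trend two years ahead
```

Extrapolating the fitted slope gives the kind of future-volume estimate the paper uses to argue that telecom infrastructure faces a growing Big Data burden.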


2021
Vol 13 (6)
pp. 3231
Author(s):  
Luigi Fusco Girard
Marilena Vecco

By referring to the European Green Deal, this paper analyzes the "intrinsic value" of cultural heritage by investigating the human-centered adaptive reuse of this heritage. This raises questions such as how to improve the effectiveness of reuse, restoration, and valorization interventions on cultural heritage/landscapes, and how to transform a cultural asset into a place, interpreted as a living ecosystem to be managed as a living organism. The autopoietic character of eco-bio-systems is discussed in detail, with a specific focus on the intrinsic versus instrumental values of the cultural heritage ecosystem. Specifically, the notion of complex social value is introduced to express the integration of these values. In ecology, the notion of intrinsic value (or "primary value") refers to the recognition of a value that pre-exists any exploitation by human beings. The effectiveness of transforming a heritage asset into a living ecosystem is seen to follow from an integration of these two values. In this context, the paper provides an overview of the different applications of the business model concept in the circular economy, for better investment decision-making and management in heritage adaptive reuse. The Matera case is presented as an example of a cultural heritage ecosystem. To conclude, recommendations are proposed toward an integrated approach to managing the adaptive reuse of heritage ecosystems as living organisms.

