A Global Survey on Data Deduplication

2018 ◽  
Vol 10 (4) ◽  
pp. 43-66 ◽  
Author(s):  
Shubhanshi Singhal ◽  
Pooja Sharma ◽  
Rajesh Kumar Aggarwal ◽  
Vishal Passricha

This article describes how data deduplication efficiently eliminates redundant data by selecting and storing only a single instance of it, a technique that is becoming popular in storage systems. Digital data is growing much faster than storage volumes, which underscores the importance of data deduplication to scientists and researchers. Data deduplication is considered the most successful and efficient data reduction technique because it is computationally efficient and offers lossless data reduction. It is applicable to various storage systems, e.g. local storage, distributed storage, and cloud storage. This article discusses the background, components, and key features of data deduplication, helping the reader understand the design issues and challenges in this field.
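To make the single-instance idea concrete, here is a minimal Python sketch of a deduplicating store that keeps each distinct block exactly once, keyed by its content hash. The fixed block size and store layout are illustrative assumptions, not details from the article.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size

class DedupStore:
    """Toy single-instance store: identical blocks are stored once."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes (one copy each)
        self.files = {}   # file name -> ordered list of fingerprints

    def put(self, name, data):
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # duplicate blocks add nothing
            recipe.append(fp)
        self.files[name] = recipe

    def get(self, name):
        return b"".join(self.blocks[fp] for fp in self.files[name])
```

Storing the same file twice adds no new blocks, and reconstruction from the recipe is exact, which is the lossless, computationally cheap reduction the survey describes.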

2018 ◽  
Vol 7 (2.8) ◽  
pp. 13
Author(s):  
B Tirapathi Reddy ◽  
M V. P. Chandra Sekhara Rao

Storing data in the cloud has become a necessity, as users accumulate abundant data every day and are running out of physical storage devices. However, the majority of the data in cloud storage is redundant. Data deduplication using convergent key encryption has been the mechanism popularly used to eliminate redundant data items in cloud storage, but convergent key encryption suffers from various drawbacks. For instance, if data items are deduplicated based on a convergent key, any unauthorized user can compromise the cloud storage simply by presenting a guessed hash of the file. Ensuring ownership of the data items is therefore essential to protect them. As the cuckoo filter offers a minimal false positive rate with minimal space overhead, our mechanism uses it to provide proof of ownership.
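To see why a guessed hash is dangerous, consider this minimal Python sketch of convergent-key deduplication; the tag derivation and the NaiveServer class are illustrative stand-ins, not the paper's construction.

```python
import hashlib

def convergent_key(data):
    # The key is derived from the content itself, so every holder
    # of the same file derives the same key and ciphertext.
    return hashlib.sha256(data).digest()

def dedup_tag(data):
    # Index tag under which the single ciphertext copy is stored.
    return hashlib.sha256(convergent_key(data)).digest()

class NaiveServer:
    """Deduplicating server that trusts a client-supplied tag --
    exactly the weakness described in the abstract."""

    def __init__(self):
        self.store = {}  # tag -> ciphertext

    def upload(self, tag, ciphertext=None):
        if tag in self.store:      # duplicate: skip upload, grant access
            return self.store[tag]
        self.store[tag] = ciphertext
        return ciphertext
```

An attacker who merely learned or guessed the tag of someone else's file can be linked to its stored copy without ever possessing the content, which is why the server must first demand a proof of ownership over the content itself; the paper realizes that membership check with a cuckoo filter because of its low false positive rate and small space overhead.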


Author(s):  
Sumit Kumar Mahana ◽  
Rajesh Kumar Aggarwal

In the present digital scenario, data is of prime significance for individuals and, moreover, for organizations. The amount of data being produced increases exponentially with time, which poses a serious concern: the huge volume of redundant data stored on the cloud imposes a severe load on cloud storage systems that cannot be accepted. Therefore, a storage optimization strategy is a fundamental prerequisite for cloud storage systems. Data deduplication is a storage optimization strategy that deletes identical copies of redundant data, optimizes bandwidth, improves utilization of storage space, and hence minimizes storage cost. To guarantee security, data stored on the cloud must be kept in encrypted form. Consequently, performing deduplication safely over encrypted information in the cloud is a challenging job. This chapter discusses various existing data deduplication techniques, with a notion of securing data on the cloud, that address this challenge.
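A short sketch of the core tension, using a toy XOR keystream in place of a real cipher (illustrative only, not secure): conventional encryption with per-user keys and random nonces maps identical plaintexts to different ciphertexts, so the server can no longer detect duplicates, whereas deriving the key and nonce from the content restores equality.

```python
import hashlib, os

def keyed_encrypt(data, key, nonce):
    """Toy randomized encryption (XOR keystream; illustrative only)."""
    stream = b""
    c = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + c.to_bytes(8, "big")).digest()
        c += 1
    return nonce + bytes(a ^ b for a, b in zip(data, stream))

plaintext = b"identical report stored by two users"

# Per-user keys and random nonces: ciphertexts differ, dedup fails.
u1 = keyed_encrypt(plaintext, b"user-1-key", os.urandom(16))
u2 = keyed_encrypt(plaintext, b"user-2-key", os.urandom(16))
assert u1 != u2

# Content-derived key and nonce (convergent style): ciphertexts
# match, so the server can deduplicate without seeing the plaintext.
k = hashlib.sha256(plaintext).digest()
assert keyed_encrypt(plaintext, k, k[:16]) == keyed_encrypt(plaintext, k, k[:16])
```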


The enormous growth of digital data, especially data in unstructured formats, has brought a tremendous challenge to data analysis as well as to data storage systems, substantially increasing the cost and degrading the performance of backup systems. Traditional systems do not provide any optimization technique to keep duplicated data from being backed up. Data deduplication has become an essential and economical capacity optimization technique that replaces redundant data. The following paper reviews the deduplication process, the types of deduplication, and the techniques available for data deduplication. In addition, many approaches proposed by various researchers on deduplication in big data storage systems are studied and compared.
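As a sense of scale (figures illustrative, not from the paper): if a 10 TB backup set contains only 2 TB of unique chunks, deduplication stores the 2 TB and records references for the rest, giving a deduplication ratio of 10/2 = 5:1, i.e. an 80% reduction in consumed capacity.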


2018 ◽  
Vol 7 (2.4) ◽  
pp. 46 ◽  
Author(s):  
Shubhanshi Singhal ◽  
Akanksha Kaushik ◽  
Pooja Sharma

Due to the drastic growth of digital data, data deduplication has become a standard component of modern backup systems. It reduces data redundancy, saves storage space, and simplifies the management of data chunks. The process is performed in three steps: chunking, fingerprinting, and indexing of fingerprints. In chunking, data files are divided into chunks, and the chunk boundary is decided by the value of the divisor. For each chunk, a unique identifying value, known as a fingerprint, is computed using a hash function (e.g. MD5, SHA-1, SHA-256). Finally, these fingerprints are stored in the index to detect redundant chunks, i.e. chunks having the same fingerprint values. In chunking, the chunk size is an important factor that should be optimal for good performance of the deduplication system. The genetic algorithm (GA), which is gaining much popularity, can be applied to find the best value of the divisor. Secondly, indexing also enhances the performance of the system by reducing the search time: binary search tree (BST) based indexing has a time complexity of O(log n), which is minimal among searching algorithms. A new model is proposed that applies a GA to find the value of the divisor; this is the first attempt to apply a GA in the field of data deduplication. The second improvement in the proposed system is that a BST index tree is applied to index the fingerprints. The performance of the proposed system is evaluated on VMDK, Linux, and Quanto datasets, and a good improvement in deduplication ratio is achieved.
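The three-step pipeline can be sketched compactly in Python; the byte-sum rolling hash, window size, and chunk-size bounds below are illustrative stand-ins (the paper tunes the divisor with a GA, and a Python set replaces the BST index here while keeping the same membership semantics).

```python
import hashlib

def chunk(data, divisor=2048, window=48, min_size=1024, max_size=8192):
    """Content-defined chunking: declare a boundary when a rolling
    hash of the last `window` bytes is divisible by `divisor`
    (the parameter the paper optimizes with a genetic algorithm)."""
    chunks, start, rolling = [], 0, 0
    for i in range(len(data)):
        rolling += data[i]                   # toy byte-sum rolling hash
        if i - start >= window:
            rolling -= data[i - window]      # slide the window forward
        size = i - start + 1
        if (size >= min_size and rolling % divisor == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])          # final partial chunk
    return chunks

def deduplicate(data):
    """Fingerprint each chunk and keep only the first occurrence."""
    index, unique = set(), []   # a balanced BST gives the O(log n) lookups
    for c in chunk(data):
        fp = hashlib.sha256(c).hexdigest()   # fingerprint each chunk
        if fp not in index:
            index.add(fp)                    # new fingerprint: keep chunk
            unique.append(c)
    return unique
```

A GA would evaluate candidate divisor values by running exactly this pipeline and scoring the resulting deduplication ratio.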


Author(s):  
Huan Liu

The amount of data has become increasingly large in recent years as the capacity of digital data storage worldwide has significantly increased. As the size of data grows, the demand for data reduction for effective data mining increases. Instance selection is one effective means of data reduction. This article introduces the basic concepts of instance selection, its context, necessity, and functionality, and briefly reviews the state-of-the-art methods for instance selection. Selection is a necessity in the world surrounding us; it stems from the sheer fact of limited resources, and data mining is no exception. Many factors give rise to data selection: data is not collected purely for data mining or for one particular application; there are missing data, redundant data, and errors introduced during collection and storage; and data can be too overwhelming to handle. Instance selection is one effective approach to data selection: the process of choosing a subset of data that achieves the original purpose of a data mining application. The ideal outcome of instance selection is a model-independent, minimal sample of data that can accomplish the tasks with little or no performance deterioration.
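As one minimal, model-independent example of instance selection, the following sketch draws a stratified random sample that roughly preserves class proportions; it is a simple baseline for illustration, not a method from the article.

```python
import random
from collections import defaultdict

def stratified_sample(instances, labels, fraction=0.1, seed=0):
    """Select a subset of (instance, label) pairs while keeping
    approximately the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    selected = []
    for y, xs in by_class.items():
        k = max(1, round(fraction * len(xs)))  # keep at least one per class
        selected.extend((x, y) for x in rng.sample(xs, k))
    return selected
```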


Author(s):  
Vishal Passricha ◽  
Ashish Chopra ◽  
Shubhanshi Singhal

Cloud storage (CS) is gaining much popularity nowadays because it offers low-cost and convenient network storage services. In this big data era, the explosive growth of digital data moves users toward CS, but this places considerable storage pressure on CS systems because a large volume of this data is redundant. Data deduplication is an effective data reduction technique. The dynamic nature of data makes security and ownership of data a very important issue. Proof-of-ownership schemes are a robust way to verify the ownership claimed by any owner. However, this method affects the deduplication process because encryption methods have varying characteristics. A convergent encryption (CE) scheme is widely used for secure data deduplication. The problem with CE-based schemes is that a user can still decrypt the cloud data after losing ownership. This article addresses the problem of ownership revocation by proposing a secure deduplication scheme for encrypted data. The proposed scheme enhances security against unauthorized encryption and poison attacks on the predicted set of data.
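A generic challenge-response proof of ownership can be sketched as follows; the chunk size, challenge width, and function names are illustrative assumptions, not the article's scheme. The point is that a claimant must hold the actual content, not merely its hash.

```python
import hashlib, random

CHUNK = 4096  # illustrative chunk size

def chunk_hashes(data):
    """Server side, at first upload: fingerprint every chunk."""
    return [hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)]

def challenge(num_chunks, k=3):
    """Server picks random chunk indices the claimant must answer."""
    return random.sample(range(num_chunks), min(k, num_chunks))

def respond(data, indices):
    """Claimant: answerable only with the full content in hand."""
    return [hashlib.sha256(data[i * CHUNK:(i + 1) * CHUNK]).digest()
            for i in indices]

def verify(stored_hashes, indices, response):
    return all(stored_hashes[i] == r for i, r in zip(indices, response))
```

Because a convergent key is recomputable from the content, possession checks like this one, together with the explicit revocation handling the article proposes, are needed to keep former owners from retaining access.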


2014 ◽  
Vol 2014 ◽  
pp. 1-10
Author(s):  
Sun-Ho Lee ◽  
Im-Yeong Lee

Data outsourcing services have emerged with the increasing use of digital information. They can be used to store data from various devices via networks that are easy to access. Unlike existing removable storage systems, storage outsourcing is available to many users because it has no storage limit and does not require a local storage medium. However, the reliability of storage outsourcing has become an important topic because many users employ it to store large volumes of data. To protect against unethical administrators and attackers, a variety of cryptographic systems are used, such as searchable encryption and proxy re-encryption. However, existing searchable encryption technology is inconvenient for storage outsourcing environments where users upload their data to be shared with others as necessary. In addition, some existing schemes are vulnerable to collusion attacks and have inefficient computing costs. In this paper, we analyze existing proxy re-encryption with keyword search schemes.


Author(s):  
Spyridon V. Gogouvitis ◽  
Athanasios Voulodimos ◽  
Dimosthenis Kyriazis

Distributed storage systems are becoming the method of data storage for the new generation of applications, as they appear to be a promising solution for handling the immense volume of data produced in today's rich and ubiquitous digital environment. In this chapter, the authors first present the requirements end users pose on Cloud Storage solutions. They then compare some of the most prominent commercial distributed storage systems against these requirements. Lastly, the authors present the innovations the VISION Cloud project brings to the field of Storage Clouds.

