PPDC: A Privacy-Preserving Distinct Counting Scheme for Mobile Sensing

Xiaochen Yang; Ming Xu; Shaojing Fu; Yuchuan Luo

doi:10.3390/app9183695

Applied Sciences ◽

10.3390/app9183695 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3695

Author(s):

Xiaochen Yang ◽

Ming Xu ◽

Shaojing Fu ◽

Yuchuan Luo

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Original Data ◽

Critical Issue ◽

Negative Influence ◽

Privacy Preserving ◽

Mobile Sensing ◽

Data Sets ◽

Counting Problem ◽

Sensing Applications

Mobile sensing mines group information by sensing and aggregating users’ data. Among major mobile sensing applications, the distinct counting problem, which aims to find the number of distinct elements in a data stream with repeated elements, is extremely important for avoiding waste of resources. The privacy protection of users is also a critical issue for aggregation security. However, it is a challenge to meet these two requirements simultaneously, since conventional privacy-preserving methods would have a negative influence on the accuracy and efficiency of distinct counting. In this paper, we propose a Privacy-Preserving Distinct Counting scheme (PPDC) for mobile sensing. By integrating the basic idea of homomorphic encryption into the Flajolet-Martin (FM) sketch, PPDC allows an aggregator to conduct distinct counting over large-scale datasets without compromising the privacy of users. Moreover, PPDC supports various forms of sensing data, such as camera images and location data. PPDC expands each bit of the hash values of users’ original data, so that the FM sketch is enhanced for encryption to protect users’ privacy. We prove the security of PPDC under the known-plaintext model. The theoretical and experimental results show that PPDC achieves high counting accuracy and practical efficiency with scalability over large-scale data sets.
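For readers unfamiliar with the Flajolet-Martin sketch mentioned in this abstract, the following is a minimal plaintext sketch of the basic idea only. It does not implement PPDC: there is no encryption and no bit expansion. The function names, the use of SHA-256 as the hash, the 32-bit width, and the choice of 64 bitmaps are all assumptions made for illustration.

```python
import hashlib

WIDTH = 32      # bits per bitmap (assumed for this illustration)
PHI = 0.77351   # correction constant from the original Flajolet-Martin analysis


def lowest_set_bit(x: int) -> int:
    """Position of the lowest 1 bit of x, capped at WIDTH - 1."""
    if x == 0:
        return WIDTH - 1
    return min((x & -x).bit_length() - 1, WIDTH - 1)


def fm_sketch(items, num_bitmaps: int = 64):
    """Build num_bitmaps FM bitmaps, one per salted hash function."""
    bitmaps = [0] * num_bitmaps
    for item in items:
        for i in range(num_bitmaps):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            h = int.from_bytes(digest[:4], "big")
            # Repeated items set the same bit again, so duplicates are ignored.
            bitmaps[i] |= 1 << lowest_set_bit(h)
    return bitmaps


def fm_merge(a, b):
    """Combine two sketches; bitwise OR gives the sketch of the union."""
    return [x | y for x, y in zip(a, b)]


def fm_estimate(bitmaps) -> float:
    """Estimate the distinct count from the mean position of the lowest 0 bit."""
    total = 0
    for b in bitmaps:
        r = 0
        while b & (1 << r):
            r += 1
        total += r
    return 2 ** (total / len(bitmaps)) / PHI


if __name__ == "__main__":
    user_a = fm_sketch(range(0, 600))
    user_b = fm_sketch(range(400, 1000))
    print(round(fm_estimate(fm_merge(user_a, user_b))))  # rough estimate of 1000
```

Because merging is a bitwise OR, an aggregator only needs the users' bitmaps rather than their raw items. That is what makes the sketch a natural candidate for combination with encryption, although the encrypted construction itself is not shown here.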

Dynamic spatio-temporal generation of large-scale synthetic gridded precipitation: with improved spatial coherence of extremes

Stochastic Environmental Research and Risk Assessment ◽

10.1007/s00477-019-01724-9 ◽

2019 ◽

Vol 34 (9) ◽

pp. 1369-1383 ◽

Cited By ~ 1

Author(s):

Dirk Diederen ◽

Ye Liu

Keyword(s):

Large Scale ◽

Spatial Coherence ◽

Original Data ◽

Return Level ◽

Data Sets ◽

Large Set ◽

Precipitation Data ◽

Data Set ◽

Spatio Temporal ◽

Synthetic Precipitation

With the ongoing development of distributed hydrological models, flood risk analysis calls for synthetic, gridded precipitation data sets. The availability of large, coherent, gridded re-analysis data sets in combination with the increase in computational power, accommodates the development of new methodology to generate such synthetic data. We tracked moving precipitation fields and classified them using self-organising maps. For each class, we fitted a multivariate mixture model and generated a large set of synthetic, coherent descriptors, which we used to reconstruct moving synthetic precipitation fields. We introduced randomness in the original data set by replacing the observed precipitation fields in the original data set with the synthetic precipitation fields. The output is a continuous, gridded, hourly precipitation data set of a much longer duration, containing physically plausible and spatio-temporally coherent precipitation events. The proposed methodology implicitly provides an important improvement in the spatial coherence of precipitation extremes. We investigate the issue of unrealistic, sudden changes on the grid and demonstrate how a dynamic spatio-temporal generator can provide spatial smoothness in the probability distribution parameters and hence in the return level estimates.

Rational Political Man: A Synthesis of Economic and Social-Psychological Perspectives

American Political Science Review ◽

10.1017/s0003055400263223 ◽

1969 ◽

Vol 63 (4) ◽

pp. 1106-1119 ◽

Cited By ~ 35

Author(s):

Michael J. Shapiro

Keyword(s):

Voting Behavior ◽

Large Scale ◽

Data Gathering ◽

Original Data ◽

Data Sets ◽

Party Affiliation ◽

Theoretical Frameworks ◽

Psychological Variables ◽

The Individual ◽

Group Memberships

In recent years the welter of data accumulated on American voting behavior has been continually reanalyzed by social scientists interested in building theories of electoral choice. Most of the original data-gathering enterprises were guided by general theoretical frameworks which, for the most part, were not developed to a point where the ensuing analyses addressed themselves unambiguously to the overall conceptions by which they were guided. As a result much of our knowledge about voting behavior is in the form of generalizations about what social and psychological variables account for voting choices while we lack conceptual frameworks which systematically interrelate these generalizations and provide comprehensive and parsimonious explanation. If any one unifying conception has emerged from the original large scale studies it is that the average voter is irrational. This inference has been derived from a variety of empirical relationships coupled with varying conceptions of rationality. The more recent reanalyses of these data sets have been characterized by a theoretical sophistication that was lacking heretofore. One of these, a theory of the calculus of voting, has applied some formal rigor to the question of the rationality of the decision to vote, selected empirical equivalents of theoretical entities from survey data on national elections, and conducted a successful test of the theory. Unlike traditional approaches to the rationality question which infer the degree of rationality from quantities of information possessed or from correlates of decisions (background, party affiliation, group memberships, etc.), this investigation conceived of rationality in terms of the kind of calculus employed by the individual in deciding among alternatives (in this case whether or not to vote).

Secure Privacy Preserving Record Linkage of Large Databases by Modified Bloom Filter Encodings

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.29 ◽

2017 ◽

Vol 1 (1) ◽

Cited By ~ 2

Author(s):

Rainer Schnell ◽

Christian Borgs

Keyword(s):

Record Linkage ◽

Large Scale ◽

Bloom Filter ◽

Privacy Preserving ◽

Error Rates ◽

Bloom Filters ◽

Data Sets ◽

Research Subjects ◽

Practical Applications ◽

Large Databases

Objective: In most European settings, record linkage across different institutions has to be based on personal identifiers such as names, birthday or place of birth. To protect the privacy of research subjects, the identifiers have to be encrypted. In practice, these identifiers show error rates up to 20% per identifier, therefore linking on encrypted identifiers usually implies the loss of large subsets of the databases. In many applications, this loss of cases is related to variables of interest for the subject matter of the study. Therefore, this kind of record linkage will generate biased estimates. These problems gave rise to techniques of Privacy Preserving Record Linkage (PPRL). Many different PPRL techniques have been suggested within the last 10 years, but very few of them are suitable for practical applications with large databases containing millions of records, as is typical for administrative or medical databases. One proven technique for PPRL for large scale applications is PPRL based on Bloom filters. Method: Using appropriate parameter settings, Bloom filter approaches show linkage results comparable to linkage based on unencrypted identifiers. Furthermore, this approach has been used in real-world settings with data sets containing up to 100 million records. By the application of suitable blocking strategies, linking can be done in reasonable time. Result: However, Bloom filters have been the subject of cryptographic attacks. Previous research has shown that the straight application of Bloom filters has a nonzero re-identification risk. We will present new results on recently developed techniques to defy all known attacks on PPRL Bloom filters. These computationally simple algorithms modify the identifiers by different cryptographic diffusion techniques. The presentation will demonstrate these new algorithms and show their performance concerning precision, recall and re-identification risk on large databases.
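As a rough illustration of the basic Bloom filter encoding this abstract builds on, the sketch below maps the bigrams of an identifier into a keyed Bloom filter and compares two filters with the Dice coefficient, which tolerates small spelling errors. This is the plain encoding that the abstract describes as having a nonzero re-identification risk, not the authors' hardened variants. The filter length `m`, the number of hash functions `k`, the shared key, and all function names are assumptions chosen for illustration.

```python
import hashlib
import hmac


def bigrams(value: str) -> set:
    """Split a padded, lower-cased identifier into its set of 2-grams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, m: int = 1000, k: int = 10,
                 key: bytes = b"shared-secret") -> set:
    """Return the set of bit positions set in an m-bit Bloom filter."""
    positions = set()
    for gram in bigrams(value):
        for i in range(k):
            # A keyed hash (HMAC) stands in for k independent hash functions.
            digest = hmac.new(key, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % m)
    return positions


def dice(a: set, b: set) -> float:
    """Dice coefficient of two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = bloom_encode("Smith")
    print(dice(smith, bloom_encode("Smyth")))  # high: most bigrams are shared
    print(dice(smith, bloom_encode("Jones")))  # low: no bigrams are shared
```

Two parties holding the same key can exchange only the bit positions and still link records with typing errors, since similar strings share most of their bigrams and therefore most of their set bits.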

Privacy-Preserving Stacking with Application to Cross-organizational Diabetes Prediction

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/571 ◽

2019 ◽

Author(s):

Quanming Yao ◽

Xiawei Guo ◽

James Kwok ◽

Weiwei Tu ◽

Yuqiang Chen ◽

...

Keyword(s):

Transfer Learning ◽

Differential Privacy ◽

Original Data ◽

Privacy Preserving ◽

Data Sets ◽

Data Set ◽

Predicting Performance ◽

Empirical Performance ◽

Feature Based ◽

Diabetes Prediction

To meet the standard of differential privacy, noise is usually added to the original data, which inevitably degrades the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of ensemble learning in improving predictive performance, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done by either sample-based or feature-based partitioning. However, we prove that when the privacy budgets are the same, feature-based partitioning requires fewer samples than the sample-based one, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction from the RUIJIN data set, where privacy is a significant concern.

Emergent Technologies in Big Data Sensing: A Survey

International Journal of Distributed Sensor Networks ◽

10.1155/2015/902982 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 5

Author(s):

Ting Zhu ◽

Sheng Xiao ◽

Qingquan Zhang ◽

Yu Gu ◽

Ping Yi ◽

...

Keyword(s):

Big Data ◽

Large Scale ◽

Data Science ◽

Mobile Sensing ◽

Emergent Technologies ◽

Sensing Applications ◽

Crowd Sensing ◽

Multiple Data ◽

Challenges And Opportunities ◽

Research Architecture

When the number of data-generating sensors increases and the amount of sensing data grows to a scale that traditional methods cannot handle, big data methods are needed for sensing applications. However, big data is a fuzzy data science concept, and there is neither an existing research architecture for it nor a generic application structure in the field of sensing. In this survey, we explore many scattered results that have been achieved by combining big data techniques with sensing and present our vision of big data in sensing. Firstly, we outline the application categories to generally summarize existing research achievements. Then we discuss the techniques proposed in these studies to demonstrate challenges and opportunities in this field. Finally, we present research trends and list some directions of big data in future sensing. Overall, mobile sensing and its related studies are hot topics, but other large-scale sensing research is flourishing too. Although there are no “big data” techniques acting as research platforms or infrastructures to support various applications, multiple data science technologies, such as data mining, crowd sensing, and cloud computing, serve as foundations and bases of big data in the world of sensing.

Privacy-preserving constrained spectral clustering algorithm for large-scale data sets

IET Information Security ◽

10.1049/iet-ifs.2019.0255 ◽

2020 ◽

Vol 14 (3) ◽

pp. 321-331 ◽

Cited By ~ 1

Author(s):

Ji Li ◽

Jianghong Wei ◽

Mao Ye ◽

Wenfen Liu ◽

Xuexian Hu

Keyword(s):

Spectral Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Privacy Preserving ◽

Data Sets ◽

Large Scale Data ◽

Spectral Clustering Algorithm ◽

Scale Data ◽

Large Scale Data Sets

Adaptive Hashing with Sparse Modification for Scalable Image Retrieval

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417540118 ◽

2017 ◽

Vol 31 (06) ◽

pp. 1754011

Author(s):

Lifang Zhang ◽

Qi Shen ◽

Defang Li ◽

Guocan Feng ◽

Xin Tang ◽

...

Keyword(s):

Large Scale ◽

Nearest Neighbor ◽

Sparse Matrix ◽

Hash Functions ◽

Original Data ◽

Data Sets ◽

Compact Binary ◽

Large Scale Data ◽

Scale Data ◽

Ann Search

Approximate Nearest Neighbor (ANN) search has become a challenging problem with the explosion of high-dimensional large-scale data in recent years. Promising techniques for ANN search include hashing methods, which generate compact binary codes by designing effective hash functions. However, the lack of an optimal regularization is the key limitation of most existing hash functions. To this end, a new method called Adaptive Hashing with Sparse Modification (AHSM) is proposed. In AHSM, codes consist of vertices on the hypercube and the projection matrix is divided into two separate matrices. Data is first rotated through an orthogonal matrix and then modified by a sparse matrix. Here the sparse matrix needs to be learned as a regularization term of the hash function, which is used to avoid overfitting and reduce quantization distortion. Overall, AHSM improves accuracy without any increase in time cost. Furthermore, we extend AHSM to a supervised version, called Supervised Adaptive Hashing with Sparse Modification (SAHSM), by introducing Canonical Correlation Analysis (CCA) to the original data. Experiments show that the AHSM method stably surpasses several state-of-the-art hashing methods on four data sets. We also compare three unsupervised hashing methods with their corresponding supervised versions (including SAHSM) on three data sets with known labels. Similarly, SAHSM outperforms other methods on most of the hash bits.

Privacy-Preserving Secure Computation of Skyline Query in Distributed Multi-Party Databases

Information ◽

10.3390/info10030119 ◽

2019 ◽

Vol 10 (3) ◽

pp. 119 ◽

Cited By ~ 1

Author(s):

Mahboob Qaosar ◽

Asif Zaman ◽

Md. Siddique ◽

Annisa ◽

Yasuhiko Morimoto

Keyword(s):

Big Data ◽

Large Scale ◽

Homomorphic Encryption ◽

Database Systems ◽

Privacy Preserving ◽

Secure Computation ◽

Sensitive Information ◽

Computing Environment ◽

Skyline Query ◽

Attribute Value

Selecting representative objects from a large-scale database is an essential task to understand the database. A skyline query is one of the popular methods for selecting representative objects. It retrieves a set of non-dominated objects. In this paper, we consider a distributed algorithm for computing the skyline, which is efficient enough to handle “big data”. While “big data” is valuable and we want to use it, we must also take care of its privacy. In conventional distributed algorithms for computing a skyline query, we must disclose the sensitive values of each object of a private database to another for comparison. Therefore, the privacy of the objects is not preserved. However, such disclosures of sensitive information in conventional distributed database systems are not allowed in the modern privacy-aware computing environment. Recently several privacy-preserving skyline computation frameworks have been introduced. However, most of them use a computationally expensive secure comparison protocol for comparing homomorphically encrypted data. In this work, we propose a novel and efficient approach for computing the skyline in a secure multi-party computing environment without disclosing the individual attribute values of the objects. We use a secure multi-party sorting protocol that uses homomorphic encryption in the semi-honest adversary model for transforming each attribute value of the objects without changing their order on each attribute. To compute the skyline, we use the order of the objects on each attribute for comparing the dominance relationship among the objects. The security analysis confirms that the proposed framework can achieve multi-party skyline computation without leaking sensitive attribute values to others. Our experimental results also validate the effectiveness and scalability of the proposed privacy-preserving skyline computation framework.
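The central observation in this abstract, that dominance depends only on the order of objects on each attribute, can be illustrated in plaintext. The sketch below computes a skyline and then checks that replacing every attribute value by its per-attribute rank leaves the skyline unchanged. It assumes that smaller values are better, and it involves no encryption or multi-party protocol; the function names and the sample data are hypothetical.

```python
def dominates(a, b) -> bool:
    """a dominates b if a is no worse on every attribute and better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline_indices(points) -> list:
    """Indices of the points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for q in points)]


def rank_transform(points) -> list:
    """Replace each attribute value by its rank within that attribute."""
    if not points:
        return []
    dims = len(points[0])
    rank_of = []
    for d in range(dims):
        distinct = sorted({p[d] for p in points})
        rank_of.append({value: rank for rank, value in enumerate(distinct)})
    return [tuple(rank_of[d][p[d]] for d in range(dims)) for p in points]


if __name__ == "__main__":
    # Hypothetical (price, distance) pairs; lower is better on both.
    hotels = [(1, 9), (3, 3), (5, 1), (4, 4), (6, 6)]
    print(skyline_indices(hotels))                  # [0, 1, 2]
    print(skyline_indices(rank_transform(hotels)))  # [0, 1, 2]
```

Since equal values receive equal ranks and the ordering is otherwise preserved, every dominance comparison gives the same answer on ranks as on the raw values. This is why an order-preserving transformation is enough to compute the skyline without revealing the attribute values themselves.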

Achieving Lightweight Verifiable Privacy Preserving Search Over Encrypted Data

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.3.3.267 ◽

2019 ◽

Vol 3 (3) ◽

Author(s):

Selasi Kwame Ocansey ◽

Charles Fynn Oduro

Keyword(s):

Service Provider ◽

Homomorphic Encryption ◽

Bloom Filter ◽

Cloud Service ◽

Privacy Preserving ◽

Data Sets ◽

Cloud Service Provider ◽

Encrypted Data ◽

Security Proofs

When cloud clients outsource their database to the cloud, they entrust management operations to a cloud service provider, who is expected to answer the client’s queries on the cloud where the database is located. Efficient techniques can ensure critical requirements for the integrity and authenticity of outsourced data. We propose a lightweight, privacy-preserving, verifiable scheme for securely outsourcing a database. Our scheme encrypts data before outsourcing, and returned query results are verified for correctness and completeness. Our scheme is based on a lightweight homomorphic encryption technique and a Bloom filter, which are efficiently authenticated to guarantee the outsourced database’s integrity, authenticity, and confidentiality. An ordering challenge technique is proposed for verifying top-k query results. We conclude by detailing our analysis of security proofs, privacy, verifiability and the performance efficiency of our scheme. Our proposed scheme’s proof and evaluation analysis show its security and efficiency for practical deployment. We also evaluate our scheme’s performance over two UCI data sets.

Symbiotic Sensing for Energy-Intensive Tasks in Large-Scale Mobile Sensing Applications

Sensors ◽

10.3390/s17122763 ◽

2017 ◽

Vol 17 (12) ◽

pp. 2763 ◽

Cited By ~ 2

Author(s):

Duc Le ◽

Thuong Nguyen ◽

Hans Scholten ◽

Paul Havinga

Keyword(s):

Large Scale ◽

Mobile Sensing ◽

Sensing Applications
