Utility Optimization of Federated Learning with Differential Privacy

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Jianzhe Zhao ◽  
Keming Mao ◽  
Chenxi Huang ◽  
Yuyang Zeng

Secure and trusted cross-platform knowledge sharing is significant for modern intelligent data analysis. To address the trade-off between privacy and utility in complex federated learning, a novel differentially private federated learning framework is proposed. First, the impact of participants' data heterogeneity on global model accuracy is analyzed quantitatively using the 1-Wasserstein distance. Then, a multilevel, multiparticipant dynamic privacy-budget allocation method is designed to reduce the injected noise and thereby improve utility efficiently. Finally, these components are integrated into a novel adaptive differentially private federated learning algorithm (A-DPFL). Comprehensive experiments on redefined non-I.I.D. MNIST and CIFAR-10 datasets demonstrate the algorithm's superior model accuracy, convergence, and robustness.
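The two quantitative ingredients above can be sketched in a few lines: the 1-Wasserstein distance between a participant's label distribution and the global one (to measure heterogeneity), and a Laplace mechanism whose noise scale is set by the allocated privacy budget. This is an illustrative numpy sketch, not the A-DPFL algorithm itself; the function names and the skewed client distribution are invented for the example.

```python
import numpy as np

def wasserstein_1(p, q):
    """1-Wasserstein distance between two discrete label
    distributions on the same ordered support (CDF difference)."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def dp_noise(gradient, clip_norm, epsilon):
    """Clip a gradient in L1 norm and add Laplace noise calibrated
    to the per-round privacy budget epsilon (sensitivity = clip_norm)."""
    norm = np.linalg.norm(gradient, 1)
    g = gradient * min(1.0, clip_norm / max(norm, 1e-12))
    return g + np.random.laplace(0.0, clip_norm / epsilon, size=g.shape)

# Heterogeneity of a (hypothetical) participant vs. a balanced global
# label distribution over 10 classes:
global_dist = np.full(10, 0.1)
client_dist = np.array([0.4, 0.3] + [0.0375] * 8)  # skewed non-IID client
d = wasserstein_1(client_dist, global_dist)
```

A larger distance would flag a more heterogeneous client, which the paper's allocation method could then use when deciding how much budget (and hence how little noise) that client receives.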

2021 ◽  
Vol 2021 ◽  
pp. 1-20
Author(s):  
Chao Liu ◽  
Jing Yang ◽  
Weinan Zhao ◽  
Yining Zhang ◽  
Jingyou Li ◽  
...  

As an information carrier, face images contain abundant sensitive information. Because of their naturally weak privacy, publishing them directly may divulge private information. Anonymization and data-encryption technologies are limited by attackers' background knowledge and attack means, and so cannot fully meet the needs of face image privacy protection. Therefore, this paper proposes a face image publishing algorithm, SWP (sliding window publication), which satisfies differential privacy. First, SWP translates the image gray-level matrix into a one-dimensional ordered data stream using image segmentation, transforming the image privacy protection problem into a data stream privacy protection problem. Then, the data stream is modeled with a sliding window: by comparing the similarity of data in adjacent windows, the privacy budget is allocated dynamically and Laplace noise is added. Because the data in the sliding window come from the image, and to present the image features in the data more comprehensively and use the privacy budget more reasonably, the paper proposes a fused similarity measurement mechanism, EM (exact mechanism), and a dynamic privacy budget allocation mechanism, DA (dynamic allocation). To further improve the usability of face images and reduce the impact of noise, a sort-SWP algorithm based on SWP is also proposed. Analysis shows that ordered input can further improve the usability of SWP, but sorting the data directly destroys ε-differential privacy. The paper therefore proposes a sorting method, SAS, which satisfies ε-differential privacy: SAS first obtains an initial sort using the exponential mechanism and then optimizes it into an approximately correct sort using a simulated annealing algorithm.
Compared with the LAP and SWP algorithms, the average accuracy of the sort-SWP algorithm on ORL and Yale is increased by 56.63% and 21.55%, recall is increased by 6.85% and 3.32%, and F1-score is improved by 55.62% and 16.55%.
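The core SWP loop described above (window the stream, compare adjacent windows, spend budget only on dissimilar windows) can be sketched as follows. This is a minimal illustration, not the paper's EM/DA mechanisms; the similarity `threshold` and the even per-release budget split are simplifying assumptions.

```python
import numpy as np

def swp_publish(stream, window, total_epsilon, threshold=5.0):
    """Sliding-window Laplace publication sketch: when the current
    window closely resembles the previous one, reuse the previous
    noisy release (spending no budget); otherwise spend a share of
    the budget on a fresh Laplace-noised release."""
    eps_per_release = total_epsilon / max(1, len(stream) // window)
    out, prev_raw, prev_pub = [], None, None
    for i in range(0, len(stream) - window + 1, window):
        cur = stream[i:i + window].astype(float)
        if prev_raw is not None and np.abs(cur - prev_raw).mean() < threshold:
            out.append(prev_pub)            # similar window: reuse, save budget
        else:
            noisy = cur + np.random.laplace(0.0, 1.0 / eps_per_release, window)
            out.append(noisy)
            prev_raw, prev_pub = cur, noisy
    return np.concatenate(out)
```

Reusing a prior release for similar windows is what lets the dynamic allocation spend a larger share of ε where the image content actually changes.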


2021 ◽  
Vol 13 (15) ◽  
pp. 2935
Author(s):  
Chunhua Qian ◽  
Hequn Qiang ◽  
Feng Wang ◽  
Mingyang Li

Building a high-precision, stable, and universal automatic extraction model of rocky desertification information is the premise for exploring the spatiotemporal evolution of rocky desertification. Taking Guizhou province as the research area and based on MODIS and continuous forest inventory data in China, we used machine learning algorithms to build a rocky desertification model with bedrock exposure rate, temperature difference, humidity, and other characteristic factors, and sought to improve model accuracy in both the spatial and temporal dimensions. The results showed the following: (1) Using supervised classification, the logical model, RF model, and SVM model were constructed separately; their accuracies were 73.8%, 78.2%, and 80.6%, respectively, with kappa coefficients of 0.61, 0.672, and 0.707. SVM performed best. (2) Vegetation types and vegetation seasonal phases are closely related to rocky desertification; after incorporating them, the model accuracy and kappa coefficient improved to 91.1% and 0.861. (3) The spatial distribution of rocky desertification in Guizhou shows a clear pattern: heavy in the west, light in the east, heavy in the south, and light in the north. Rocky desertification increased continuously from 2001 to 2019. In conclusion, combining the vertical spatial structure of vegetation with differences in seasonal phase is an effective way to improve the modeling accuracy of rocky desertification, and the SVM model has the highest classification accuracy. These results provide data support for exploring the spatiotemporal evolution pattern of rocky desertification in Guizhou.
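The kappa coefficients quoted above measure agreement beyond chance between the classified map and the reference data. A small numpy sketch of Cohen's kappa, for illustration only (the study's models themselves were built with supervised classifiers, not this snippet):

```python
import numpy as np

def cohens_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa: observed agreement corrected for the
    agreement expected by chance, derived from the confusion matrix."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    po = np.trace(cm) / n                      # observed accuracy
    pe = (cm.sum(0) * cm.sum(1)).sum() / n**2  # chance agreement
    return (po - pe) / (1 - pe)
```

A kappa of 0.861 alongside 91.1% accuracy indicates the improvement is not an artifact of class imbalance.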


2021 ◽  
Vol 11 (13) ◽  
pp. 5895
Author(s):  
Kristina Serec ◽  
Sanja Dolanski Babić

The double-stranded B-form and A-form have long been considered the two most important native forms of DNA, each with its own distinct biological roles and hence the focus of many areas of study, from cellular functions to cancer diagnostics and drug treatment. Due to the heterogeneity and sensitivity of the secondary structure of DNA, there is a need for tools capable of rapid and reliable quantification of DNA conformation in diverse environments. In this work, the second paper in a series addressing conformational transitions in DNA thin films by FTIR spectroscopy, we exploit popular chemometric methods, namely principal component analysis (PCA), the support vector machine (SVM) learning algorithm, and principal component regression (PCR), to quantify and categorize DNA conformation in thin films in different hydrated states. By complementing the FTIR technique with multivariate statistical methods, we demonstrate the ability of our sample preparation and automated spectral analysis protocol to rapidly and efficiently determine conformation in DNA thin films based on vibrational signatures in the 1800–935 cm⁻¹ range. Furthermore, we assess the impact of small hydration-related changes in FTIR spectra on automated DNA conformation detection and how to avoid discrepancies through careful sampling.
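The first chemometric step, projecting each spectrum onto the leading principal components, can be sketched with a plain SVD. This is a generic PCA illustration, not the authors' spectral-analysis protocol; `spectra` is assumed to be a matrix with one preprocessed absorbance spectrum per row.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Mean-centre the spectra and project them onto the leading
    principal components (via SVD). The resulting low-dimensional
    scores are the feature space in which a downstream classifier
    (e.g. an SVM) can separate A-form from B-form signatures."""
    X = spectra - spectra.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T
```

Because the singular values are sorted, the first score column captures the most spectral variance, which for hydrated DNA films tends to track the dominant conformational change.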


2021 ◽  
Author(s):  
Thomas Weripuo Gyeera

<div>The National Institute of Standards and Technology defines the fundamental characteristics of cloud computing as on-demand computing, offered via the network, using pooled resources, with rapid elastic scaling and metered charging. The rapid dynamic allocation and release of resources on demand to meet heterogeneous computing needs is particularly challenging for data centres, which process huge amounts of data characterised by high volume, velocity, variety and veracity (the 4Vs model). Data centres seek to regulate this by monitoring and adaptation, typically reacting to service failures after the fact. We present a real cloud test bed capable of proactively monitoring and gathering cloud resource information to make predictions and forecasts, in contrast to the state-of-the-art reactive monitoring of cloud data centres. We argue that the behavioural patterns and Key Performance Indicators (KPIs) characterizing virtualized servers, networks, and database applications can best be studied and analysed with predictive models. Specifically, we applied the Boosted Decision Tree machine learning algorithm to make future predictions of the KPIs of a cloud server and virtual infrastructure network, yielding an R-squared of 0.9991 at a 0.2 learning rate. This predictive framework is beneficial for making short- and long-term predictions for cloud resources.</div>
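Boosted decision trees of the kind used above fit each new tree to the residuals of the current prediction and add it with a learning-rate weight. A toy numpy sketch with depth-1 stumps on a single feature, using the paper's 0.2 learning rate and reporting R-squared; this is an illustration of the technique, not the test bed's implementation.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-D feature x for residual r."""
    best = (np.inf, None)
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if err < best[0]:
            best = (err, (s, left.mean(), right.mean()))
    return best[1]

def boost_predict(x, y, rounds=50, lr=0.2):
    """Gradient boosting with stumps: each round fits the residual
    y - pred and adds the stump scaled by the learning rate."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(rounds):
        s, lm, rm = fit_stump(x, y - pred)
        pred += lr * np.where(x <= s, lm, rm)
    ss_res = ((y - pred)**2).sum()
    ss_tot = ((y - y.mean())**2).sum()
    return pred, 1.0 - ss_res / ss_tot   # predictions and R-squared
```

The learning rate trades convergence speed against overfitting, which is why the reported R-squared is tied to a specific rate (0.2).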


2018 ◽  
Vol 146 (4) ◽  
pp. 1197-1218
Author(s):  
Michèle De La Chevrotière ◽  
John Harlim

This paper demonstrates the efficacy of data-driven localization mappings for assimilating satellite-like observations in a dynamical system of intermediate complexity. In particular, a sparse network of synthetic brightness temperature measurements is simulated using an idealized radiative transfer model and assimilated to the monsoon–Hadley multicloud model, a nonlinear stochastic model containing several thousands of model coordinates. A serial ensemble Kalman filter is implemented in which the empirical correlation statistics are improved using localization maps obtained from a supervised learning algorithm. The impact of the localization mappings is assessed in perfect-model observing system simulation experiments (OSSEs) as well as in the presence of model errors resulting from the misspecification of key convective closure parameters. In perfect-model OSSEs, the localization mappings that use adjacent correlations to improve the correlation estimated from small ensemble sizes produce robust accurate analysis estimates. In the presence of model error, the filter skills of the localization maps trained on perfect- and imperfect-model data are comparable.
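Localization in a serial ensemble Kalman filter is conventionally a Schur (elementwise) product of the sample correlation matrix with a distance-based taper; the contribution above is to learn that map from data instead. A sketch of the conventional baseline, with a Gaussian taper standing in for the usual Gaspari-Cohn function and grid index used as a proxy for distance:

```python
import numpy as np

def localized_correlation(ensemble, lengthscale):
    """Taper raw ensemble sample correlations with a distance-based
    Schur (elementwise) product, the standard remedy for spurious
    long-range correlations estimated from small ensembles."""
    corr = np.corrcoef(ensemble)   # (n_state, n_state) sample correlation
    n = corr.shape[0]
    dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    taper = np.exp(-0.5 * (dist / lengthscale) ** 2)  # Gaussian stand-in taper
    return corr * taper
```

A learned localization map replaces the fixed `taper` with a mapping trained on data, which is what allows adjacent correlations to inform the estimate rather than simply damping it.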


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Qian Huang ◽  
Xue Wen Li

Big data is a massive and diverse form of unstructured data that needs proper analysis and management. It is another great technological revolution after the Internet, the Internet of Things, and cloud computing. This paper first studies the related concepts and basic theories as the origin of the research. Second, it analyzes in depth the problems and challenges faced by Chinese government management under the impact of big data. Third, it explores the opportunities that big data brings to government management in terms of management efficiency, administrative capacity, and public services, and argues that governments should seize these opportunities to make changes. Brain-like computing attempts to simulate the structure and information-processing mechanisms of biological neural networks. The paper then analyzes the development status of e-government at home and abroad, studies service-oriented architecture (SOA) and web services technology, examines e-government and SOA theory in depth, and discusses these in light of the development status of e-government in a particular region. Finally, a deep learning algorithm is used to construct a monitoring platform that monitors government behavior in real time and performs in-depth mining to analyze the government's intended behavior.


2016 ◽  
Author(s):  
Bethany Signal ◽  
Brian S Gloss ◽  
Marcel E Dinger ◽  
Timothy R Mercer

Background: The branchpoint element is required for the first lariat-forming reaction in splicing. However, because it is difficult to map experimentally at genome-wide scale, current catalogues are incomplete. Results: We have developed a machine learning algorithm trained with empirical human branchpoint annotations to identify branchpoint elements from primary genome sequence alone. Using this approach, we can accurately locate branchpoint elements in 85% of introns in current gene annotations. Consistent with branchpoints being basal genetic elements, we find our annotation is unbiased with respect to gene type and expression level. A major fraction of introns was found to encode multiple branchpoints, raising the prospect that mutational redundancy is encoded in key genes. We also confirmed all deleterious branchpoint mutations annotated in clinical variant databases and further identified thousands of clinical and common genetic variants with similar predicted effects. Conclusions: We propose that this broad annotation of branchpoints constitutes a valuable resource for further investigations into the genetic encoding of splicing patterns and for interpreting the impact of common and disease-causing human genetic variation on gene splicing.
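A sequence-based classifier of this kind typically starts by encoding the window around each candidate branchpoint position as a fixed-length feature vector. An illustrative one-hot encoder follows; the flank size, function name, and encoding choice are assumptions for the example, not the authors' actual feature set.

```python
import numpy as np

def one_hot_window(seq, centre, flank=5):
    """One-hot encode the sequence window around a candidate
    branchpoint position: each base becomes a 4-vector over ACGT,
    and the window is flattened into one feature vector."""
    alphabet = "ACGT"
    window = seq[centre - flank: centre + flank + 1]
    vec = np.zeros((len(window), 4))
    for i, base in enumerate(window):
        vec[i, alphabet.index(base)] = 1.0
    return vec.ravel()
```

Feature vectors like this, labelled by the empirical branchpoint annotations, are what a supervised model can be trained on to score every intronic position.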


Author(s):  
Pinar Demetci ◽  
Rebecca Santorella ◽  
Björn Sandstede ◽  
William Stafford Noble ◽  
Ritambhara Singh

Data integration of single-cell measurements is critical for understanding cell development and disease, but the lack of correspondence between different types of measurements makes such efforts challenging. Several unsupervised algorithms can align heterogeneous single-cell measurements in a shared space, enabling the creation of mappings between single cells in different data domains. However, these algorithms require hyperparameter tuning for high-quality alignments, which is difficult in an unsupervised setting without correspondence information for validation. We present Single-Cell alignment using Optimal Transport (SCOT), an unsupervised learning algorithm that uses Gromov-Wasserstein-based optimal transport to align single-cell multi-omics datasets. We compare the alignment performance of SCOT with state-of-the-art algorithms on four simulated and two real-world datasets. SCOT performs on par with state-of-the-art methods but is faster and requires tuning fewer hyperparameters. Furthermore, we provide an algorithm for SCOT that uses the Gromov-Wasserstein distance to guide parameter selection. Thus, unlike previous methods, SCOT aligns well without using any orthogonal correspondence information to pick its hyperparameters. Our source code and scripts for replicating the results are available at https://github.com/rsinghlab/SCOT.
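Once an optimal-transport coupling between the two domains has been computed (SCOT obtains it via Gromov-Wasserstein), each cell can be mapped into the other domain by barycentric projection, i.e. a coupling-weighted average of the target cells. A minimal sketch of that final projection step, with the coupling matrix assumed given:

```python
import numpy as np

def barycentric_projection(coupling, X_target):
    """Map each source cell into the target domain as the
    coupling-weighted average of target cells. `coupling` has shape
    (n_source, n_target); rows are normalized into weights."""
    w = coupling / coupling.sum(axis=1, keepdims=True)
    return w @ X_target
```

With an identity coupling each source cell maps exactly onto its matched target cell; a diffuse coupling instead blends nearby target cells, which is how soft correspondences are expressed.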


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Hanlin Liu ◽  
Linqiang Yang ◽  
Linchao Li

A variety of climate factors influence the precision of long-term Global Navigation Satellite System (GNSS) monitoring data. To precisely analyze the effect of different climate factors on long-term GNSS monitoring records, this study combines the extended seven-parameter Helmert transformation and a machine learning algorithm, Extreme Gradient Boosting (XGBoost), into a hybrid model. We established a local-scale reference frame, the stable Puerto Rico and Virgin Islands reference frame of 2019 (PRVI19), using ten continuously operating long-term GNSS sites located in the rigid portion of the Puerto Rico and Virgin Islands (PRVI) microplate. The stability of PRVI19 is approximately 0.4 mm/year and 0.5 mm/year in the horizontal and vertical directions, respectively. The stable reference frame PRVI19 avoids the risk of bias due to long-term plate motions when studying localized ground deformation. Furthermore, we applied the XGBoost algorithm to the postprocessed long-term GNSS records and daily climate data to train the model, and quantitatively evaluated the importance of various daily climate factors on the GNSS time series. The results show that wind is the most influential factor, with a unit-less importance index of 0.013. Notably, we used the model with climate and GNSS records to predict GNSS-derived displacements. The predicted displacements have a slightly lower root mean square error than results fitted with the spline method (prediction: 0.22 versus fitted: 0.31), indicating that the proposed model, by incorporating climate records, yields suitable predictions for long-term GNSS monitoring.
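The classic seven-parameter Helmert transformation referenced above maps coordinates between reference frames with three translations, three rotations, and a scale factor. A sketch of the standard small-angle form under one common sign convention; the paper uses an extended variant, so the parameter handling here is purely illustrative.

```python
import numpy as np

def helmert_transform(xyz, tx, ty, tz, rx, ry, rz, scale):
    """Seven-parameter Helmert transformation (small-angle
    approximation): translate, rotate, and scale Cartesian
    coordinates into another reference frame. Rotations in radians."""
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])
    return np.array([tx, ty, tz]) + (1.0 + scale) * xyz @ R.T
```

With all seven parameters zero the transformation is the identity, which is a convenient sanity check when estimating frame-alignment parameters.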

