Predicting the APT for Cyber Situation Comprehension in 5G-Enabled IoT Scenarios Based on Differentially Private Federated Learning

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Xiang Cheng ◽  
Qian Luo ◽  
Ye Pan ◽  
Zitong Li ◽  
Jiale Zhang ◽  
...  

Driven by advances in 5G-enabled Internet of Things (IoT) technologies, IoT devices have grown explosively, with massive data generated at the edge of the network. However, IoT systems are inherently vulnerable to diverse attacks, and the Advanced Persistent Threat (APT) is one of the most powerful attack models, capable of causing significant privacy leakage. Moreover, current detection technologies can hardly meet the demands of effective security defense against APTs. To address these problems, we propose an APT Prediction Method based on Differentially Private Federated Learning (APTPMFL) to predict the probability of subsequent APT attacks occurring in IoT systems. This is the first application of a federated learning mechanism to aggregating suspicious activities in IoT systems, and the APT prediction phase does not need any correlation rules. To achieve the privacy-preserving property, we further adopt a differentially private data perturbation mechanism that adds Laplacian random noise to the features of the IoT devices' training data, maximizing the protection of private data. We also present a 5G-enabled edge-computing-based framework to train and deploy the model, which alleviates the computing and communication overhead of typical IoT systems. Our evaluation results show that APTPMFL predicts subsequent APT behaviors in IoT systems both accurately and efficiently.
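The Laplacian perturbation step described in the abstract can be sketched in a few lines; the feature matrix, epsilon, and sensitivity values below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def laplace_perturb(features, epsilon, sensitivity=1.0, rng=None):
    """Add i.i.d. Laplace(0, sensitivity / epsilon) noise to every feature value.

    A smaller epsilon gives a larger noise scale and hence stronger privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return features + rng.laplace(loc=0.0, scale=scale, size=features.shape)

# Toy example: perturb a 4-device x 3-feature matrix before local training.
X = np.array([[0.2, 0.5, 0.1],
              [0.9, 0.4, 0.7],
              [0.3, 0.8, 0.6],
              [0.5, 0.1, 0.9]])
X_private = laplace_perturb(X, epsilon=0.5)
```

Each device would perturb its features locally before contributing to the federated model, so the raw values never leave the device.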

Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used in high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case that arises when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten environments) that the SSI can achieve significant gains in prediction accuracy (between 5 and 10%) relative to the G-BLUP.


Sensors ◽  
2019 ◽  
Vol 19 (6) ◽  
pp. 1339 ◽  
Author(s):  
Hasan Islam ◽  
Dmitrij Lagutin ◽  
Antti Ylä-Jääski ◽  
Nikos Fotiou ◽  
Andrei Gurtov

The Constrained Application Protocol (CoAP) is a specialized web transfer protocol intended for constrained networks and devices. CoAP and its extensions (e.g., CoAP observe and group communication) provide the potential for developing novel applications in the Internet-of-Things (IoT). However, a full-fledged CoAP-based application may require significant computing capability, power, and storage capacity in IoT devices. To address these challenges, we present the design, implementation, and experimental evaluation of the CoAP handler, which provides transparent CoAP services through the Information-Centric Networking (ICN) core network. In addition, we demonstrate how carrying CoAP traffic over an ICN network can unleash the full potential of CoAP, shifting both overhead and complexity from the (constrained) endpoints to the ICN network. The experiments show that the CoAP handler helps decrease the required computation complexity, communication overhead, and state management of the CoAP server.


2022 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Motivation: Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only a few or no experimental annotations. Results: We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted. Availability: http://github.com/bio-ontology-research-group/deepgozero


2021 ◽  
Vol 5 (1) ◽  
pp. 28-39
Author(s):  
Minami Yoda ◽  
Shuji Sakuraba ◽  
Yuichi Sei ◽  
Yasuyuki Tahara ◽  
Akihiko Ohsuga

Internet of Things (IoT) devices for smart homes enhance convenience; however, they also introduce the risk of private data leakage. The OWASP IoT Top 10 of 2018 lists "Weak, easy to predict, or embedded passwords" as the first vulnerability. This poses a risk because a user cannot fix, change, or detect a password embedded in firmware: only the developer of the firmware can issue an update. In this study, we propose a lightweight method to detect hardcoded usernames and passwords in IoT devices, using static analyses called String Search and Socket Search, to protect devices against this first vulnerability of the 2018 OWASP IoT Top 10. Hardcoded login information can be obtained by comparing the user input with strcmp or strncmp, and previous studies analyzed the strcmp or strncmp symbols to detect it. However, those studies required a lot of time because they used complicated techniques such as symbolic execution. To develop a lightweight algorithm, we focus on network functions, such as the socket symbol in firmware, because an IoT device is compromised when it is invaded over the Internet. We propose two detection methods: in String Search, the algorithm finds functions that use the strcmp or strncmp symbol; in Socket Search, the algorithm finds functions referenced from the socket symbol. In our experiments, we evaluated the proposed methods on six real-world firmware images that contain backdoors, running three searches (String Search, Socket Search, and a whole-binary search) for comparison. All methods found login information in five of the six firmware images, plus one unexpected password. Our methods also reduce the analysis time: the whole search generally takes about 38 minutes, whereas our methods finish in 4-6 minutes.
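Both searches boil down to locating symbol references inside a firmware image. The following is a minimal stand-in that scans a raw blob for symbol names and printable strings; a real implementation would parse the ELF symbol table, and the firmware bytes here are hypothetical:

```python
import re

def extract_strings(blob: bytes, min_len: int = 4):
    """Yield printable-ASCII runs from a binary, like the Unix `strings` tool."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, blob):
        yield match.group().decode("ascii")

def find_symbol_hits(blob: bytes, symbols=(b"strcmp", b"strncmp", b"socket")):
    """Return the byte offsets at which each symbol name occurs in the binary."""
    return {sym.decode(): [m.start() for m in re.finditer(re.escape(sym), blob)]
            for sym in symbols}

# Hypothetical firmware blob containing a socket call and a hardcoded password.
firmware = b"\x7fELF...\x00socket\x00strcmp\x00admin\x00password123\x00"
print(find_symbol_hits(firmware))
print([s for s in extract_strings(firmware) if "pass" in s])
```

A firmware image with hits for both `socket` and `strcmp`/`strncmp` would then be prioritized for closer inspection of the comparison operands.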


2021 ◽  
Vol 4 ◽  
Author(s):  
Michael Platzer ◽  
Thomas Reutterer

AI-based data synthesis has seen rapid progress over the last several years and is increasingly recognized for its promise to enable privacy-respecting, high-fidelity data sharing. This is reflected by the growing availability of both commercial and open-sourced software solutions for synthesizing private data. However, despite these recent advances, adequately evaluating the quality of generated synthetic datasets is still an open challenge. We aim to close this gap and introduce a novel holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions for mixed-type tabular data. Measuring fidelity is based on statistical distances of lower-dimensional marginal distributions, which provide a model-free and easy-to-communicate empirical metric for the representativeness of a synthetic dataset. Privacy risk is assessed by calculating the individual-level distance to the closest record with respect to the training data. By showing that the synthetic samples are just as close to the training data as to the holdout data, we obtain strong evidence that the synthesizer has indeed learned to generalize patterns and is independent of individual training records. We empirically demonstrate the presented framework for seven distinct synthetic data solutions across four mixed-type datasets and then compare these to traditional data perturbation techniques. Both a Python-based implementation of the proposed metrics and the demonstration study setup are made available open-source. The results highlight the need to systematically assess the fidelity as well as the privacy of this emerging class of synthetic data generators.
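The holdout-based privacy check can be sketched as a nearest-neighbour comparison; the Gaussian toy data below is an illustrative stand-in for real training, holdout, and synthetic records:

```python
import numpy as np

def dcr(query, reference):
    """Distance to closest record: for each query row, the Euclidean
    distance to its nearest row in the reference set."""
    diffs = query[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

rng = np.random.default_rng(42)
train = rng.normal(size=(200, 5))      # records used to fit the synthesizer
holdout = rng.normal(size=(200, 5))    # records the synthesizer never saw
synthetic = rng.normal(size=(100, 5))  # stand-in for generated records

d_train = dcr(synthetic, train).mean()
d_holdout = dcr(synthetic, holdout).mean()
# A generalizing synthesizer sits about as far from training as from holdout
# records, so the ratio should be close to 1; a ratio well below 1 would
# indicate memorization of training records.
print(round(d_train / d_holdout, 3))
```

Real tabular data would additionally require encoding categorical columns and scaling before any distance is meaningful.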


Author(s):  
Bharathi Garimella ◽  
G. V. S. N. R. V. Prasad ◽  
M. H. M. Krishna Prasad

Churn prediction based on telecom data has received great attention because of the increasing number of telecom providers, but inconsistent, sparse, and massive data make churn prediction complicated and challenging. Hence, an effective and optimal churn prediction mechanism, named the adaptive firefly-spider optimization (adaptive FSO) algorithm, is proposed in this research to predict churn using telecom data. The proposed churn prediction method uses telecom data, the trending domain of research in churn prediction; hence, the classification accuracy is increased. The proposed adaptive FSO algorithm is designed by integrating spider monkey optimization (SMO), the firefly optimization algorithm (FA), and an adaptive concept. The input data is initially given to the master node of the Spark framework. Feature selection is carried out using Kendall's correlation to select appropriate features for further processing. Then, the selected unique features are given to the master node to perform churn prediction. Here, churn prediction is made using a deep convolutional neural network (DCNN), which is trained by the proposed adaptive FSO algorithm. The developed model obtained better performance on metrics such as the Dice coefficient, accuracy, and Jaccard coefficient when varying the training data percentage and the selected features. The proposed adaptive FSO-based DCNN showed improved results, with a Dice coefficient of 99.76%, an accuracy of 98.65%, and a Jaccard coefficient of 99.52%.
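The Kendall-correlation feature-selection step can be sketched as follows; the churn target and feature matrix are synthetic stand-ins, and tau is computed here in its simple tau-a form (pairwise sign agreement) rather than the tie-corrected tau-b:

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a via pairwise sign agreement (O(n^2), fine for small n)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

def select_features(X, y, top_k):
    """Rank features by |tau| against the target and keep the top_k indices."""
    taus = [abs(kendall_tau(X[:, j], y)) for j in range(X.shape[1])]
    return sorted(np.argsort(taus)[::-1][:top_k].tolist())

rng = np.random.default_rng(0)
n = 300
churn = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([
    churn + rng.normal(scale=0.3, size=n),      # informative feature
    rng.normal(size=n),                         # pure noise
    2 * churn + rng.normal(scale=0.5, size=n),  # informative feature
    rng.normal(size=n),                         # pure noise
])
print(select_features(X, churn, top_k=2))  # the two informative columns win
```

Only the selected columns would then be passed on to the DCNN classifier.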


2020 ◽  
Vol 12 (12) ◽  
pp. 5094 ◽  
Author(s):  
Liang Qiao ◽  
Doudou Liu ◽  
Xueliang Yuan ◽  
Qingsong Wang ◽  
Qiao Ma

The output of construction and demolition (C&D) waste in China has increased rapidly in the past decades, and about 98% of it is landfilled directly without any treatment. Therefore, recycling and utilizing this waste is necessary, and predicting its output is the basis for waste disposal and resource utilization. This study takes Shandong Province as a case: the current output of C&D waste is analyzed with a building-area estimation method, and the output over the next few years is predicted with the Mann–Kendall trend test and a quadratic exponential smoothing prediction method. Results indicate that the annual production of C&D waste in Shandong Province demonstrates a significant growth trend, with average annual growth of 11.38%. Growth rates differ considerably across cities: the better a city's economic development and the higher its level of urbanization, the more C&D waste it generates. The prediction results suggest that the output of C&D waste in Shandong Province will grow at an average rate of 3.07% in the next few years, reaching 141 million tons by 2025. These findings can provide basic data support and a reference for the management and utilization of C&D waste.
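The Mann–Kendall trend test used for the prediction step reduces to a pairwise sign count. A minimal sketch of its S statistic, with made-up annual outputs rather than the study's data:

```python
import numpy as np

def mann_kendall_s(series):
    """Mann-Kendall S statistic: the sum of sign(x_j - x_k) over all pairs
    with j > k. A large positive S indicates an increasing trend."""
    x = np.asarray(series, dtype=float)
    s = 0
    for k in range(len(x) - 1):
        s += int(np.sign(x[k + 1:] - x[k]).sum())
    return s

# Illustrative annual C&D waste outputs (million tons); these numbers are
# invented for the example, not taken from the study.
waste = [80, 86, 95, 103, 112, 120, 131]
print(mann_kendall_s(waste))  # all 21 pairs increase, so S = 21
```

In the full test, S is normalized into a Z score to judge the significance of the trend before the smoothing forecast is applied.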


2020 ◽  
Vol 2020 ◽  
pp. 1-24
Author(s):  
Huihua Xia ◽  
Yan Xiong ◽  
Wenchao Huang ◽  
Zhaoyi Meng ◽  
Fuyou Miao

Querying average distances is useful in real-world applications such as business decision-making and medical diagnosis, as it can help a decision maker better understand the users' data in a database. However, privacy is an increasing concern: people now suffer serious privacy leakage from various sources, especially service providers who provide insufficient protection for users' private data. In this paper, we discover a new type of attack on average-distance queries (AVGD queries) with noisy results. The attack is general in that it can reveal private data of different dimensions. We theoretically analyze how different factors affect the accuracy of the attack and propose a privacy-preserving mechanism based on this analysis. We experiment on two real-life datasets to show the feasibility and severity of the attack. The results show that the severity of the attack is mainly influenced by the noise magnitude, the number of queries, and the number of users in each query. We also validate the correctness of our theoretical analysis against the experimental results and confirm the effectiveness of the privacy-preserving mechanism.
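The essential reason the number of queries drives the attack's severity is that averaging repeated noisy responses cancels the noise. This simplified sketch (hypothetical query value and noise scale, not the paper's exact attack) shows the attacker's error shrinking roughly like one over the square root of the query count:

```python
import numpy as np

rng = np.random.default_rng(7)
true_answer = 12.5   # hypothetical exact average-distance query result
scale = 2.0          # scale of the Laplace noise added to each response

# Repeating the same query and averaging the noisy responses shrinks the
# attacker's estimation error as the number of queries grows.
for n_queries in (1, 10, 1000):
    responses = true_answer + rng.laplace(scale=scale, size=n_queries)
    print(n_queries, round(abs(responses.mean() - true_answer), 3))
```

A defense therefore has to limit the effective number of independent noisy answers an attacker can collect, not just the per-query noise magnitude.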


2013 ◽  
Vol 380-384 ◽  
pp. 1673-1676
Author(s):  
Juan Du

To capture the cumulative effect of time in time-series prediction, a process neural network is adopted. A modified particle swarm optimization algorithm is used to train the model and improve the learning speed. The training data are sunspot data from 1700 to 2007. Simulation results show that the prediction model and algorithm achieve faster training and higher prediction accuracy than a conventional artificial neural network.


2018 ◽  
Vol 4 (2) ◽  
pp. 90-99
Author(s):  
Mertha Endah Ervina ◽  
Rini Silvi ◽  
Intaniah Ratna Nur Wisisono

Train scheduling affects the level of customer satisfaction and the profitability of the train service provider. The Back-propagation Neural Network (BPNN) prediction method converges relatively slowly; therefore, this study uses Resilient Back-propagation (Rprop), which converges faster with high accuracy. Models are produced for Jabodetabek, Java (non-Jabodetabek), Sumatra, and Indonesia. The analysis shows that the neural network models with Rprop formed from the training data give very accurate predictions, with a mean absolute percentage error (MAPE) of less than 10% for each model. Forecasts for the next 12 months were then generated and compared with the test data; Rprop again provides very high forecasting accuracy, with MAPE values below 10%: 7.50% for Jabodetabek, 5.89% for Java (non-Jabodetabek), 5.36% for Sumatra, and 4.80% for Indonesia. Thus, the four neural network architectures with Rprop can be used for this case with very accurate forecasting results.
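The MAPE criterion used to judge the forecasts is straightforward to compute; the passenger counts below are illustrative, not the study's data:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Illustrative monthly passenger counts (not the study's data).
actual = [100, 110, 120, 130]
forecast = [95, 112, 118, 140]
print(round(mape(actual, forecast), 2))  # prints 4.04
```

A MAPE under 10%, as reported for all four models, is conventionally read as highly accurate forecasting.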

