scholarly journals An Automatic Source Code Vulnerability Detection Approach Based on KELM

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Gaigai Tang ◽  
Lin Yang ◽  
Shuangyin Ren ◽  
Lianxiao Meng ◽  
Feng Yang ◽  
...  

Traditional vulnerability detection mostly ran on rules or source code similarity with manually defined vulnerability features. In fact, these vulnerability rules or features are difficult to be defined accurately, which usually cost much expert labor and perform weakly in practical applications. To mitigate this issue, researchers introduced neural networks to automatically extract features to improve the intelligence of vulnerability detection. Bidirectional Long Short-term Memory (Bi-LSTM) network has proved a success for software vulnerability detection. However, due to complex context information processing and iterative training mechanism, training cost is heavy for Bi-LSTM. To effectively improve the training efficiency, we proposed to use Extreme Learning Machine (ELM). The training process of ELM is noniterative, so the network training can converge quickly. As ELM usually shows weak precision performance because of its simple network structure, we introduce the kernel method. In the preprocessing of this framework, we introduce doc2vec for vector representation and multilevel symbolization for program symbolization. Experimental results show that doc2vec vector representation brings faster training and better generalizing performance than word2vec. ELM converges much quickly than Bi-LSTM, and the kernel method can effectively improve the precision of ELM while ensuring training efficiency.

Algorithms ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 335
Author(s):  
Hongwei Wei ◽  
Guanjun Lin ◽  
Lin Li ◽  
Heming Jia

Exploitable vulnerabilities in software systems are major security concerns. To date, machine learning (ML) based solutions have been proposed to automate and accelerate the detection of vulnerabilities. Most ML techniques aim to isolate a unit of source code, be it a line or a function, as being vulnerable. We argue that a code segment is vulnerable if it exists in certain semantic contexts, such as the control flow and data flow; therefore, it is important for the detection to be context aware. In this paper, we evaluate the performance of mainstream word embedding techniques in the scenario of software vulnerability detection. Based on the evaluation, we propose a supervised framework leveraging pre-trained context-aware embeddings from language models (ELMo) to capture deep contextual representations, further summarized by a bidirectional long short-term memory (Bi-LSTM) layer for learning long-range code dependency. The framework takes directly a source code function as an input and produces corresponding function embeddings, which can be treated as feature sets for conventional ML classifiers. Experimental results showed that the proposed framework yielded the best performance in its downstream detection tasks. Using the feature representations generated by our framework, random forest and support vector machine outperformed four baseline systems on our data sets, demonstrating that the framework incorporated with ELMo can effectively capture the vulnerable data flow patterns and facilitate the vulnerability detection task.


2021 ◽  
Vol 3 (2(59)) ◽  
pp. 19-23
Author(s):  
Yevhenii Kubiuk ◽  
Gennadiy Kyselov

The object of research of this work is the methods of deep learning for source code vulnerability detection. One of the most problematic areas is the use of only one approach in the code analysis process: the approach based on the AST (abstract syntax tree) or the approach based on the program dependence graph (PDG). In this paper, a comparative analysis of two approaches for source code vulnerability detection was conducted: approaches based on AST and approaches based on the PDG. In this paper, various topologies of neural networks were analyzed. They are used in approaches based on the AST and PDG. As the result of the comparison, the advantages and disadvantages of each approach were determined, and the results were summarized in the corresponding comparison tables. As a result of the analysis, it was determined that the use of BLSTM (Bidirectional Long Short Term Memory) and BGRU (Bidirectional Gated Linear Unit) gives the best result in terms of problems of source code vulnerability detection. As the analysis showed, the most effective approach for source code vulnerability detection systems is a method that uses an intermediate representation of the code, which allows getting a language-independent tool. Also, in this work, our own algorithm for the source code analysis system is proposed, which is able to perform the following operations: predict the source code vulnerability, classify the source code vulnerability, and generate a corresponding patch for the found vulnerability. A detailed analysis of the proposed system’s unresolved issues is provided, which is planned to investigate in future researches. The proposed system could help speed up the software development process as well as reduce the number of software code vulnerabilities. Software developers, as well as specialists in the field of cybersecurity, can be stakeholders of the proposed system.


2021 ◽  
Vol 13 (12) ◽  
pp. 6953
Author(s):  
Yixing Du ◽  
Zhijian Hu

Data-driven methods using synchrophasor measurements have a broad application prospect in Transient Stability Assessment (TSA). Most previous studies only focused on predicting whether the power system is stable or not after disturbance, which lacked a quantitative analysis of the risk of transient stability. Therefore, this paper proposes a two-stage power system TSA method based on snapshot ensemble long short-term memory (LSTM) network. This method can efficiently build an ensemble model through a single training process, and employ the disturbed trajectory measurements as the inputs, which can realize rapid end-to-end TSA. In the first stage, dynamic hierarchical assessment is carried out through the classifier, so as to screen out credible samples step by step. In the second stage, the regressor is used to predict the transient stability margin of the credible stable samples and the undetermined samples, and combined with the built risk function to realize the risk quantification of transient angle stability. Furthermore, by modifying the loss function of the model, it effectively overcomes sample imbalance and overlapping. The simulation results show that the proposed method can not only accurately predict binary information representing transient stability status of samples, but also reasonably reflect the transient safety risk level of power systems, providing reliable reference for the subsequent control.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1181
Author(s):  
Chenhao Zhu ◽  
Sheng Cai ◽  
Yifan Yang ◽  
Wei Xu ◽  
Honghai Shen ◽  
...  

In applications such as carrier attitude control and mobile device navigation, a micro-electro-mechanical-system (MEMS) gyroscope will inevitably be affected by random vibration, which significantly affects the performance of the MEMS gyroscope. In order to solve the degradation of MEMS gyroscope performance in random vibration environments, in this paper, a combined method of a long short-term memory (LSTM) network and Kalman filter (KF) is proposed for error compensation, where Kalman filter parameters are iteratively optimized using the Kalman smoother and expectation-maximization (EM) algorithm. In order to verify the effectiveness of the proposed method, we performed a linear random vibration test to acquire MEMS gyroscope data. Subsequently, an analysis of the effects of input data step size and network topology on gyroscope error compensation performance is presented. Furthermore, the autoregressive moving average-Kalman filter (ARMA-KF) model, which is commonly used in gyroscope error compensation, was also combined with the LSTM network as a comparison method. The results show that, for the x-axis data, the proposed combined method reduces the standard deviation (STD) by 51.58% and 31.92% compared to the bidirectional LSTM (BiLSTM) network, and EM-KF method, respectively. For the z-axis data, the proposed combined method reduces the standard deviation by 29.19% and 12.75% compared to the BiLSTM network and EM-KF method, respectively. Furthermore, for x-axis data and z-axis data, the proposed combined method reduces the standard deviation by 46.54% and 22.30% compared to the BiLSTM-ARMA-KF method, respectively, and the output is smoother, proving the effectiveness of the proposed method.


2021 ◽  
Vol 2 (2) ◽  
Author(s):  
Kate Highnam ◽  
Domenic Puzio ◽  
Song Luo ◽  
Nicholas R. Jennings

AbstractBotnets and malware continue to avoid detection by static rule engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the “bagging” model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, $$F_1$$ F 1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large enterprise. In 4 h of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.


Author(s):  
Zhang Chao ◽  
Wang Wei-zhi ◽  
Zhang Chen ◽  
Fan Bin ◽  
Wang Jian-guo ◽  
...  

Accurate and reliable fault diagnosis is one of the key and difficult issues in mechanical condition monitoring. In recent years, Convolutional Neural Network (CNN) has been widely used in mechanical condition monitoring, which is also a great breakthrough in the field of bearing fault diagnosis. However, CNN can only extract local features of signals. The model accuracy and generalization of the original vibration signals are very low in the process of vibration signal processing only by CNN. Based on the above problems, this paper improves the traditional convolution layer of CNN, and builds the learning module (local feature learning block, LFLB) of the local characteristics. At the same time, the Long Short-Term Memory (LSTM) is introduced into the network, which is used to extract the global features. This paper proposes the new neural network—improved CNN-LSTM network. The extracted deep feature is used for fault classification. The improved CNN-LSTM network is applied to the processing of the vibration signal of the faulty bearing collected by the bearing failure laboratory of Inner Mongolia University of science and technology. The results show that the accuracy of the improved CNN-LSTM network on the same batch test set is 98.75%, which is about 24% higher than that of the traditional CNN. The proposed network is applied to the bearing data collection of Western Reserve University under the condition that the network parameters remain unchanged. The experiment shows that the improved CNN-LSTM network has better generalization than the traditional CNN.


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 660 ◽  
Author(s):  
Fang Liu ◽  
Liubin Li ◽  
Yongbin Liu ◽  
Zheng Cao ◽  
Hui Yang ◽  
...  

In real industrial applications, bearings in pairs or even more are often mounted on the same shaft. So the collected vibration signal is actually a mixed signal from multiple bearings. In this study, a method based on Hybrid Kernel Function-Support Vector Regression (HKF–SVR) whose parameters are optimized by Krill Herd (KH) algorithm was introduced for bearing performance degradation prediction in this situation. First, multi-domain statistical features are extracted from the bearing vibration signals and then fused into sensitive features using Kernel Joint Approximate Diagonalization of Eigen-matrices (KJADE) algorithm which is developed recently by our group. Due to the nonlinear mapping capability of the kernel method and the blind source separation ability of the JADE algorithm, the KJADE could extract latent source features that accurately reflecting the performance degradation from the mixed vibration signal. Then, the between-class and within-class scatters (SS) of the health-stage data sample and the current monitored data sample is calculated as the performance degradation index. Second, the parameters of the HKF–SVR are optimized by the KH (Krill Herd) algorithm to obtain the optimal performance degradation prediction model. Finally, the performance degradation trend of the bearing is predicted using the optimized HKF–SVR. Compared with the traditional methods of Back Propagation Neural Network (BPNN), Extreme Learning Machine (ELM) and traditional SVR, the results show that the proposed method has a better performance. The proposed method has a good application prospect in life prediction of coaxial bearings.


2021 ◽  
Vol 12 (4) ◽  
pp. 228
Author(s):  
Jianfeng Jiang ◽  
Shaishai Zhao ◽  
Chaolong Zhang

The state-of-health (SOH) estimation is of extreme importance for the performance maximization and upgrading of lithium-ion battery. This paper is concerned with neural-network-enabled battery SOH indication and estimation. The insight that motivates this work is that the chi-square of battery voltages of each constant current-constant voltage phrase and mean temperature could reflect the battery capacity loss effectively. An ensemble algorithm composed of extreme learning machine (ELM) and long short-term memory (LSTM) neural network is utilized to capture the underlying correspondence between the SOH, mean temperature and chi-square of battery voltages. NASA battery data and battery pack data are used to demonstrate the estimation procedures and performance of the proposed approach. The results show that the proposed approach can estimate the battery SOH accurately. Meanwhile, comparative experiments are designed to compare the proposed approach with the separate used method, and the proposed approach shows better estimation performance in the comparisons.


Sign in / Sign up

Export Citation Format

Share Document