The impact of feature types, classifiers, and data balancing techniques on software vulnerability prediction models

2019 ◽  
Vol 31 (9) ◽  
Author(s):  
Aydin Kaya ◽  
Ali Seydi Keceli ◽  
Cagatay Catal ◽  
Bedir Tekinerdogan
2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

Any vulnerability in software creates a security threat and helps hackers gain unauthorized access to resources. Vulnerability prediction models help software engineers allocate their resources effectively to find vulnerable classes in the software before it is delivered to customers. Vulnerable classes must be carefully reviewed by security experts and tested to identify potential threats that may arise in the future. In the present work, a novel technique based on the Grey Wolf algorithm and Random Forest is proposed for software vulnerability prediction. Grey Wolf is a metaheuristic technique, used here to select the best subset of features. The proposed technique is compared with other machine learning techniques. Experiments were performed on three publicly available datasets. The proposed technique (GW-RF) outperformed all the other techniques compared for software vulnerability prediction.
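The feature-selection step described in this abstract can be illustrated with a minimal sketch of a binary grey wolf optimizer. This is not the authors' implementation: the abstract does not give the binarization rule, the population size, or the fitness function. Thresholding positions at 0.5, the function name `binary_gwo`, and the stand-in `toy_fitness` are all assumptions. In a GW-RF style setup the fitness would presumably be the validation score of a random forest trained on the selected features.

```python
import math
import random


def binary_gwo(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness`.

    Standard grey wolf position update, binarized by thresholding at 0.5
    (an assumption; other binarization schemes exist). Needs n_wolves >= 3.
    """
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]

    def to_mask(position):
        return [1 if x > 0.5 else 0 for x in position]

    best_mask, best_score = None, -math.inf
    for it in range(n_iter):
        # Rank the pack; the three best wolves (alpha, beta, delta) lead the hunt.
        scored = sorted(((fitness(to_mask(w)), w) for w in wolves),
                        key=lambda t: t[0], reverse=True)
        (top_score, alpha), (_, beta), (_, delta) = scored[0], scored[1], scored[2]
        if top_score > best_score:
            best_mask, best_score = to_mask(alpha), top_score

        a = 2.0 * (1 - it / n_iter)  # decays linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            new_pos = []
            for j in range(n_features):
                steps = []
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    steps.append(leader[j] - A * D)
                # Average the three leader-guided moves and clip to [0, 1].
                new_pos.append(min(1.0, max(0.0, sum(steps) / 3)))
            new_wolves.append(new_pos)
        wolves = new_wolves
    return best_mask, best_score


# Hypothetical stand-in fitness: reward three "informative" features and
# penalize mask size. A real pipeline would score a classifier instead.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


mask, score = binary_gwo(toy_fitness, n_features=8)
print(mask, score)
```

Because the search is a wrapper around the fitness function, swapping `toy_fitness` for a cross-validated classifier score changes nothing in the optimizer itself, only the cost of each evaluation.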


Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 2050
Author(s):  
Beatriz Castro Dias Cuyabano ◽  
Gabriel Rovere ◽  
Dajeong Lim ◽  
Tae Hun Kim ◽  
Hak Kyo Lee ◽  
...  

It is widely known that the environment influences phenotypic expression and that its effects must be accounted for in genetic evaluation programs. The most widely used method to account for environmental effects is to add herd and contemporary group to the model. Although generally informative, the herd effect treats different farms as independent units. However, if two farms are located physically close to each other, they potentially share correlated environmental factors. We introduce a method to model herd effects that uses the physical distances between farms, based on Global Positioning System (GPS) coordinates, as a proxy for the correlation matrix of these effects, with the aim of accounting for similarities and differences between farms due to environmental factors. A population of Hanwoo Korean cattle was used to evaluate the impact of modelling herd effects as correlated, in comparison to treating the farms as completely independent units, on the variance components and genomic prediction. The main result was an increase in the reliabilities of the predicted genomic breeding values compared to reliabilities obtained with traditional models (across the four traits evaluated, the increases in reliability of prediction ranged from 0.05 ± 0.01 to 0.33 ± 0.03), suggesting that these models may overestimate heritabilities. Although little to no significant gain was obtained in phenotypic prediction, the increased reliability of the predicted genomic breeding values is of practical relevance for genetic evaluation programs.
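To make the idea of a distance-based herd correlation matrix concrete, here is a minimal sketch that turns GPS coordinates into pairwise great-circle distances and then into correlations. The abstract does not say which function maps distance to correlation, so the exponential decay, the `range_km` parameter, and the farm coordinates below are assumptions chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def herd_correlation(coords, range_km=50.0):
    """Correlation matrix exp(-d_ij / range_km) between farms.

    The exponential kernel and the default range are assumptions,
    not the model used in the paper.
    """
    n = len(coords)
    return [[math.exp(-haversine_km(coords[i], coords[j]) / range_km)
             for j in range(n)]
            for i in range(n)]


# Hypothetical farm locations (latitude, longitude); not data from the study.
farms = [(37.0, 127.0), (37.1, 127.1), (35.0, 129.0)]
corr = herd_correlation(farms)
for row in corr:
    print([round(x, 3) for x in row])
```

With this kernel, each farm has correlation 1 with itself and nearby farms are more strongly correlated than distant ones. Treating farms as independent corresponds to replacing the matrix with the identity.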


2020 ◽  
Vol 28 (4) ◽  
pp. 1413-1446 ◽  
Author(s):  
Patrick Kwaku Kudjo ◽  
Jinfu Chen ◽  
Solomon Mensah ◽  
Richard Amankwah ◽  
Christopher Kudjo

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Neel Patel ◽  
William S. Bush

Background: Transcriptional regulation is complex, requiring multiple cis (local) and trans acting mechanisms working in concert to drive gene expression, with disruption of these processes linked to multiple diseases. Previous computational attempts to understand the influence of regulatory mechanisms on gene expression have used prediction models containing input features derived from cis regulatory factors. However, local chromatin looping and trans-acting mechanisms are known to also influence transcriptional regulation, and their inclusion may improve model accuracy and interpretation. In this study, we create a general model of transcription factor influence on gene expression by incorporating both cis and trans gene regulatory features.

Results: We describe a computational framework to model gene expression for GM12878 and K562 cell lines. This framework weights the impact of transcription factor-based regulatory data using multi-omics gene regulatory networks to account for both cis and trans acting mechanisms, and measures of the local chromatin context. These prediction models perform significantly better compared to models containing cis-regulatory features alone. Models that additionally integrate long distance chromatin interactions (or chromatin looping) between distal transcription factor binding regions and gene promoters also show improved accuracy. As a demonstration of their utility, effect estimates from these models were used to weight cis-regulatory rare variants for sequence kernel association test analyses of gene expression.

Conclusions: Our models generate refined effect estimates for the influence of individual transcription factors on gene expression, allowing characterization of their roles across the genome. This work also provides a framework for integrating multiple data types into a single model of transcriptional regulation.


Blood ◽  
2020 ◽  
Author(s):  
Louisa Goumidi ◽  
Florian Thibord ◽  
Kerri L. Wiggins ◽  
Ruifang Li-Gao ◽  
Michael R Brown ◽  
...  

Genetic risk score (GRS) analysis is an increasingly popular approach to derive individual risk prediction models for complex diseases. In the context of venous thrombosis (VT), any GRS should integrate information at the ABO blood group locus, one of the major susceptibility loci for this disease. However, there is as yet no consensus about which single nucleotide polymorphisms (SNPs) must be investigated to properly assess the association of the ABO locus with VT risk. Using comprehensive haplotype analyses of ABO blood group tagging SNPs in up to 5,425 cases and 8,445 controls from 6 studies, we demonstrated that using only rs8176719 (tagging O1) to assess the impact of the ABO locus on VT risk is suboptimal, as 5% of rs8176719-delG carriers are not exposed to higher VT risk. Instead, we recommend using 4 SNPs, rs2519093 (tagging A1), rs1053878 (A2), rs8176743 (B) and rs41302905 (O2), in any analysis aimed at assessing the impact of the ABO locus on VT risk, to avoid risk misestimation. Compared to the O1 haplotype, which can be inferred from these 4 SNPs, the A2 haplotype is associated with a modest increase in VT risk (odds ratio ~1.2), the A1 and B haplotypes are associated with a ~1.8-fold increased risk, while O2 tends to be slightly protective (odds ratio ~0.80). In addition, our analyses clearly showed that while the A1 and B blood groups are associated with increased vWF and FVIII plasma levels, only the A1 blood group is associated with ICAM plasma levels, but in the opposite direction, leaving additional avenues to be explored in order to fully understand the whole spectrum of biological effects of the ABO locus on cardiovascular traits.
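As a rough illustration of how haplotype-level odds ratios like those quoted above could feed into a risk score, the sketch below sums log odds ratios over the two ABO haplotypes a person carries. The odds ratios are the approximate values from the abstract, with O1 as the reference. The multiplicative (log-additive) combination of the two haplotypes is an assumption made for this example and is not stated in the abstract. The function names are hypothetical.

```python
import math

# Approximate odds ratios relative to O1, as quoted in the abstract.
ODDS_RATIO = {"O1": 1.0, "O2": 0.8, "A2": 1.2, "A1": 1.8, "B": 1.8}


def abo_log_odds_score(haplotype_pair):
    """Sum of log odds ratios over a person's two ABO haplotypes.

    Assumes the two haplotypes act multiplicatively on the odds,
    which is an illustrative assumption only.
    """
    return sum(math.log(ODDS_RATIO[h]) for h in haplotype_pair)


def relative_odds(haplotype_pair):
    """Odds of VT relative to an O1/O1 carrier under the same assumption."""
    return math.exp(abo_log_odds_score(haplotype_pair))


for pair in [("O1", "O1"), ("O1", "O2"), ("A2", "O1"), ("A1", "B")]:
    print(pair, round(relative_odds(pair), 2))
```

In a full GRS this ABO term would be one component among the contributions of other loci, each weighted by its own log odds ratio.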

