Prevention of Crypto-Ransomware Using a Pre-Encryption Detection Algorithm

Computers ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 79 ◽  
Author(s):  
S. Kok ◽  
Azween Abdullah ◽  
NZ Jhanjhi ◽  
Mahadevan Supramaniam

Ransomware is a relatively new type of intrusion attack, made with the objective of extorting a ransom from its victim. There are several types of ransomware attack, but the present paper focuses only on crypto-ransomware, because it makes data unrecoverable once the victim’s files have been encrypted. Therefore, this research proposes using machine learning to detect crypto-ransomware before it starts its encryption function, i.e., at the pre-encryption stage. Successful detection at this stage is crucial to stop the attack from achieving its objective. Once the victim is aware of the presence of crypto-ransomware, valuable data and files can be backed up to another location, and an attempt can then be made to remove the ransomware with minimal risk. We therefore propose a pre-encryption detection algorithm (PEDA) consisting of two phases. In PEDA-Phase-I, the Windows application programming interface (API) calls generated by a suspicious program are captured and analyzed using a learning algorithm (LA). The LA determines whether the suspicious program is crypto-ransomware through API pattern recognition. This approach ensures the most comprehensive detection of both known and unknown crypto-ransomware, but it may have a high false positive rate (FPR). If the program is predicted to be crypto-ransomware, PEDA generates a signature of the suspicious program and stores it in the signature repository used in Phase-II. In PEDA-Phase-II, the signature repository allows the detection of crypto-ransomware at a much earlier stage, the pre-execution stage, through signature matching. This method can only detect known crypto-ransomware, and although very rigid, it is accurate and fast. The two phases of PEDA form two layers of early detection for crypto-ransomware to ensure zero files are lost by the user. In this research, however, we focus on Phase-I, the LA.
Based on our results, the LA had the lowest FPR of 1.56% compared to Naive Bayes (NB), Random Forest (RF), Ensemble (NB and RF) and EldeRan (a machine learning approach to analyze and classify ransomware). A low FPR indicates that the LA has a low probability of wrongly classifying goodware as ransomware.
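The Phase-II signature matching described above can be sketched in a few lines. The paper does not specify its signature format, so a SHA-256 digest of the program binary is used here purely as a hypothetical stand-in:

```python
import hashlib

def file_signature(path):
    """Hypothetical signature: SHA-256 hex digest of the file's bytes.
    (PEDA's actual signature scheme is not detailed in the abstract.)"""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

class SignatureRepository:
    """Stores signatures of programs that Phase-I flagged as crypto-ransomware."""
    def __init__(self):
        self._known = set()

    def add(self, signature):
        self._known.add(signature)

    def is_known_ransomware(self, signature):
        # Pre-execution check: exact match only, so unknown variants fall
        # through to the slower Phase-I API-pattern analysis.
        return signature in self._known
```

This illustrates why Phase-II is fast but rigid: an exact-match lookup is O(1), yet any byte-level change to the malware produces a new signature and evades it, which is exactly the gap Phase-I covers.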

Author(s):  
Rohan Yashraj Gupta ◽  
Satya Sai Mudigonda ◽  
Pallav Kumar Baruah

A data-driven fraud detection model for the insurance business can be seen as a two-phase method. Phase I is data preprocessing of a given dataset, in which handling class imbalance is a major challenge. Phase II is classification using machine learning models. It is important to understand whether the technique used in Phase I influences the efficiency of the model used in Phase II. A natural question is whether there is a golden combination of a Phase-I technique and a specific Phase-II model that assures the best performance of a fraud detection model. In this work, we study several techniques for handling the data imbalance issue, namely SMOTE, MWMOTE, ADASYN and TGAN, in combination with various classifier models: Random Forest (RF), Decision Trees (DT), Support Vector Machines (SVM), LightGBM, XGBoost and Gradient Boosting Machines (GBM). The study is conducted on a dataset for motor vehicle insurance fraud detection. We present a comparison of the various combinations of data imbalance technique and classifier model. It is observed that the combination of TGAN in Phase I and GBM in Phase II gives the best performance. This combination performs best in terms of important metrics such as false positive rate, precision and specificity. We obtained the lowest false positive rate of 0.0011 and a precision of 0.9988, which minimizes the most critical risk for the insurance company: falsely classifying a non-fraud claim as fraud. Finally, the specificity of 0.9989 indicates that the model was also very good at predicting non-fraudulent claims.
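The Phase-I oversampling idea behind SMOTE, one of the techniques compared above, can be illustrated with a minimal pure-Python sketch: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority-class neighbors. Production implementations (imbalanced-learn's SMOTE, or the authors' TGAN pipeline) are considerably more involved, and the function and parameter names here are illustrative:

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority-class points, SMOTE-style.

    minority: list of feature tuples from the minority (fraud) class.
    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself)
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the minority region instead of duplicating existing rows, which is the key difference from naive random oversampling.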


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
A Rosier ◽  
E Crespin ◽  
A Lazarus ◽  
G Laurent ◽  
A Menet ◽  
...  

Abstract Background Implantable Loop Recorders (ILRs) are increasingly used and generate a high workload for timely adjudication of ECG recordings. In particular, the excessive false positive rate leads to a significant review burden. Purpose A novel machine learning algorithm was developed to reclassify ILR episodes in order to decrease the false positive rate by 80% while maintaining 99% sensitivity. This study aims to evaluate the impact of this algorithm in reducing the number of abnormal episodes reported by Medtronic ILRs. Methods Across 20 European centers, all Medtronic ILR patients were enrolled during the 2nd semester of 2020. Using a remote monitoring platform, every ILR-transmitted episode was collected and anonymised. For every ILR-detected episode with a transmitted ECG, the new algorithm reclassified it, applying the same labels as the ILR (asystole, brady, AT/AF, VT, artifact, normal). We measured the number of episodes identified as false positive and reclassified as normal by the algorithm, and their proportion among all episodes. Results In 370 patients, ILRs recorded 3755 episodes, including 305 patient-triggered and 629 with no ECG transmitted. 2821 episodes were analyzed by the novel algorithm, which reclassified 1227 episodes as normal rhythm. These reclassified episodes accounted for 43% of analyzed episodes and 32.6% of all episodes recorded. Conclusion A novel machine learning algorithm significantly reduces the quantity of episodes flagged as abnormal and typically reviewed by healthcare professionals. Funding Acknowledgement Type of funding sources: None. Figure 1. ILR episodes analysis


2021 ◽  
Author(s):  
Hiroaki Ito ◽  
Takashi Matsui ◽  
Ryo Konno ◽  
Makoto Itakura ◽  
Yoshio Kodera

Abstract Recent mass spectrometry (MS)-based techniques enable deep proteome coverage with relative quantitative analysis, resulting in increased identification of very weak signals, accompanied by an increased data size of liquid chromatography (LC)–MS/MS spectra. However, identifying weak signals using an assignment strategy with poorer performance results in imperfect quantification, with misidentification of peaks and ratio distortions. Manually annotating a large number of signals within a very large dataset is not a realistic approach. In this study, therefore, we utilized machine learning algorithms to successfully extract a higher number of peptide peaks with high accuracy and precision. Our strategy evaluated each peak identified using six different algorithms; peptide peaks identified by all six algorithms (i.e., unanimously selected) were subsequently assigned as true peaks, which resulted in a reduction in the false-positive rate. Hence, exact and highly quantitative peptide peaks were obtained, providing better performance than applying the conventional criteria or using a single machine learning algorithm.
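The unanimous-selection rule described above can be expressed compactly. The function below is an illustrative sketch (names are hypothetical), keeping only the peaks accepted by every one of the algorithms:

```python
def unanimous_true_peaks(peak_ids, algorithm_calls):
    """Keep only peaks identified by every algorithm (unanimous selection).

    peak_ids: iterable of candidate peak identifiers.
    algorithm_calls: one set per algorithm, each holding the peak
    identifiers that algorithm accepted.

    Requiring unanimity trades recall for a lower false-positive rate,
    as in the paper's six-algorithm strategy.
    """
    candidates = set(peak_ids)
    for accepted in algorithm_calls:
        candidates &= accepted  # drop any peak this algorithm rejected
    return candidates
```

Intersecting across classifiers only removes peaks, never adds them, which is why the false-positive rate can only go down relative to any single algorithm; the cost is that a true peak missed by even one algorithm is discarded.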


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hugues Caly ◽  
Hamed Rabiei ◽  
Perrine Coste-Mazeau ◽  
Sebastien Hantz ◽  
Sophie Alain ◽  
...  

Abstract To identify newborns at risk of developing ASD and to detect ASD biomarkers early after birth, we retrospectively compared ultrasound and biological measurements, collected routinely during pregnancy and birth, of babies later diagnosed with ASD or as neurotypical (NT). We used a supervised machine learning algorithm with a cross-validation technique to classify NT and ASD babies and performed various statistical tests. With a minimization of the false positive rate, 96% of NT and 41% of ASD babies were identified, with a positive predictive value of 77%. We identified the following biomarkers related to ASD: sex, maternal familial history of auto-immune diseases, maternal immunization to CMV, IgG CMV level, timing of fetal rotation on head, femur length in the 3rd trimester, white blood cell count in the 3rd trimester, fetal heart rate during labor, newborn feeding and temperature difference between birth and one day after. Furthermore, statistical models revealed that a subpopulation of 38% of babies at risk of ASD had significantly larger fetal head circumference than age-matched NT ones, suggesting an in utero origin of the reported bigger brains of toddlers with ASD. Our results suggest that pregnancy follow-up measurements might provide an early prognosis of ASD, enabling pre-symptomatic behavioral interventions to efficiently attenuate ASD developmental sequelae.



Author(s):  
Nur Syuhada Selamat ◽  
Fakariah Hani Mohd Ali

Currently, the volume of malware grows faster each year and poses a serious global security threat. The number of malware samples began to increase at an alarming rate in the 1990s, as computers became interconnected, and many protections have since been built to fight malware. Unfortunately, current technology is no longer effective against more advanced malware, which its authors craft to evade anti-virus detection. In current research, machine learning (ML) algorithm techniques have become popular with researchers analyzing malware detection. In this paper, the researchers propose a defense system that compares three ML algorithm techniques and selects the one with the highest malware detection accuracy. The results indicate that the Decision Tree algorithm gives the best detection accuracy compared to the other classifiers, with 99% accuracy and a 0.021% false positive rate (FPR) on a relatively small dataset.


2021 ◽  
Author(s):  
ADRIANA W. (AGNES) BLOM-SCHIEBER ◽  
WEI GUO ◽  
EKTA SAMANI ◽  
ASHIS BANERJEE

A machine learning approach to improve the detection of tow ends for automated inspection of fiber-placed composites is presented. Automated inspection systems for automated fiber placement processes have been introduced to reduce the time it takes to inspect plies after they are laid down. The existing system uses image data from ply boundaries and a contrast-based algorithm to locate the tow ends in these images. This system fails to recognize approximately 10% of the tow ends, which are then presented to the operator for manual review, taking up precious time in the production process. An improved tow end detection algorithm based on machine learning is developed through a research project with the Boeing Advanced Research Center (BARC) at the University of Washington. This presentation shows the preprocessing, neural network and post‐processing steps implemented in the algorithm, and the results achieved with the machine learning algorithm. The machine learning algorithm resulted in a 90% reduction in the number of undetected tows compared to the existing system.


2020 ◽  
Author(s):  
Poomipat Boonyakitanont ◽  
Apiwat Lek-uthai ◽  
Jitkomut Songsiri

Abstract This article aims to design an automatic detection algorithm for epileptic seizure onsets and offsets in scalp EEGs. The proposed scheme consists of two sequential steps: the detection of seizure episodes, and the determination of seizure onsets and offsets in long EEG recordings. We introduce a neural network-based model called ScoreNet as a post-processing technique to determine the seizure onsets and offsets in EEGs. A cost function called the log-dice loss, which has a meaning analogous to F1, is proposed to handle the imbalanced data problem. In combination with several classifiers, including random forest, CNN, and logistic regression, ScoreNet is then verified on the CHB-MIT Scalp EEG database. As a result, in seizure detection, ScoreNet can significantly improve F1 to 70.15% and can considerably reduce the false positive rate per hour to 0.05 on average. In addition, we propose a detection delay metric, an effective latency index computed as a summation of exponentials of the delays, that takes undetected events into account. The index can provide better insight into onset and offset detection than conventional time-based metrics.
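The abstract does not reproduce the exact definition of the log-dice loss, but a plausible sketch, assuming the common soft-Dice formulation wrapped in a negative log, looks like this:

```python
import math

def soft_dice(pred, target, eps=1e-7):
    """Soft Dice coefficient for per-sample probabilities vs. binary labels.
    Dice is the overlap score 2|P∩G| / (|P| + |G|), analogous to F1."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (total + eps)

def log_dice_loss(pred, target):
    """Negative log of the soft Dice score. This is a sketch: the paper's
    exact formulation is not given in the abstract. Like Dice/F1 itself,
    it ignores the large true-negative count in imbalanced EEG data."""
    return -math.log(soft_dice(pred, target))
```

The log makes the loss steep when Dice is near zero, so early in training the gradient pushes hard toward any overlap with the seizure segment, which is one plausible motivation for preferring it over plain 1 − Dice on heavily imbalanced recordings.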


2019 ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract
Importance: Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome.
Objective: Develop a machine learning (ML) method predicting the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth.
Design: Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. The complete EMR record of the parents was used to develop various ML models to predict the risk of having a child with ASD.
Main outcomes and measures: Routinely available parental sociodemographic information, medical histories and prescribed medications data up to the offspring's birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing C statistics, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV).
Results: All ML models tested had similar performance, achieving an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset.
Conclusion and relevance: ML algorithms combined with EMR capture early life ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
Key points
Question: Can autism risk in children be predicted using the pre-birth electronic medical record (EMR) of the parents?
Findings: In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of a childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%.
Meaning: The results presented serve as a proof-of-principle of the potential utility of EMR for the identification of a large proportion of future children at high risk of ASD.
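The metrics reported in studies like this one all derive from the four confusion-matrix counts. The sketch below (with made-up counts in the usage test, not the study's data) shows how they relate, in particular that the false positive rate is the complement of specificity:

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the evaluation metrics reported in such studies from
    confusion-matrix counts (true/false positives and negatives).

    Note the complement relationship:
    specificity = TN / (TN + FP) and FPR = FP / (FP + TN) = 1 - specificity,
    e.g. the study's 98.62% specificity pairs with its ~1.37% FPR.
    """
    return {
        "sensitivity": tp / (tp + fn),       # recall on true cases
        "specificity": tn / (tn + fp),
        "fpr":         fp / (fp + tn),
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "ppv":         tp / (tp + fp),       # precision / positive predictive value
    }
```

With heavy class imbalance (as in 1,397 ASD vs. 94,741 non-ASD children), accuracy is dominated by the majority class, which is why a model with only ~29% sensitivity can still show ~96% accuracy; sensitivity and PPV are the more informative numbers here.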


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2857
Author(s):  
Laura Vigoya ◽  
Diego Fernandez ◽  
Victor Carneiro ◽  
Francisco Nóvoa

With advancements in engineering and science, the application of smart systems is increasing, generating faster growth of IoT network traffic. The limitations of IoT devices' restricted power and computing capacity also raise concerns about security vulnerabilities. Machine learning-based techniques have recently gained credibility through successful application to the detection of network anomalies, including in IoT networks. However, machine learning techniques cannot work without representative data. Given the scarcity of IoT datasets, the DAD dataset emerged as an instrument for understanding the behavior of dedicated IoT-MQTT networks. This paper aims to validate the DAD dataset by applying Logistic Regression, Naive Bayes, Random Forest, AdaBoost, and Support Vector Machine classifiers to detect traffic anomalies in IoT. To obtain the best results, techniques for handling unbalanced data, feature selection, and grid search for hyperparameter optimization were used. The experimental results show that the proposed dataset can achieve a high detection rate in all the experiments, providing the best mean accuracy of 0.99 for the tree-based models, with a low false-positive rate, ensuring effective anomaly detection.
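Grid search over hyperparameters, as used in the validation above, reduces to exhaustive enumeration of parameter combinations. A minimal stdlib sketch follows; the `train_eval` callable is a hypothetical stand-in for fitting a classifier and returning its validation score:

```python
import itertools

def grid_search(train_eval, grid):
    """Exhaustive grid search sketch: try every hyperparameter combination
    and keep the best-scoring one.

    train_eval: callable taking a dict of parameters and returning a
    validation score (higher is better); stands in for model fitting.
    grid: dict mapping parameter name -> list of candidate values.
    """
    best_params, best_score = None, float("-inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_eval(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost is the product of the grid sizes, which is why libraries such as scikit-learn's GridSearchCV pair this enumeration with cross-validation and parallelism rather than changing the underlying idea.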

