A New Oversampling Method Based on the Classification Contribution Degree

Symmetry, 2021, Vol 13 (2), pp. 194
Author(s): Zhenhao Jiang, Tingting Pan, Chao Zhang, Jie Yang

Data imbalance is a thorny issue in machine learning. SMOTE is a well-known oversampling method for imbalanced learning. However, it has several disadvantages, such as sample overlapping, noise interference, and blind neighbor selection. To address these problems, we present a new oversampling method, OS-CCD, based on a new concept, the classification contribution degree. The classification contribution degree determines the number of synthetic samples generated by SMOTE for each positive sample. OS-CCD follows the spatial distribution characteristics of the original samples on the class boundary and avoids oversampling from noisy points. Experiments on twelve benchmark datasets demonstrate that OS-CCD outperforms six classical oversampling methods in terms of accuracy, F1-score, AUC, and ROC.
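The idea of letting a per-sample score decide how many synthetic points SMOTE generates can be illustrated with a minimal sketch. The abstract does not define how the classification contribution degree is computed, so the `counts` argument below is a hypothetical stand-in for it, and the function name `smote_with_counts` is likewise an assumption for illustration. Only the standard SMOTE interpolation step is shown; this is not the authors' OS-CCD implementation.

```python
import numpy as np

def smote_with_counts(X_min, counts, k=3, seed=0):
    """Create counts[i] synthetic points for minority sample i by
    interpolating toward one of its k nearest minority neighbours.

    `counts` is a hypothetical placeholder for whatever a contribution
    score would assign; a count of 0 skips a sample (e.g. a noisy point).
    Requires k <= len(X_min) - 1.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for i, n_new in enumerate(counts):
        for _ in range(int(n_new)):
            j = rng.choice(neighbours[i])   # pick one neighbour at random
            gap = rng.random()              # interpolation factor in [0, 1)
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])

# Toy minority class; the second sample receives no synthetic points.
X_min = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 0.5], [0.5, 1.0]])
new_points = smote_with_counts(X_min, counts=[2, 0, 1, 3])
print(new_points.shape)
```

Because each synthetic point is a convex combination of two minority samples, setting a sample's count to zero is one simple way to keep a suspected noisy point from seeding new samples, although it can still be chosen as a neighbour in this sketch.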

2021, Vol 13 (1), pp. 796-806
Author(s): Zhen Shuo, Zhang Jingyu, Zhang Zhengxiang, Zhao Jianjun

Abstract: Understanding the risk of grassland fire occurrence associated with historical fire point events is critical for implementing effective management of grasslands. This may require a model to convert the fire point records into continuous spatial distribution data. Kernel density estimation (KDE) can be used to represent the spatial distribution of grassland fire occurrences and to reduce the influence of historical point records with inaccurate positions. The bandwidth is the most important parameter because it dominates the amount of variation in the KDE estimate. In this study, the spatial distribution characteristics of the points were considered to determine the KDE bandwidth with the Ripley’s K function method. For scenes with high, medium, and low concentrations of grassland fire points, kernel density surfaces were produced using the kernel function with four bandwidth selection methods. To identify the best maps, the estimated density surfaces were compared using the mean integrated squared error. The results show that Ripley’s K function method is the best bandwidth selection method for mapping and analyzing the risk of grassland fire occurrence when the point data are dependent or inaccurately positioned, as it takes the spatial distribution characteristics into account.
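The effect of the bandwidth on a kernel density surface can be seen in a minimal sketch. It assumes an isotropic Gaussian kernel and made-up point coordinates; the function name `gaussian_kde_surface` and the bandwidth values are illustrative only, and the Ripley’s K bandwidth selection described in the abstract is not reproduced here.

```python
import numpy as np

def gaussian_kde_surface(points, grid_x, grid_y, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate on a regular grid.

    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    points = np.asarray(points, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)
    # Squared distance from every grid cell to every point.
    d2 = (gx[..., None] - points[:, 0]) ** 2 + (gy[..., None] - points[:, 1]) ** 2
    # Isotropic Gaussian kernel, normalised so each kernel integrates to 1.
    kernel = np.exp(-d2 / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=2)  # average the kernels over all points

# Hypothetical fire point locations (arbitrary units).
fire_points = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0]])
grid = np.linspace(-5.0, 6.0, 221)

narrow = gaussian_kde_surface(fire_points, grid, grid, bandwidth=0.2)
wide = gaussian_kde_surface(fire_points, grid, grid, bandwidth=1.0)
# A narrow bandwidth yields sharp peaks at the points; a wide one smooths them out.
print(narrow.max() > wide.max())
```

A small bandwidth tracks individual points closely, so positional errors in the records show up directly in the surface, while a large bandwidth spreads each record over a wider area at the cost of local detail.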

