Two-Stage Automobile Insurance Fraud Detection by Using Optimized Fuzzy C-Means Clustering and Supervised Learning
A novel two-stage automobile insurance fraud detection system is proposed. A test set is first extracted from the original imbalanced insurance dataset. Genetic-algorithm-optimized fuzzy c-means clustering is then applied to the remaining data to undersample the majority class by eliminating its outliers. Detection of fraudulent claims then proceeds in two stages. In the first stage, each insurance record is passed to the clustering module, which labels the claim as genuine, malicious, or suspicious. Genuine and malicious samples are removed, and only the suspicious instances are scrutinized further in the second stage, where each of four trained supervised classifiers (Decision Tree, Support Vector Machine, Group Method of Data Handling, and Multi-Layer Perceptron) is applied individually for the final decision. Extensive experiments on a real-world automobile insurance dataset, together with a comparative analysis against another recent approach, demonstrate the effectiveness of the proposed system.
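The two-stage flow described above can be illustrated with a minimal Python sketch. Only the membership function follows the standard fuzzy c-means formula. Everything else is an assumption made for illustration and is not taken from the abstract: the distance-threshold triage rule in `stage_one`, the names `lower` and `upper`, the fixed toy cluster centers (in place of centers found by the genetic algorithm), and the `toy_classifier` stand-in for a trained second-stage model.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means memberships of point x (they sum to 1)."""
    dists = [math.dist(x, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def stage_one(x, centers, lower, upper):
    """ASSUMED triage rule: compare distance to the nearest center
    against two hypothetical thresholds."""
    nearest = min(math.dist(x, c) for c in centers)
    if nearest <= lower:
        return "genuine"      # well inside a cluster of normal claims
    if nearest >= upper:
        return "malicious"    # far from every cluster
    return "suspicious"       # ambiguous: defer to stage two


def two_stage_detect(x, centers, lower, upper, classifier):
    """Stage one settles clear cases; stage two decides the suspicious ones."""
    label = stage_one(x, centers, lower, upper)
    if label != "suspicious":
        return label
    # classifier is any callable returning True for a fraudulent claim.
    return "malicious" if classifier(x) else "genuine"


# Toy setup: hypothetical centers and a stand-in for a trained classifier.
centers = [(0.0, 0.0), (10.0, 0.0)]


def toy_classifier(x):
    return x[1] > 2.5


for claim in [(0.5, 0.5), (5.0, 6.0), (2.0, 2.0), (2.0, 2.9)]:
    print(claim, two_stage_detect(claim, centers, 1.0, 4.0, toy_classifier))
```

In this sketch the second-stage classifier is consulted only for claims that fall between the two thresholds, which mirrors the idea of reserving the supervised models for the ambiguous cases. In practice, the thresholds and centers would have to come from the optimized clustering step.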