The filing of false insurance claims by customers is one of the biggest and most pervasive problems facing the insurance sector. Insurance firms incur significant financial losses from costly fraudulent claims, and insurance fraud remains a major concern for stakeholders, observers, insurers, and the customers who bear its cost through insurance premiums. Understanding institutional processes and the operationalization of ICT in fraud detection is the first step in implementing appropriate corrective actions. However, personally reviewing every claim filed with an insurance company has become challenging, which makes the procedure costly in both time and money.
Given how prevalent fraud is in vehicle insurance claims, the manual approach to identifying fraudulent claims is problematic because it is time-consuming and inaccurate. Among the approaches that researchers have tested, machine learning (ML) algorithms have demonstrated promising performance and improved accuracy in detecting fraudulent vehicle insurance claims. This study evaluated eight ML algorithms, namely AdaBoost, XGBoost, NB, SVM, LR, DT, ANN, and RF, for distinguishing genuine from fraudulent automobile claims. In addition, a machine learning-powered web-based system was developed to predict and categorize vehicle insurance claims as either genuine or fraudulent. The system was built on the classifier with the highest prediction performance and classification accuracy.
The AdaBoost and XGBoost classifiers outperformed the other models on both imbalanced and balanced data, achieving the highest classification accuracy of 84.5%. The LR classifier performed worst, with the lowest classification accuracy on both imbalanced and balanced data. The ANN classifier performed better on imbalanced data than on balanced data. The final finding was that all eight classifiers could only be used on smaller datasets.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS AND ACRONYMS
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Main Objective
1.4 Specific Objectives
1.5 Study Significance
CHAPTER 2: LITERATURE REVIEW
2.1 Vehicle Insurance in Kenya
2.2 Fraud Detection in Vehicle Insurance Sector
2.3 Manual Fraud Detection Approaches
2.4 Automation of Fraud Detection Systems
2.5 Insurance Fraud Detection using Machine Learning
2.6 Machine Learning Classifiers for Vehicle Insurance Fraud Detection
2.6.1 Naïve Bayes (NB) Classifier
2.6.2 Decision Tree (DT) Classifier
2.6.3 Logistic Regression (LR) Classifier
2.6.4 Random Forest (RF) Classifier
2.6.5 Support Vector Machine (SVM) Classifier
2.6.6 Adaptive Boosting (AdaBoost) Classifier
2.6.7 Extreme Gradient Boosting (XGBoost) Classifier
2.6.8 Artificial Neural Networks (ANN) Algorithm
2.7 Research Gap
2.8 Proposed System Description
CHAPTER 3: RESEARCH METHODOLOGY
3.1 Introduction
3.2 CRISP-DM Methodology
3.2.1 Business Understanding
3.2.2 Data Understanding
3.2.3 Data Preparation
3.2.3.1 Data Clean-up
3.2.3.2 Data Transformation
3.2.3.3 Data Integration
3.2.3.4 Feature Selection
3.2.4 Modelling
3.2.4.1 Experiment Environment
3.2.5 Evaluation
3.2.5.1 Confusion Matrix
3.2.5.2 Accuracy
3.2.5.3 Precision
3.2.5.4 Recall
3.2.5.5 F-1 Score
3.2.6 Deployment
CHAPTER 4: RESULTS AND DISCUSSIONS
4.1 Introduction
4.2 Data Exploratory Analysis
4.3 Machine Learning Classifier’s Evaluation
4.4 Performance Evaluation and Results
4.5 Fraudulent Vehicle Claims Detection System
4.6 Study Discussions
CHAPTER 5: CONCLUSION AND RECOMMENDATIONS
5.1 Introduction
5.2 Summary of Findings
5.3 Study Conclusion
5.4 Study Achievements
5.5 Study Limitations
5.6 Study Recommendations
5.7 Future Work Suggestions
Appendix 1: Project Budget
Appendix 2: Project Schedule
Appendix 3: Motor Vehicle Insurance Proposal Form
Appendix 4: Motor Vehicle Insurance Claim Form
Appendix 5: Models Training Source Code
Appendix 6: Web Application Source Code
LIST OF FIGURES
Figure 1: Random Forest Classifier Stages
Figure 2: Proposed Model Diagram with Unbalanced and Balanced Datasets
Figure 3: Proposed Machine Learning-Powered Web-Based System
Figure 4: CRISP-DM Methodology Diagram
Figure 5: Vehicle Insurance Claims CSV File Extract
Figure 6: Vehicle Insurance Claims Distribution
Figure 7: Dataset Columns Showing Input Variables
Figure 8: Dataset Columns Showing Null Values
Figure 9: Checking and Filling Null Values
Figure 10: Correlation Heatmap Among Data Variables
Figure 11: Unique Values Present in the Data Variables
Figure 12: Categorical Data Columns Unique Values
Figure 13: Converted Categorical Data Columns into Integer Values
Figure 14: Final Dataset Data Distribution Plot
Figure 15: Inter Quantile Range Calculation Graph
Figure 16: Features Maintained for Classification
Figure 17: Web-Based Application Screen Image
Figure 18: Categorized Vehicle Insurance Claims
Figure 19: Generated CSV File of Categorized Vehicle Insurance Claims
Figure 20: Project Schedule
LIST OF TABLES
Table 1: Dataset Features
Table 2: Unbalanced Dataset Evaluation Report
Table 3: Balanced Dataset Evaluation Report
Table 4: Project Budget
ABBREVIATIONS AND ACRONYMS
ACL Audit Command Language.
AdaBoost Adaptive Boosting.
ADASYN Adaptive Synthetic Sampling.
ANN Artificial Neural Network.
CHAID Chi-square Automatic Interaction Detection.
CRISP-DM Cross-Industry Standard Process for Data Mining.
CSV Comma-Separated Values.
DT Decision Tree.
GBM Gradient Boosting Machines.
GLM Generalized Linear Models.
ICT Information and Communication Technology.
LDA Latent Dirichlet Allocation.
LMT Logistic Model Tree.
LR Logistic Regression.
MCC Matthews Correlation Coefficient.
ML Machine Learning.
MLP Multi-Layer Perceptron.
NB Naïve Bayes.
NBU Naïve Bayes Updatable.
RF Random Forest.
RT Random Tree.
SMOTE Synthetic Minority Oversampling Technique.
SVM Support Vector Machine.
XGBoost Extreme Gradient Boosting.
CHAPTER 1: INTRODUCTION
This chapter defines the problem statement, establishes the study's main purpose and specific objectives, and presents the research questions. The chapter's conclusion emphasizes the importance of the study.
1.1 Background
Information and communication technology (ICT) has steadily asserted itself as the architect of systems in recent decades by connecting markets, enterprises, governments, and individuals. This connectivity has reduced distances and given the globe a multidimensional character, improving the administration and delivery of services and enabling a real-time fight against unethical conduct and fraud. The ICT revolution of the twenty-first century not only influences commercial and organizational developments but also shapes social interaction, culture, and behaviour at many levels. Susceptibility to fraud, along with its prevention, is one of the organizational, individual, and behavioural aspects that ICT has significantly altered in the business environment. As part of this fight, organizations such as insurance companies have invested heavily in ICT to improve their capacity for information processing, which gives them an advantage in identifying, handling, and reporting fraud-related situations.
False insurance claims filed by clients are one of the most frequent and chronic problems confronting the insurance industry. Gill et al. (2005) define insurance fraud as "knowingly making a fraudulent claim, inflating a claim, adding extra items to a claim, or being in any other way dishonest with the intent of collecting more than genuine entitlement." This definition rests on dishonest, wilful, or fraudulent concealment that results in an illegal financial advantage for the claimant or policyholder. Insurance companies suffer huge financial losses because of costly fraudulent claims. As a result, it is critical to distinguish between genuine and fraudulent claims.
According to a report by the Coalition Against Insurance Fraud (2016), insurance fraud continues to be a major concern for insurers and for the clients who bear its cost through insurance premiums, which has raised concerns among stakeholders and observers. Understanding how ICT is implemented and operationalized for fraud detection is the first step in adopting appropriate corrective measures. The investments that insurers have made in ICT to improve their information-processing capacity give them an advantage in identifying, handling, and reporting fraud-related situations. Nevertheless, reviewing every insurance claim submitted to an insurance company has become very challenging, which makes the process expensive in both time and money.
According to the International Association of Insurance Supervisors (2011) and Frimpong (2016), fraud in the insurance sector can be viewed from four angles. The first is internal fraud, in which an insurance employee defrauds the insurance firm, either alone or in collaboration with other parties inside or outside the firm. The second is false claims, in which the insured party provides false information to obtain payment or wrongful coverage. The third is insurance intermediary fraud, in which intermediaries conspire with one another or act alone to defraud the policyholder or insurer. The fourth is insurer fraud, in which the insurer defrauds the policyholder through unfair policies, premium payments, and compensation schemes.
Considering the widespread challenge of fraud in vehicle insurance claims, the manual approach for identifying fraudulent claims is problematic because it is slow and imprecise. Hence, Machine Learning (ML) approaches can be used to detect fraudulent vehicle insurance claims effectively due to their superior performance and improved predictive accuracy.
1.2 Problem Statement
Fraud has long been one of the most serious problems facing organizations because of its catastrophic effects. According to Gedela and Karthikeyan (2022), fraud is any act aimed at defrauding another party financially. Sybase (2012), recognizing the financial cost and cultural consequences of the problem, emphasizes that fraud detection should be established as a first line of defence. The Kenyan insurance sector is well established, according to the Association of Kenya Insurers (2020), and ranks first in Sub-Saharan Africa with a high growth rate (African Insurance Organization, 2018). This has contributed significantly to the market's readiness for the adoption and attraction of foreign investment. However, holding such a prominent position comes with many challenges, chief among them fraud and competition.
According to the Insurance Regulatory Authority (2021), the insurance industry is notoriously slow to evolve, especially when it comes to using new technologies to combat the alarming problem of fraud. The Authority cites several reasons for this, including a lack of funding, overstretched resources, and the belief that things should be done the way they have always been done. Despite this, the insurance sector must act quickly to stay ahead of growing fraud rates and safeguard both itself and policyholders. The Authority also points out that, because of an increase in complaints and rising fraud, the costs associated with fraud investigations and tribunals are anticipated to reach tens of millions of dollars yearly.
Motor vehicle insurance fraud is a serious vice that has contributed to the collapse of several insurance companies and continues to present a substantial challenge to the insurance industry. According to the Association of Kenya Insurers (2020), automobile insurance is one of the most difficult products for Kenyan insurance companies to sell because it carries significant technical losses, amounting to 68.92% for private vehicles and 60.72% for commercial vehicles. In other words, for every KShs 100 in premiums received by the insurer, KShs 68.92 and KShs 60.72, respectively, are used to settle insurance claims. The problem is exacerbated by the significant costs of the investigations carried out to confirm each claim's validity, which account for 44.16 percent of overall costs. This implies that, for every KShs 100 of premium, the insurer loses KShs 13.08 and KShs 4.88 in net premium revenue, respectively. Most of these losses are attributable to fraudulent insurance claims.
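The loss figures quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that the 44.16 figure can be read as costs per KShs 100 of premium and simply added to the claims paid, which is the reading that yields the reported KShs 13.08 and KShs 4.88. The function and variable names are hypothetical and do not come from the study.

```python
# Claims paid per KShs 100 of premium, as quoted from the
# Association of Kenya Insurers (2020).
CLAIMS_PER_100 = {"private": 68.92, "commercial": 60.72}

# Assumption: the 44.16 percent cost figure is treated as costs
# per KShs 100 of premium so that it can be added to claims paid.
COSTS_PER_100 = 44.16


def net_result_per_100(claims, costs=COSTS_PER_100):
    """Return what is left of KShs 100 of premium after claims and costs.

    A negative value is a net loss.
    """
    return round(100.0 - claims - costs, 2)


for vehicle_class, claims in CLAIMS_PER_100.items():
    print(vehicle_class, net_result_per_100(claims))
```

Under this reading, both vehicle classes return a negative result, which matches the losses stated in the text.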
Additionally, according to statistics from the Insurance Regulatory Authority (2021), 35 percent of insurance claims were fraudulent, with motor vehicle insurance claims leading the way and registering the greatest loss percentages in the sector. Fraudulent automobile insurance claims involve someone engaging in a variety of unethical behaviours to obtain a favourable outcome from the insurance provider. These acts range from fabricating accidents and making false insurance claims to fabricating details for a real insurance claim and misrepresenting an incident's cause and the parties involved (Subudhi et al., 2018). As a result of the rise in fraudulent vehicle insurance claims, insurance companies are devoting more time and resources to detecting them, yet conventional methods still allow some fraudulent claims to go unnoticed. As the economy recovers, an increase in fraudulent claims will raise overall insurance costs, making fraudulent insurance claims a key concern for both the government and insurance companies.
1.3 Main Objective
The primary objective of this project was to investigate how machine learning algorithms can leverage features extracted from vehicle insurance claim datasets to aid in the detection of fraudulent vehicle insurance claims. Following this investigation, a novel system to predict and categorize vehicle insurance claims as either genuine or fraudulent was developed.
1.4 Specific Objectives
1. Characterise fraudulent insurance claims in the context of the vehicle insurance domain.
2. Identify features that could be utilized to train machine learning models to recognize fraudulent vehicle insurance claims.
3. Evaluate the performance of several machine learning models for detecting fraudulent vehicle insurance claims using both balanced and imbalanced datasets.
4. Develop a system that categorises vehicle insurance claims as either genuine or fraudulent using the best performing machine learning classifier.
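The third objective, comparing classifiers on imbalanced and balanced data, can be illustrated with a minimal sketch. It computes the evaluation measures listed in the table of contents (confusion matrix, accuracy, precision, recall, and F-1 score) and balances a toy dataset by random oversampling. Random oversampling is used here only as a simple stand-in, since this section does not state which balancing technique the study applied. All function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the fraudulent class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluation_report(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-1 score as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 90 genuine (0) and 10 fraudulent (1) claims.
y_true = [0] * 90 + [1] * 10
always_genuine = [0] * 100  # a "classifier" that never flags fraud
print(evaluation_report(y_true, always_genuine))

rows = list(range(100))  # placeholder feature rows
balanced_rows, balanced_labels = random_oversample(rows, y_true)
print(Counter(balanced_labels))
```

On the toy labels, a model that never flags fraud still scores high accuracy while its recall for the fraudulent class is zero, which is why accuracy alone can mislead on imbalanced data and why the comparison is run on both versions of the dataset.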
1.5 Study Significance
This study is timely because it offers a mechanism for developing a system that uses the top-performing machine learning algorithm to identify fraudulent vehicle insurance claims. Fraud in the insurance industry is a growing concern as the number of fraudulent claims rises and their detection becomes a difficult problem on a global scale. By guaranteeing quality and stability, such a system will help insurance businesses demonstrate sound claim administration, which will have a significant impact on their revenue and client satisfaction. Additionally, the study will broaden machine learning research into the identification of fraudulent vehicle insurance claims in the Kenyan insurance sector.