A FEDERATED LEARNING MODEL FOR THE DETECTION OF INSURANCE CLAIMS FRAUD

Product Category: Projects

Product Code: 00006504

No of Pages: 55

No of Chapters: 1-5

File Format: Microsoft Word

ABSTRACT

Practical insurance fraud detection requires sufficient high-quality data from insurers to build effective models. However, insurance data is generally proprietary to individual insurance companies and is therefore not publicly available. Insurance datasets are also often imbalanced, which makes it difficult to develop fraud detection models that are not biased. Data privacy and class imbalance are thus two significant challenges when developing artificial intelligence applications in the insurance setting. In this study, we tackle these challenges and propose a decentralized, privacy-preserving federated approach based on an adjusted random forest model. The method applies asynchronous federated learning to the traditional adjusted random forest classifier and achieves higher performance and accuracy than the traditional centralized learning approach. With it, we achieve secure collaborative machine learning that allows quality federated fraud detection models to be trained from imbalanced data without sharing the data. Experiments on Kaggle and Oracle insurance datasets demonstrate that the federated adjusted random forest classifier is more accurate and efficient than its non-federated counterpart. Our model is shown to be practical, efficient and scalable for real-life insurance fraud detection tasks.

Keywords: Fraud Detection, Federated Learning, Adjusted Random Forests, Feature Selection, Ensemble methods.

TABLE OF CONTENTS

DECLARATION

ACKNOWLEDGEMENT

DEDICATION

ABSTRACT

CONTENTS

LIST OF FIGURES

LIST OF TABLES

LIST OF ABBREVIATIONS

CHAPTER 1: INTRODUCTION

1.1 BACKGROUND

1.2 PROBLEM STATEMENT

1.3 RESEARCH OBJECTIVES

1.3.1 General Objective

1.3.2 Research Questions

1.4 JUSTIFICATIONS

1.5 CONTRIBUTIONS OF THE RESEARCH

1.6 SCOPE OF THE STUDY

CHAPTER 2: LITERATURE REVIEW

2.1 INTRODUCTION

2.2 INSURANCE FRAUD DETECTION METHODS

2.3 PRIVACY-PRESERVING MACHINE LEARNING METHODS

2.4 FEATURE ENGINEERING METHODS IN INSURANCE FRAUD DETECTION

2.5 RESEARCH GAP

2.6 CONCEPTUAL FRAMEWORK

CHAPTER 3: METHODOLOGY

3.1 INTRODUCTION

3.2 STUDY POPULATION

3.3 DATA COLLECTION

3.4 DATA ANALYSIS

3.4.1 Data Cleaning

3.4.2 Data Transformation

3.5 FEATURE ENGINEERING

3.5.1 Feature Engineering

3.5.2 Feature Selection

3.6 FEDERATED MODEL DESIGNS

3.6.1 Feature Alignment

3.6.2 Federated Adjusted Random Forests

3.6.3 Horizontal Federated Learning

3.6.4 Decentralized Architecture Design

3.6.5 Decentralized Algorithm Design

3.7 IMPLEMENTATION AND PROTOTYPING

3.8 MODEL EVALUATION

3.9 ETHICAL CONSIDERATIONS

CHAPTER 4: RESULTS AND DISCUSSIONS

4.1 INTRODUCTION

4.2 EVALUATION RESULTS

4.2.1 Classification Accuracy

4.2.2 Confusion Matrix

4.2.3 Classification Report

4.3 DISCUSSION

4.4 MODEL VERDICT

CHAPTER 5: CONCLUSION AND RECOMMENDATIONS

5.1 CONCLUSION

5.2 RECOMMENDATION

5.3 FUTURE RESEARCH

REFERENCES

APPENDICES

IRA STATISTICS

EMAIL CORRESPONDENCE TO REQUEST FOR INSURANCE DATA

LIST OF FIGURES

Figure 1 Conceptual Framework

Figure 2 Research Process

Figure 3 Features of various datasets

Figure 4 Features Correlation Heat-Map

Figure 5 Features correlating with fraud state

Figure 6 Data Analysis

Figure 7 Filling Missing Values

Figure 8 Data Transformation

Figure 9 Kaggle Dataset Selected Features

Figure 10 Oracle Dataset Selected Features

Figure 11 Feature Alignment

Figure 12 Bagging Using Adjusted Random Forest

Figure 13 Random Federated Forests

Figure 14 Decentralized Design

Figure 15 Federated Algorithm

Figure 16 Federated Random Forest Before Balancing-Kaggle Dataset

Figure 17 Federated Random Forest Before Balancing-Oracle Dataset

Figure 18 Balanced Federated Random Forest-Kaggle Dataset

Figure 19 Balanced Federated Random Forest-Oracle Dataset

Figure 20 Email to Head of Innovations IRA

Figure 21 Email to Head of Research and Innovation IRA

Figure 22 Email to the C.E.O and Commissioner for Insurance IRA

LIST OF TABLES

Table 1 Accuracy Score for Kaggle Dataset

Table 2 Accuracy for Oracle Dataset

Table 3 Classification Report Before Feature Selection-Kaggle Dataset

Table 4 Classification Report Before Feature Selection-Oracle Dataset

Table 5 Classification Report After Feature Selection-Kaggle Dataset

Table 6 Classification Report After Feature Selection-Oracle Dataset

Table 7 Models Average Performance

Table 10 IRA Statistics

LIST OF ABBREVIATIONS

ML Machine Learning

ANN Artificial Neural Network

FL Federated Learning

B2B Business to business

IRA Insurance Regulatory Authority

RFM Recency, frequency, and monetary

HOBA Homogeneity-oriented behavior analysis

CART Classification and Regression Trees

CHAPTER ONE

INTRODUCTION

1.1 Background

Apart from tax fraud, insurance claims fraud (illegitimate claims) is recorded as the most practised fraud globally. The significant accumulation of liquid financial assets makes insurance companies susceptible to looting schemes and takeovers (Association of Certified Fraud Examiners, 2019). Insurance claims fraud occurs when the insured attempts to profit from premiums paid without complying with the terms of the insurance agreement (Association of Certified Fraud Examiners, 2019). Detecting fraud manually has always been costly for insurance companies. Minor incidents that go undetected contribute immensely to the claims ratio. For example, the industry average incurred claims ratio (loss ratio) is 64.34%, with motor insurance accounting for 24.6% of the total industry paid claims under the general insurance business (Insurance Regulatory Authority, 2020).

The research community has focused on insurance fraud detection methods that require centralized datasets from specific insurers. There is a vast body of literature on fraud detection methods in the insurance industry. These methods, however, use data from specific insurers that might not be representative of the industry-wide fraud problem. Only limited attempts have been made to look at fraud detection methods from an industry perspective. The quality of the data needed to train predictive models is as important as the quantity. Datasets must be representative and balanced to provide a better picture and avoid bias (Burri et al., 2019). Recent studies on claim analysis using machine learning have reported data security challenges in implementing machine learning. The vast amounts of data required for machine learning create additional risk for insurance companies, because the increase in data collection and connectivity among applications can lead to data leaks and security breaches. As a result, insurers struggle to provide relevant data for training machine learning models (Burri et al., 2019).

Significant studies have explored the detection and prevention of insurance fraud. For example, Phua et al. (2010) explored holistic and scientific approaches to fraud management. They observe that studies involving quantitative methods report limitations due to the lack of insurance data (Phua et al., 2010). Because insurance fraud keeps evolving, challenges remain: the lack of sufficient insurance data and the class imbalance problem in claim datasets have both attracted the attention of researchers. Insurance fraud detection models are often biased because they minimize the overall error rate instead of attending to the minority class (Johannes & Rajasvaran, 2020). Studies have shown that the lack of primary insurance data and imbalanced datasets are both challenges when developing machine learning models in insurance. Imbalanced datasets often produce biased models that cannot make correct predictions (Johannes & Rajasvaran, 2020). Insurance companies that adopt a centralized approach to insurance claims fraud detection face a class imbalance problem, in which fraudulent claims make up only a small share of all claims (Guha et al., 2017).
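The bias that arises from optimizing the overall error rate can be illustrated with a small worked example. The sketch below is illustrative only and is not taken from the study: the portfolio of 950 genuine and 50 fraudulent claims is hypothetical, and the function name `evaluate` is an assumption introduced here. It compares plain accuracy with fraud recall and balanced accuracy for a degenerate classifier that labels every claim as genuine.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, fraud recall, balanced accuracy) for 0/1 labels, 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    accuracy = (tp + tn) / len(pairs)
    fraud_recall = tp / (tp + fn) if tp + fn else 0.0
    genuine_recall = tn / (tn + fp) if tn + fp else 0.0
    balanced_accuracy = (fraud_recall + genuine_recall) / 2
    return accuracy, fraud_recall, balanced_accuracy


# Hypothetical portfolio: 950 genuine claims (0) and 50 fraudulent claims (1).
y_true = [0] * 950 + [1] * 50
# A degenerate "model" that labels every claim as genuine.
y_pred = [0] * 1000

accuracy, fraud_recall, balanced_accuracy = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f} "
      f"fraud_recall={fraud_recall:.2f} "
      f"balanced_accuracy={balanced_accuracy:.2f}")
```

On this hypothetical portfolio the degenerate model scores 95% accuracy while detecting no fraud at all (fraud recall 0.00, balanced accuracy 0.50), which is why metrics that account for the minority class matter when evaluating fraud detection models.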

1.2 Problem Statement

The research community has focused on insurance fraud detection methods that require centralized datasets to train models. Centralized machine learning methods often produce biased models that are not effective in detecting insurance fraud. The bias in insurance models is primarily attributed to two issues: the class imbalance problem in datasets, where fraudulent claims are far fewer than genuine claims, and insufficient insurance data to train the models. For example, Johannes and Rajasvaran (2020) present a behavioural feature engineering approach for motor insurance fraud detection. However, they observe that insurance claim data is often imbalanced, with at least one class forming a tiny minority of the data. Burri et al. (2019) provide an in-depth claim analysis using machine learning. The study, however, reports challenges in finding suitable data sources and in data security when implementing machine learning (Burri et al., 2019). There have been only limited attempts to look at fraud detection methods that benefit all insurance players instead of individual insurers.

Previous studies point to a need for a quality fraud detection system that can tackle the class imbalance problem in the datasets. In such a system, participants that do not have sufficient data of their own can collaborate in building a quality model. Studies also point to a need for privacy-preserving methods that can be used to train machine learning models without sharing data. The methods used in previous studies suffer drawbacks due to the quality of the data used to train fraud detection models; the studies also report challenges in accessing quality datasets from insurers (Burri et al., 2019).
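The idea of collaborating on a model without sharing data can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's algorithm: the excerpt does not specify the adjusted random forest or the asynchronous protocol, so the sketch substitutes one-split trees trained on balanced bootstrap samples (each sample undersamples the genuine class to match the fraud class), with three hypothetical insurers holding synthetic data that share the same two features. All function names and parameters are hypothetical. Only the fitted trees, as (feature index, threshold) pairs, leave each insurer; the raw claim rows never do.

```python
import random


def make_claims(n_genuine, n_fraud, seed):
    """Synthetic, imbalanced local dataset: rows of ([feature1, feature2], label)."""
    rng = random.Random(seed)
    rows = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(n_genuine)]
    rows += [([rng.gauss(3, 1), rng.gauss(3, 1)], 1) for _ in range(n_fraud)]
    return rows


def balanced_bootstrap(rows, rng):
    """Draw, with replacement, as many genuine rows as there are fraud rows."""
    fraud = [r for r in rows if r[1] == 1]
    genuine = [r for r in rows if r[1] == 0]
    n = len(fraud)
    return ([rng.choice(fraud) for _ in range(n)]
            + [rng.choice(genuine) for _ in range(n)])


def fit_stump(sample, rng):
    """One-split tree on a random feature: predict fraud when x[f] > threshold."""
    f = rng.randrange(len(sample[0][0]))
    best_errors, best_thr = None, None
    for x, _ in sample:
        thr = x[f]
        errors = sum(1 for xi, yi in sample if (1 if xi[f] > thr else 0) != yi)
        if best_errors is None or errors < best_errors:
            best_errors, best_thr = errors, thr
    return (f, best_thr)


def train_local_forest(rows, n_trees, seed):
    """Runs inside one insurer; returns only model parameters, never rows."""
    rng = random.Random(seed)
    return [fit_stump(balanced_bootstrap(rows, rng), rng) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote over all trees contributed by all insurers."""
    votes = sum(1 for f, thr in forest if x[f] > thr)
    return 1 if votes * 2 > len(forest) else 0


# Three hypothetical insurers, each with 200 genuine and 10 fraudulent claims.
insurers = [make_claims(200, 10, seed=s) for s in (1, 2, 3)]

# The coordinator only ever receives the locally trained trees.
federated_forest = []
for i, local_rows in enumerate(insurers):
    federated_forest.extend(train_local_forest(local_rows, n_trees=15, seed=100 + i))

print(len(federated_forest), "trees in the federated ensemble")
```

Concatenating locally trained trees is one simple way to aggregate tree ensembles across parties whose datasets share the same features, because a forest's prediction is just a vote over its trees. A production system would also need secure transport, checks on what the shared trees reveal about local data, and a real tree learner.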

1.3 Research Objectives

1.3.1 General Objective

To implement a privacy-preserving federated machine learning framework for the insurance setting that will be used to train fraud detection models while preserving data privacy. The resulting model will be used to detect insurance claims fraud. We will evaluate the effectiveness of this framework in improving the prediction accuracy of insurance fraud detection models. The prediction accuracy of the model will be assessed against past insurance claims data.

1.3.2 Research Questions

1. What practical technique can be used to build quality insurance fraud detection models while preserving policyholders’ privacy?

2. How can quality fraud detection models be built from imbalanced insurance datasets?

3. What is the prediction performance of federated insurance fraud detection models?

4. Which feature engineering and selection methods are optimal for high-dimensional datasets?

1.4 Justifications

This research aims to implement a privacy-preserving machine learning architecture that will be used to train insurance fraud detection models. While machine learning methods such as classification and regression algorithms have been identified and studied in previous research (Burri et al., 2019), those studies do not show that such algorithms can train machine learning models while preserving the privacy of insurance data. In addition, little research has been done to show that decentralized methods can be used with imbalanced datasets to produce quality insurance fraud detection models. The broad topic of insurance fraud detection has received attention, including from insurers and government regulators. Still, decentralized, collaborative and privacy-preserving machine learning methods have not been the focus of that attention. Instead, while acknowledging the challenges in finding quality insurance data and the class imbalance problem in datasets, the research community currently focuses on centralized machine learning methods biased towards individual insurance companies (Dhieb et al., 2019).

The insurance industry, with hundreds of years of history, is characterized by fierce competition. Data has become a valuable resource that insurers have to protect, which hinders the development of solutions that benefit all industry players. Insurers struggle to release relevant data for training AI models (Burri et al., 2019). Ideas that add value to the industry, such as automated underwriting and automated claims fraud detection, require privacy-preserving machine learning methods. There is a need to introduce collaborative privacy-preserving approaches to machine learning and data science in insurance. This study will provide insights into insurance claims management practice by exploring privacy-safe methods that can be used to detect insurance claims fraud accurately.

1.5 Contributions of the Research

Research has reported problems in implementing machine learning in the insurance industry, including a lack of suitable data sources, data security, and imbalanced datasets (Burri et al., 2019). Balanced datasets give a better picture and avoid bias in prediction. Imbalanced insurance data makes it challenging to produce quality models that can be shared across the industry. This research will draw recommendations on the performance of models built using imbalanced datasets, with the aim of improving the prediction accuracy of fraud detection models. The research seeks to demonstrate that using privacy-preserving methods to train a model on decentralized data preserves data integrity, improves prediction accuracy, and is therefore a practical approach to claims fraud prediction. The study will contribute to insurance claims management practice and claims fraud detection. In addition, this study will contribute to the knowledge of privacy-preserving machine learning in insurance.

1.6 Scope of the Study

The study will be limited to the general category of insurance. This area of the insurance business is selected because it accounts for the insurance industry’s highest paid claims. The Insurance Regulatory Authority regulates thirty-six insurance companies offering general insurance, any of which could be used in this research. However, to keep the project feasible, we select three major insurance companies. The companies under study need to be actively engaged in the motor class of insurance.
