ABSTRACT
Fraud is among the most menacing problems with which every human society grapples, given its devastating effects. It refers to the deliberate use of false information to swindle another individual or organization out of money or property (Association of Certified Fraud Examiners, 2021). For decades, the banking industry has relied on rule-based systems and human review of transactions to flag fraud. Rule-based systems use algorithms that perform a variety of detection actions written manually by fraud experts (Oniyilo, 2016). These systems require scenarios to be adjusted manually, which makes it difficult to detect the implicit transactional correlations that would point to fraud. Given the inherent weaknesses of the rule-based fraud detection approach at banks, and the limited data that constrains commonly used supervised machine learning algorithms, there is an urgent need for new detection techniques or systems that can handle the rapidly increasing incidence of fraud and money laundering that adversely affects the Kenyan banking system.
This research aimed to analyse and evaluate various machine learning algorithms to determine their performance in fraud detection for mobile banking transactions within the banking system. The study's objectives were to identify the data attributes that are best suited for mobile banking fraud detection machine learning algorithms and compare the performance of machine learning algorithms in fraud detection in mobile banking transactions.
This study used the CRISP_DM methodology to determine the most accurate fraud detection algorithm. CRISP_DM was published to standardise data mining processes across industries and has since become the most widely used methodology for data mining, analytics, and data science projects. It follows six general steps: business understanding, data understanding, data preparation, modelling, evaluation, and deployment.
The results showed that logistic regression performed poorly, as indicated by its scores: it did not predict any fraudulent transactions on the original unbalanced data and is therefore not recommended for fraud detection. Naïve Bayes also performed poorly according to its confusion matrix, with 89,997 false positives, 0 false negatives and 55 true negatives. Although it predicted the fraudulent transactions accurately, it predicted many non-fraudulent transactions to be fraudulent. These results agree with the scores, which showed that Naïve Bayes had 0.08% accuracy. K Nearest Neighbour (KNN) had the best results: its accuracy was 99%, with 19 true negative predictions, 2 false positives and 8 false negatives. KNN was therefore identified as the preferred algorithm for fraud detection. Additionally, KNN performed marginally better when the transactional data was removed than when the static data or the scoring data from the fraud detection system was removed.
TABLE OF CONTENTS
CHAPTER ONE
1. Introduction
1.1 Background
1.2 Problem Statement
1.3 Main Objective
1.4 Specific Objective
1.5 Research Questions
1.6 Significance
1.7 Justification
1.8 Scope of Study
CHAPTER TWO
2. LITERATURE REVIEW
2.1 Fraud Detection
2.2 Fraud Detection in the Banking Industry
2.3 Related Work
2.4 Algorithms Used in Fraud Detection Systems
2.5 Gaps Identified in Literature Review
2.6 Description of Proposed Solution
CHAPTER THREE
3. Research Design and Methodology
3.1 Introduction
3.2 Research design
3.3 Business Understanding
3.4 Data Understanding
3.5 Data preparation
3.6 Modelling
3.7 Experimentation Environment
3.8 Evaluation
CHAPTER FOUR
4. RESULTS AND DISCUSSION
4.1 Introduction
4.2 Exploratory Data Analysis
4.3 Evaluation
4.4 Working with Unbalanced Data with Logistic Regression
TABLE OF FIGURES
Figure 1: Proposed Machine Learning Model
Figure 2: CRISP_DM Methodology
Figure 3: Data Cleansing in Alteryx
Figure 4: Masked Data to protect Personal Identifiable Information
Figure 5: Correlation matrix of Fraud transaction data
Figure 6: Data Attribute Performance Comparison
LIST OF TABLES
Table 1: Confusion Matrix For Logistic Regression, Naive Bayes and KNN
Table 2: Score Results for Naive Bayes, Logistic Regression and KNN
Table 3: KNN Confusion Matrix when N is 5
Table 4: KNN Confusion Matrix when N is 2
Table 5: Oversampled data Confusion Matrix for Logistic Regression
Table 6: Undersampled data Confusion Matrix for Logistic Regression
Table 7: Confusion Matrix when scored data is dropped
Table 8: Confusion Matrix when static data is dropped
Table 9: Confusion Matrix when Transactional data is dropped
LIST OF ABBREVIATIONS
ML Machine Learning
KNN K Nearest Neighbour
BACC Balanced Accuracy
CRISP_DM Cross Industry Standard Process for Data Mining
TP True Positive
FP False Positive
FN False Negative
TN True Negative
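The abbreviations TP, FP, FN, TN and BACC are related through standard formulas. The sketch below shows one way to compute the usual scores from the four confusion-matrix counts. The class name `ConfusionMatrix` and the example counts are hypothetical and are not taken from the study's tables. The example illustrates why plain accuracy can mislead on unbalanced fraud data, while balanced accuracy does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary fraud classifier (fraud is the positive class)."""
    tp: int  # fraudulent transactions correctly flagged
    fp: int  # legitimate transactions wrongly flagged
    fn: int  # fraudulent transactions missed
    tn: int  # legitimate transactions correctly passed

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        # Share of actual frauds that were caught (true positive rate).
        frauds = self.tp + self.fn
        return self.tp / frauds if frauds else 0.0

    def specificity(self) -> float:
        # Share of legitimate transactions that were passed (true negative rate).
        legitimate = self.tn + self.fp
        return self.tn / legitimate if legitimate else 0.0

    def balanced_accuracy(self) -> float:
        # BACC: mean of the true positive rate and the true negative rate.
        return (self.recall() + self.specificity()) / 2


# Hypothetical counts: a model that never flags anything,
# evaluated on 10,000 transactions of which 50 are fraudulent.
never_flags = ConfusionMatrix(tp=0, fp=0, fn=50, tn=9950)
print(f"accuracy={never_flags.accuracy():.3f}")
print(f"balanced accuracy={never_flags.balanced_accuracy():.3f}")
```

In this hypothetical case the model catches no fraud at all, yet its accuracy is high because legitimate transactions dominate; balanced accuracy weights both classes equally and exposes the failure.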
CHAPTER ONE
INTRODUCTION
1.1 Background
Fraud is among the most menacing problems with which every human society grapples, given its devastating effects. It refers to the deliberate use of false information to swindle another individual or organization out of money or property (Association of Certified Fraud Examiners, 2021). The COVID-19 pandemic saw an increase in fraudulent activities because of the economic downturn and the accompanying rise in unemployment (Colvin, 2020). Thousands of people were rendered jobless, salaries were reduced, and unemployment rates soared. With more people having fewer resources at their disposal, the rise in fraudulent activities can be understood as people attempting all means to survive austere economic times. Fraud experts argue that fraud stems from three elements: pressure, opportunity, and rationalisation (Littman, 2011). According to Littman, fraud happens when an individual comes under a pressure they feel unable to share, which gives them a motive to commit the fraud.
In most cases, the fraud perpetrator has an unmet need but limited resources. Unmet needs are endless and vary from person to person: a mounting medical bill, reduced household income, or gambling debts. Once a person has unmet needs and limited resources, they look for an opportunity to commit fraud. Perceived opportunities may include reckless management or a lack of internal controls within an entity, either of which makes fraud easier. Lastly, the individual rationalises the decision to commit fraud by convincing themselves that they need the money more or that they will pay it back eventually. In tepid economic times, the pressure and motive to commit fraud increase, which makes it easier for fraudsters to rationalise their actions.
Fraud is commonplace in the banking industry and includes email phishing, credit card fraud, money laundering, loan application fraud, financial statement fraud, and cyber fraud. With the advent of digital banking, digital fraud has also become more common within this sector. Fraud management has therefore become necessary in the banking and commerce industries, although it is admittedly a difficult process. Fraudsters have become skilful at discovering loopholes and have established effective techniques, such as phishing, to target unsuspecting individuals and swindle them out of their money (How Machine Learning Facilitates Fraud Detection?, 2021). Fraud detection methods must therefore evolve continuously as fraudsters become more effective at designing techniques that bypass rigid banking security systems and at convincing unsuspecting individuals to release their money.
Traditional fraud detection methods within the banking industry have been rule-based, with the rules defined by human beings. Ninety per cent of financial and banking institutions rely on these methods (Onifade and Afolabi, 2015). As more people adopt new technologies, more fraud scenarios arise, making rule-based methods unscalable and unsustainable in the future. Moreover, false positives (i.e., non-fraudulent transactions classified as fraudulent) cost the banking industry millions of dollars in lost transactions and generate customer complaints. Rule-based methods contribute greatly to these outcomes. In a study of 1,000 adult consumers, Ciobanu (2020) found that about 25% of those whose transactions were falsely declined opted to do business with competitors. The rate of switching to competitors rose to 36% for consumers aged 18 to 24 and to 31% for those aged 25 to 34. These results indicate the dire need for more rigorous and modern fraud detection methods.
To add to the challenge with traditional rule-based systems, fraudsters do not follow fixed patterns and constantly change their behaviour over time, which makes the systems cumbersome and rapidly obsolete. There is a clear need for a change of approach in security systems within banking. The Nilson Report (2019) anticipated that card fraud alone would amount to a staggering $30 billion globally by 2020. Additionally, with the technological disruption in the banking sector brought about by numerous payment channels, such as credit and debit cards and smartphones, the volume of transactions has grown rapidly over the past few years. Fraudsters have also developed extremely effective tactics. Given this situation, there is a need to develop more robust fraud detection approaches in banks. The most viable option is machine learning algorithms embedded in banking systems.
1.2 Problem Statement
For decades, the banking industry has relied on rule-based systems and human review of transactions to flag fraud. Rule-based systems use algorithms that perform a variety of detection actions written manually by fraud experts (Oniyilo, 2016). These systems require scenarios to be adjusted manually, which makes it difficult to detect the implicit transactional correlations that would point to fraud. Given the inherent weaknesses of the rule-based fraud detection approach at banks, and the limited data that constrains commonly used supervised machine learning algorithms, there is an urgent need for new detection techniques or systems that can handle the rapidly increasing incidence of fraud and money laundering that adversely affects the Kenyan banking system.
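To make the notion of a rule-based system concrete, the following is a minimal sketch of hand-written rules applied to a transaction. Everything in it is hypothetical: the `Transaction` fields, the rule names and the thresholds are illustrative assumptions and do not describe the scenarios used by any bank.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    amount: float      # transaction value (hypothetical field)
    hour: int          # hour of day, 0-23 (hypothetical field)
    new_device: bool   # first time this device is seen (hypothetical field)


# Each rule is a (name, predicate) pair written by hand.
Rule = Tuple[str, Callable[[Transaction], bool]]

RULES: List[Rule] = [
    # Hypothetical threshold: flag unusually large amounts.
    ("large_amount", lambda t: t.amount > 100_000),
    # Hypothetical scenario: a new device used in the early hours.
    ("night_new_device", lambda t: t.new_device and t.hour < 5),
]


def triggered_rules(txn: Transaction, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules that the transaction triggers."""
    return [name for name, check in rules if check(txn)]


print(triggered_rules(Transaction(amount=150_000, hour=14, new_device=False)))
print(triggered_rules(Transaction(amount=500, hour=2, new_device=True)))
```

Each new fraud pattern requires someone to add or retune a rule by hand, and rules that look at one or two fields at a time cannot capture correlations across many attributes. That is the limitation the problem statement describes.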
1.3 Main Objective
This research aims to analyse and evaluate various machine learning algorithms to determine their performance in fraud detection for mobile banking transactions within the banking system. There is a need for real-time fraud detection methods to help banks protect themselves and their customers as transactions happen in real-time.
1.4 Specific Objective
1. To identify the data attributes that are best suited for mobile banking fraud detection machine learning algorithms.
2. To compare the performance of machine learning algorithms in fraud detection in mobile banking transactions.
1.5 Research Questions
1. What fraud detection scenarios are currently used in traditional fraud detection systems?
2. Which machine learning algorithms are suitable for fraud detection for mobile banking transactions?
3. Which features are suitable for fraud detection with machine learning algorithms?
4. What is the performance of the machine learning algorithms selected for mobile banking transaction fraud detection?
1.6 Significance
This study adds to the growing body of research on machine learning for mobile banking fraud detection in the Kenyan banking industry. The resulting product will aid in the development of automatic fraud detection systems that operate without human intervention. The study demonstrates how to identify an algorithm that will reduce false positives and help banks retain their customers.
1.7 Justification
An effective fraud detection system should detect fraudulent transactions accurately and in real time. There are two types of fraud detection systems: anomaly detection and misuse detection. Anomaly detection systems detect intrusions into systems and uncover outliers in the data by monitoring system activity. Misuse detection systems detect attacks within the systems (Aghaei, 2017). Anomaly detection is achieved by defining normal behaviour within the system and treating all other behaviour as abnormal, so that it can be flagged in real time.
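The idea of defining normal behaviour and flagging everything else can be sketched with a simple statistical check. This is an illustrative assumption, not the method used in this study: the function name `is_anomalous`, the use of a z-score over a customer's past amounts, and the threshold of 3 standard deviations are all hypothetical choices.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], amount: float,
                 threshold: float = 3.0) -> bool:
    """Flag `amount` if it lies far from the customer's usual amounts.

    "Normal" is defined here as the mean of past amounts, and distance
    is measured in sample standard deviations (a z-score).
    """
    if len(history) < 2:
        return False  # too little history to define normal behaviour
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts were identical; any different amount is unusual.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer.
past_amounts = [100.0, 120.0, 90.0, 110.0, 105.0]
print(is_anomalous(past_amounts, 115.0))
print(is_anomalous(past_amounts, 5000.0))
```

A check like this needs no fraud labels, only a history of behaviour, which is what distinguishes it from the supervised algorithms evaluated later.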
Several machine learning (ML) approaches have been implemented over the years. Typical ML algorithms include KNN (K Nearest Neighbour), decision trees, and logistic regression. However, these are supervised methods, meaning that they must learn from labelled examples to identify fraudulent transactions. When a company lacks labelled data, these algorithms cannot be trained.
Given the increase in fraudulent activities within the banking sector, this research project explores the effectiveness of machine learning algorithms, including KNN (K Nearest Neighbour), Naïve Bayes and logistic regression, relative to traditional rule-based fraud detection systems. The loopholes within rule-based systems necessitate more automated fraud detection approaches.
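As an illustration of how one of these supervised algorithms uses labelled data, the following is a minimal pure-Python sketch of K Nearest Neighbour classification. It is not the implementation used in this study; the function name `knn_predict`, the two-feature toy data and the choice of Euclidean distance are assumptions made for the example.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

# A labelled sample: (feature vector, label), where 1 = fraud, 0 = legitimate.
Sample = Tuple[Sequence[float], int]


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 5) -> int:
    """Predict a label by majority vote among the k closest training samples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort labelled samples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already scaled features for six labelled transactions.
training_data: List[Sample] = [
    ((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.15), 0),
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.85, 0.95), 1),
]

print(knn_predict(training_data, (0.82, 0.88), k=3))
print(knn_predict(training_data, (0.12, 0.18), k=3))
```

The sketch makes the dependence on labels explicit: without the 0/1 labels in `training_data` there is nothing to vote on. Because the vote depends on distances, features would normally be scaled to comparable ranges first, and an odd `k` avoids tied votes between the two classes.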
1.8 Scope of Study
The analysis focused on Equity Bank Limited transactional data. The data came from the mobile banking system and comprised mobile banking transactions together with data from the fraud detection system. The dataset extracted from the data warehouse for this research comprised 450,352 separate mobile banking transactions for February 2021. The prototype solution presented in this study can be applied to other banks, provided that the type of data captured is similar to that used in creating the model.