ABSTRACT
Over the last decade, digital credit has been the fastest-growing financial innovation in Kenya. This growth has largely been attributed to technological innovation and mobile phone penetration, which have expanded access to financial services for individuals who were previously unbanked. Overall access to formal financial services now stands at 83%, up from 67% in 2016, and 88% of the adult population has access to a mobile money account (KFSD, 2019).
Precise credit risk assessment, also known as loan default prediction, is crucial to the functioning of lending institutions. Traditional credit scoring models are constructed from demographic characteristics, historical payment data, credit bureau data and application data. In online, mobile-based lending, the risk of borrower fraud is higher. Hence, credit risk models based on machine learning algorithms can provide a higher level of accuracy in predicting default.
The main objective of this project is to predict loan default by applying machine learning algorithms. The proposed methodology involves data collection, data pre-processing, data analysis, model selection and performance evaluation. The project uses data on previous customers whose loans were approved on the basis of a set of parameters. The machine learning models are then trained on these records to produce accurate predictions. The main machine learning algorithms applied are logistic regression, naïve Bayes and decision trees. The performance of the models is then compared using performance metrics, and the best-performing algorithm is selected to predict loan default.
TABLE OF CONTENTS
Declaration
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Background
1.2 Problem Statement
1.3 Significance of Study
1.4 Research Objectives
1.5 Justification
Chapter 2: Literature Review
2.1 Introduction
2.2 Traditional Credit Risk Assessment
2.2.1 Linear Regression
2.2.2 Discriminant Analysis
2.2.3 Probit Analysis and Logistic Regression
2.2.4 Judgment-Based Models
2.3 Machine Learning Approaches in Credit Scoring
2.3.1 Decision Trees
2.3.2 Random Forest
2.3.3 Logistic Regression
2.3.4 Neural Network
2.3.5 Naïve Bayes
2.4 Related Research Work
2.4.1 Prediction For Loan Approval Using Machine Learning Algorithm
2.4.2 Loan Prediction Using Decision Tree and Random Forest
Chapter 3: Methodology
3.1 Introduction
3.2 Research Design
3.3 Data Collection
3.4 Conceptual Design
3.5 Proposed Model
3.5.1 Design Requirements
3.6 Data Preprocessing
3.6.1 Data Cleaning
3.6.2 Data Reduction
3.6.3 Feature Engineering
3.6.4 Exploratory Data Analysis
3.6.5 Converting Categorical Variables
3.6.6 Standard Scaler
3.6.7 Handling Outliers
3.6.8 Modelling
3.6.9 Model Testing
3.7 Performance Metrics
3.7.1 Confusion Matrix
3.7.2 Accuracy
3.7.3 Precision
3.7.4 Recall
3.7.5 Specificity
3.7.6 F1 Score
Chapter 4: Results and Discussions
4.1 Introduction
4.2 Results
4.2.1 Decision Tree
4.2.2 Logistic Regression
4.2.3 Naïve Bayes
4.3 Discussion
Chapter 5: Conclusions and Recommendations
5.1 Introduction
5.2 Conclusions
5.3 Limitations of the Research
5.4 Recommendations and Future Work
References
Appendices
Appendix A: Gantt Chart
LIST OF FIGURES
Figure 2.1 Neural Networks
Figure 3.1 Research Design
Figure 3.2 CRISP-DM Methodology
Figure 3.3 Proposed Model
Figure 3.4 Importing Python Libraries
Figure 3.5 Train Data
Figure 3.6 Removing Missing Values
Figure 3.7 Preprocessed Data
Figure 3.8 Creating Target Variable
Figure 3.9 Univariate Analysis
Figure 3.10 Gender vs Default
Figure 3.11 Education vs Default
Figure 3.12 New Credit Customer vs Default
Figure 3.13 Employment Status vs Default
Figure 3.14 Marital Status vs Default
Figure 3.15 Converting Categorical Variables
Figure 3.16 Scaling Data
Figure 3.17 Normalized Income Total
Figure 3.18 Normalized Amount
Figure 3.19 Variable Declaration
Figure 3.20 Splitting Data
Figure 3.21 Test Data Preprocessing
Figure 3.22 Handling Outliers in Test Data
Figure 4.1 Decision Tree Results
Figure 4.2 Logistic Regression Results
Figure 4.3 Naïve Bayes Results
LIST OF TABLES
Table 3.1 Confusion Matrix
Table 4.1 Dataset
Table 4.2 Decision Tree Confusion Matrix
Table 4.3 Decision Tree Performance
Table 4.4 Logistic Regression Confusion Matrix
Table 4.5 Logistic Regression Performance
Table 4.7 Naïve Bayes Confusion Matrix
Table 4.8 Naïve Bayes Performance
Table 4.10 Confusion Matrix Comparison
Table 4.11 Performance Comparison
CHAPTER 1
INTRODUCTION
1.1 Background
Kenya has made tremendous progress in electricity connectivity, internet penetration and mobile network coverage. Mobile cellular subscriptions per 100 people grew from 0.4 in 2000 to 80.4 in 2016 (World Development Indicators, 2016). This rapid development of telecommunications and infrastructure, coupled with the global decline in cellphone prices, has been harnessed by companies to provide value-added services such as digital credit.
Credit providers have traditionally required interaction between agents and clients, risk assessment based on previous financial history, and loans delivered into a bank account. This excluded those without a bank account or access to a bank branch, as well as those with undocumented financial histories. This hurdle was readily overcome by digital credit, which refers to loans that are supplied and repaid digitally, generally using a cell phone. Digital credit is instant, loan decisions are automated based on a set of rules applied to available data, and the loan is managed remotely (CGAP, 2016).
Digital credit has evolved to incorporate a number of different business models. The first model is a partnership between a bank and a mobile network operator, such as M-Shwari by NCBA Bank and Safaricom. The second model is a partnership between a non-bank lender and a mobile network operator, such as Kopa Cash by Jumo and Airtel Kenya. The third model is a bank utilizing mobile network operator channels, such as MCo-op Cash by Co-operative Bank, which uses USSD. The fourth model is the non-bank mobile internet application, in which non-bank lenders such as Branch and Tala disburse loans through smartphone applications.
Mobile lending platforms apply predictive analytics to data such as transaction history, call logs, text messages, contact lists, age, education and income to arrive at a creditworthiness score and credit limit. When analyzing first-time borrowers, alternative digital data is especially significant, whereas repayment-based credit history becomes more important for subsequent loan applications. This research project aims to evaluate the application of machine learning techniques to improve the prediction of loan default.
1.2 Problem Statement
Mobile lending institutions use credit scoring models to evaluate potential loan default risks. These models generate a score that represents the likelihood of defaulting on a loan, making lending decisions easier. Developing a credit scoring model is time-consuming. These models are also fixed and do not easily evolve with changing customer behavior to predict default more accurately. Machine learning approaches can help enhance the accuracy of loan default prediction.
1.3 Significance of Study
Credit risk assessment is crucial to the success of lending institutions since customer credit risk affects profitability directly. Traditional procedures are inefficient and time-consuming. The goal of this study is to investigate machine learning approaches to loan default prediction that are more dynamic and adaptable to changing client data. These techniques are also expected to provide higher accuracy in predicting loan default.
1.4 Research Objectives
i. To review the existing literature on techniques applied in loan default prediction.
ii. To design a machine learning model that can forecast loan default.
iii. To train and test the machine learning model to predict loan default.
iv. To evaluate the performance of the model in predicting loan default.
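As an illustration of objective iv, the following minimal Python sketch computes the performance metrics used in this project (accuracy, precision, recall, specificity and F1 score) from the counts of a confusion matrix. The function names and the sample labels are hypothetical and are not taken from the project data. In practice, the predicted labels would come from a trained classifier.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN, treating `positive` as the default class."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["tn"], counts["fn"]


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Return the five performance metrics as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical labels for illustration only: 1 = defaulted, 0 = repaid.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the predictions of each candidate model gives figures that can be compared directly. Recall and specificity are reported alongside accuracy because accuracy alone can be misleading when defaulters form a small share of the data.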
1.5 Justification
The importance of credit risk evaluation has significantly increased with digital credit. Financial institutions have developed advanced systems to assess the creditworthiness of their customers. The objective of this research is to explore the use of machine learning algorithms to predict loan default and improve the accuracy of default prediction.