ABSTRACT
Financial statements fraud detection techniques fall into several categories, and this study focuses on one of them: artificial and computational intelligence techniques. A major challenge facing financial statements fraud detection is that the financial data needed to train detection models is largely unavailable, because regulations prohibit the transmission and distribution of such highly confidential data.
The aim of this research is to develop a fraud detection technique that overcomes the unavailability of financial data. The research first identified which features of financial statements are key to detecting financial statements fraud. A second experiment determined which hybrid algorithm performs best at detecting fraud in financial statements. The features that stood out in the first experiment formed the feature set, and a model was built on the best-performing algorithm. The results showed that 3 of the top features related to the assets of the business and that, of the 20 features identified, 7 dealt with assets. Ensemble methods were accurate on high-dimensional classification tasks: all the methods scored at least 80%, with Reptile performing best at 87.86%. The model built on the Reptile algorithm and trained on the identified feature set had an accuracy of 86.33%.
The key limitation of this research is the inaccessibility of financial statements data in the public domain. Statements in which fraud has occurred are even harder to find, because effort will have been made to conceal the fraud.
The project concludes that meta learning algorithms perform better than other algorithms where training data is limited, as in this case, and that feature selection is important for increasing the accuracy of the model.
This study contributes to the knowledge of how to accurately detect fraud in financial statements by providing insight into which features of these statements are most indicative of possible fraud. The research also shows the importance of meta learning algorithms: the two meta learning algorithms evaluated emerged as the top two of all the hybrid algorithms identified.
Key words:
Reptile, financial statements fraud, fraud detection, meta learning, neural networks.
Table of Contents
Declaration
ABSTRACT
DEDICATION
ACKNOWLEDGMENT
List of Figures
List of Tables
Definition of Terms
CHAPTER 1. INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Objectives
1.3.1 Overall Objective
1.3.2 Research Questions
1.3.3 Specific Objectives
1.4 Problem Justification
1.5 Significance
1.6 Scope of the study
CHAPTER 2. LITERATURE REVIEW
2.1 Introduction
2.2 Financial Statements
2.2.1 Balance Sheets
2.2.2 Income Statement
2.2.3 Cash Flow Statement
2.3 Current state of financial statements fraud detection
2.4 Previous work done
2.5 Gap
2.6 Ensemble Methods of Fraud Detection
2.7 Meta learning
2.7.1 Reptile
2.8 Link with existing work
2.9 Conceptual Framework
CHAPTER 3. RESEARCH METHODOLOGY
3.1 Introduction
3.2 Research design
3.2.1 Identify key features that aid in accurate financial fraud detection
3.2.2 Identify a meta learning algorithm to detect financial statements fraud
3.2.3 Train a meta learning model to detect financial statements and test the model’s accuracy
CHAPTER 4. RESULTS AND DISCUSSION
4.1 Introduction
4.2 Data Analysis
4.2.1 Target class distribution
4.2.2 Key Features Distribution
4.2.3 Key Features Correlation
4.3 Key Financial Statements Features
4.4 Algorithms Evaluation
4.5 Model Evaluation
CHAPTER 5. Conclusion and Recommendations
5.1 Achievements
5.2 Contributions
5.3 Challenges
5.4 Recommendations and future work
References
APPENDIX
APPENDIX 1: Source Code for Reptile Algorithm
List of Figures
FIGURE 1 BALANCE SHEET
FIGURE 2 INCOME STATEMENT
FIGURE 3 CASH FLOW STATEMENT
FIGURE 4 BATCHED VERSION OF REPTILE ALGORITHM
FIGURE 5 CONCEPTUAL FRAMEWORK
FIGURE 6 WEIGHT OF EVIDENCE ALGORITHM
FIGURE 7 PROTOTYPE DEVELOPMENT CYCLE
FIGURE 8 MODEL STRUCTURE
FIGURE 9 FRAUD TARGET CLASS DISTRIBUTION
FIGURE 10 RAW FINANCIAL FEATURES DISTRIBUTION
FIGURE 11 KEY FEATURES CORRELATION
FIGURE 12 FEATURES INFORMATION VALUE SCORES
FIGURE 13 1990 - 1994 FEATURES’ INFORMATION VALUES
FIGURE 14 1995 - 1999 FEATURES’ INFORMATION VALUES
FIGURE 15 2000 - 2004 FEATURES' INFORMATION VALUES
FIGURE 16 2005 - 2009 FEATURES' INFORMATION VALUES
FIGURE 17 2010 - 2014 FEATURES' INFORMATION VALUES
FIGURE 18 ALGORITHMS' EVALUATION SCORES
FIGURE 19 REPTILE MODEL PERFORMANCE
List of Tables
TABLE 1 INFORMATION VALUE
TABLE 2 FRAUD TARGET CLASS DISTRIBUTION
TABLE 3 FEATURES' INFORMATION VALUE SCORES
TABLE 4 ALGORITHMS' PERFORMANCE
TABLE 5 MODEL PERFORMANCE
Definition of Terms
Financial Statements – a company’s basic documents reflecting its financial status.
Financial Fraud – an intentionally deceitful action designed to provide the perpetrator with unlawful gain.
Meta learning – designing models that can learn new skills or adapt to new environments quickly and with few training examples.
Business Intelligence – a combination of business analytics, data mining, data visualization, data tools and infrastructure, and best practices that helps organizations make more data-driven decisions.
Probabilistic Neural Network – a feedforward neural network that is widely used in classification and pattern recognition problems.
Depreciation index – a measure used to judge whether a company is depreciating its assets at a faster or slower rate than in the previous period.
WC Accruals – the year-over-year difference in net current operating assets.
RSST Accruals – a measure of changes in long-term operating assets and long-term operating liabilities.
Book to market – a ratio that compares a company’s book value to its market value.
Actual issuance – a set of securities that a company or government offers for sale.
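To make two of the definitions above concrete, the following is a minimal sketch of how WC Accruals and Book to market could be computed from balance sheet figures. It is an illustration under stated assumptions, not the calculation used for the study's dataset: the field names are hypothetical, "WC" is read as working capital, and the breakdown of net current operating assets (current assets excluding cash, less current liabilities excluding short-term debt) is one common convention that is assumed here rather than taken from this document.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical, simplified balance sheet fields (all in the same currency unit)."""
    current_assets: float
    cash_and_short_term_investments: float
    current_liabilities: float
    short_term_debt: float
    book_value_of_equity: float


def net_current_operating_assets(bs: BalanceSheet) -> float:
    # Assumed convention: strip financing items (cash, short-term debt)
    # from current assets and current liabilities before netting them.
    operating_assets = bs.current_assets - bs.cash_and_short_term_investments
    operating_liabilities = bs.current_liabilities - bs.short_term_debt
    return operating_assets - operating_liabilities


def wc_accruals(current: BalanceSheet, previous: BalanceSheet) -> float:
    # Year-over-year difference in net current operating assets.
    return net_current_operating_assets(current) - net_current_operating_assets(previous)


def book_to_market(bs: BalanceSheet, market_value_of_equity: float) -> float:
    # Ratio of the company's book value to its market value.
    return bs.book_value_of_equity / market_value_of_equity


# Made-up figures for two consecutive years.
previous_year = BalanceSheet(500.0, 100.0, 300.0, 50.0, 400.0)
current_year = BalanceSheet(650.0, 120.0, 340.0, 60.0, 450.0)

print(wc_accruals(current_year, previous_year))   # 100.0
print(book_to_market(current_year, 900.0))        # 0.5
```

In practice such measures are often scaled (for example by total assets) so that companies of different sizes can be compared; that step is omitted here because the definitions above do not specify it.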
CHAPTER 1
INTRODUCTION
1.1 Background
Fraud is a deliberate act of deceit intended to provide the perpetrator with unlawful gain. It can occur in finance, investment, and insurance (Chen, 2020). Common fraudulent schemes include identity theft and the falsification of income or assets.
Financial statements are documents that show the financial status of a company. Motives for financial statements fraud include making the business appear more profitable, showing an improvement in performance, and reducing tax obligations (I.SADGALI 2019).
Financial statements fraud detection techniques have been classified into categories such as descriptive/unsupervised techniques, which focus on relations and interconnectedness; predictive techniques, which predict target objects such as fraud occurrence; and artificial and computational intelligence techniques (I.SADGALI 2019).
In the Big Data era, detecting fraud has proven to be a challenge, and one approach increasingly used to analyze relations and connectivity patterns is graph-based anomaly detection (GBAD) (Pourhabibi, et al. 2020). A review of studies published from 2007 to 2018 showed a growing trend of GBAD techniques for fraud detection (Pourhabibi, et al. 2020). It found that approximately 87.2% of the reviewed studies developed their models exclusively with unsupervised learning techniques, because data labels are often in short supply or even nonexistent in fraud detection.
Most studies acknowledge that prior research on fraud detection has struggled to access internal data (auditor-client relationships, personal and behavioral characteristics), since this data is not readily available to investors, auditors, and regulators (Ahmed Abbasi 2012). Privacy concerns have made stakeholders and organizations reluctant to share their fraud information. This directly hinders research and affects the integrity of the experiments conducted (Pourhabibi, et al. 2020).
Fraud detection therefore faces the challenge of not having enough data to learn from. This creates the need for a fraud detection technique that can work and improve with little data. One technique that stands out in this respect is meta learning.
Meta learning aims to design models that can learn new skills and adapt to new environments quickly. A good meta learning model can adapt to new tasks and environments with limited exposure to new task configurations (Weng 2018). The usual approaches to meta learning are model-based, metric-based, and optimization-based.
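As an illustration of the optimization-based approach, the sketch below shows the core update of Reptile, the algorithm this study later builds on: adapt a copy of the shared weights to one sampled task with a few gradient steps, then move the shared weights a fraction of the way toward the adapted copy. This is a toy example under stated assumptions, not the model used in this research. It uses the serial (one task at a time) variant, a two-parameter linear model, and a hypothetical task family of straight lines; the function names and hyperparameters are illustrative only.

```python
import numpy as np


def sample_task(rng):
    """Hypothetical task family: each task is a line y = a*x + b with few examples."""
    a, b = rng.uniform(-1.0, 1.0, size=2)
    x = rng.uniform(-1.0, 1.0, size=10)  # only ten labelled points per task
    return x, a * x + b


def inner_sgd(theta, x, y, lr=0.1, steps=5):
    """Adapt a copy of theta to one task with a few gradient steps on the MSE loss."""
    phi = theta.copy()
    for _ in range(steps):
        err = phi[0] * x + phi[1] - y
        grad = np.array([np.mean(2 * err * x), np.mean(2 * err)])
        phi -= lr * grad
    return phi


def reptile(meta_steps=200, meta_lr=0.1, seed=0):
    """Serial Reptile: nudge theta toward each task's adapted weights."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)  # shared initialization being meta-learned
    for _ in range(meta_steps):
        x, y = sample_task(rng)
        phi = inner_sgd(theta, x, y)
        theta += meta_lr * (phi - theta)  # Reptile outer update
    return theta


theta = reptile()
print(theta)  # meta-learned initialization for the toy task family
```

The learned `theta` is not a solution to any single task; it is a starting point from which a few gradient steps on a new task's small dataset are expected to work well, which is the property that makes this family of methods attractive when labelled data is scarce.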
A meta learning framework called MetaFraud was proposed to address research gaps that exist due to the unavailability of data.
This research aims to increase the accuracy of fraud detection by using meta learning to overcome a challenge facing modern fraud detection techniques: the unavailability of financial data due to its sensitive nature.
Many fraud schemes have led to losses that span several years, which shows that existing fraud detection mechanisms are ineffective. Enhanced financial fraud detection capabilities would also greatly benefit stakeholders such as audit firms, investors, and government regulators (Abbasi 2012).
Financial fraud has been shown to have serious consequences for the long-term sustainability of organizations, as well as for employees and the economy. Research indicates that the risk of experiencing fraud is not only high but also increasing (Abbasi 2012).
1.2 Problem Statement
Fraudsters adapt over time, inventing new ways of beating fraud detection systems. As a result, financial fraud continues to grow (I.SADGALI 2019).
Fraud in financial statements is difficult to detect, and even after detection, the damage already inflicted is serious (Yang, et al. 2020). Efficient and effective measures to detect financial statements fraud would therefore offer important value to regulators and other stakeholders.
One of the major challenges facing fraud detection is that the financial data needed to train detection models is largely unavailable, due to laws that prevent highly confidential financial data from being transmitted or distributed (Pourhabibi, et al. 2020).
The purpose of this research is to find a way to overcome this challenge by improving on the meta learning based fraud detection techniques currently in use.
1.3 Objectives
1.3.1 Overall Objective
The aim of this research is to develop a fraud detection technique that overcomes the unavailability of financial data caused by its confidential nature.
1.3.2 Research Questions
More specifically, the research questions to be addressed are:
1. How do the different features of financial statements affect the accuracy in detecting financial fraud?
2. What is a modern classification algorithm that can be used to identify financial fraud?
3. How can these key features be used to create a prototype that is more accurate in detecting financial fraud?
4. How effective is the model created in detecting financial fraud?
1.3.3 Specific Objectives
The research has the following sub-objectives:
1. Identify key features that aid in accurate financial fraud detection.
2. Identify a classification algorithm to detect financial statements fraud.
3. Train a meta learning model to detect financial statements fraud using the features identified.
4. Test the model’s accuracy in detecting financial fraud.
1.4 Problem Justification
The risk of fraud increases during periods of recession (Bănărescu, 2015), such as the one currently being experienced due to the COVID-19 pandemic. It therefore becomes important for organizations to implement a series of anti-fraud techniques.
1.5 Significance
The expected outcome of this research is a fraud detection method that is more accurate in detecting fraud in financial statements. This will help stakeholders such as auditors and government agencies detect fraud more effectively.
The contribution to society is that funds otherwise lost to financial fraud can be used more productively by organizations, whose growth benefits communities through more employment opportunities and better standards of living.
1.6 Scope of the study
This research limits its review of previous financial fraud detection techniques to those that used publicly available data. This is due to the difficulty of obtaining companies' financial data, especially where fraud is involved.
This research focuses only on three financial statements: the income statement, the balance sheet, and the cash flow statement.