ABSTRACT
The Internet is now indispensable and has a tremendous impact on our lives. Despite its importance, it comes with many challenges, one of which is security. The literature records several attempts to develop a secure and user-friendly spam detection technique, but these attempts are caught between two fundamental issues: robustness and usability. The passiveness of some intrusion detection systems (IDS), for example those based on CAPTCHA, which fail to detect some forms of attack and offer low usability, has been of major concern to researchers in recent years. In this work, an enhanced intrusion detection framework named honeyCAPTCHA, which is capable of detecting web page crawlers, resilient, and efficient for users, is designed to address the inefficiency and weak security settings of traditional IDS. The system is intended mainly as an alternative to the CAPTCHA-BASED IDS model, which inherited the passiveness and inefficiency common to traditional IDPS. Eliminating the CAPTCHA entirely from the system gateway, providing a cognitive CAPTCHA test, setting a response time on the dummy page and offering a fallback opportunity are the main ingredients that make the proposed system successful. The proposed system outperforms the existing system, with a detection rate (DR) of 76%, which is 1.7 times that of the existing system, and a false positive rate (FPR) of 10% against 36% for the existing system. This shows that our system is more robust than the existing system. The usability of the proposed system, measured using BDR and BNR, is 1.5 times that of the existing system, which shows how efficient the proposed system is for users. Both systems were also compared on the standard IDS evaluation metric CID, on which our system scores 2.26 times the existing system. These results indicate that honeyCAPTCHA is a more secure and user-friendly IDPS than the direct use of CAPTCHA.
Table of Contents
Cover page
Fly leaf
Title page
CERTIFICATION
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
List of Figures
Definition of terms
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
1.2 Problem Statement
1.3 Motivation
1.4 Aim and Objectives
1.5 Research Methodology
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2 Intrusion Detection System
2.3 CAPTCHA
2.4 Honeypot
2.5 Response Time
2.6 Negative Selection Algorithm
2.7 Naïve Bayes Classifier
2.9 Technologies used in Evaluation of the System
2.10 Related work
2.11 Gap in the Literature
CHAPTER THREE
3.1 Design Overview
3.2 Input Interface Design
3.3 Process Design
3.4 Designed Algorithm
3.5 The proposed system Framework
3.6 The Flow Chart of the proposed system
3.7 How the Proposed system works
3.8 Evaluation Metrics for IDS
CHAPTER FOUR
4.1 Introduction
4.2 Implementation Requirement
4.3 System Overview
4.4 Evaluation
4.5 Result Discussion
CHAPTER FIVE
5.1 Summary
5.2 Conclusion
5.3 Contribution to knowledge
5.4 Recommendations for Future Work
REFERENCES
List of Figures
Figure 2.1: Classification of IDS (Sahasrabuddhe et al., 2017)
Figure 2.2: CAPTCHA Classification (Mohammad & MohammadReza, 2014)
Figure 2.3: A gateway for the Souley and Abubakar (2018) system
Figure 2.4: CAPTCHA-BASED IDS model framework (Souley & Abubakar, 2018)
Figure 3.1: The proposed system framework
Figure 3.2: Flow chart of the proposed system
Figure 4.1: Login page of the system
Figure 4.2: Login page of the system with the hidden field displayed
Figure 4.3: honeyCAPTCHA that receives the redirected bots
Figure 4.4: Dummy page that receives intelligent bots
Figure 4.5: A snapshot of genuine human IP addresses
Figure 4.6: A snapshot of attempt bots' IP addresses
Figure 4.7: A snapshot of non-intelligent bots' IP addresses
Figure 4.8: A snapshot of intelligent bots' IP addresses
Figure 4.9: Graphical representation of the 192 visitors of the proposed system
Figure 4.10: Snapshot of the analysis of the existing system's performance in RStudio
Figure 4.11: Snapshot of the analysis of the proposed system in RStudio
Figure 4.12: Graphical representation of the system performance with respect to DR, FPR and TPR
Figure 4.13: Graphical representation of the robustness based on Precision, Recall, F-measure and Accuracy
Figure 4.14: Graphical representation of the BDR of the two systems
List of Tables
Table 2.1: The confusion matrix
Table 4.1: All visitors of the proposed system categorized by month
Table 4.2: Evaluation of the robustness of the proposed system and that of Souley and Abubakar (2018)
Table 4.3: Evaluation of the robustness of the two systems based on Precision, Recall, F-Measure and Accuracy
Table 4.4: Comparative usability analysis of the two systems
Abbreviations
AIS Artificial Immune System
AUC Area Under a Curve
BDR Bayesian Detection Rate
CAPTCHA Completely Automated Public Turing test to tell Computers and Humans Apart
Caret Classification And Regression Training
CID Intrusion Detection Capability
CR Classification Rate
DR Detection Rate
e1071 R package of functions for latent class analysis and related methods
FB-NSA Boundary-fixed NSA
FM F-Measure
FN False Negative
FP False Positive
FPR False Positive Rate
HIDS Hybrid Intrusion Detection System
honeyCAPTCHA Honeypot + CAPTCHA techniques
HTML5 Hypertext Markup Language version 5
CSS Cascading Style Sheet
IDPS Intrusion Detection and Prevention System
IDS Intrusion Detection System
IP Internet Protocol
IPS Intrusion Prevention System
IT Information Technology
MIS Mammalian Immune System
NIDS Network Intrusion Detection System
NSA Negative Selection Algorithm
OALFB-NSA Online Adaptive Learning Boundary-Fixed NSA
OCR Optical Character Recognition
PHP PHP: Hypertext Preprocessor
PIN Personal Identification Number
TN True Negative
TP True Positive
Definition of terms
Artificial Immune System (AIS): A computational system inspired by the principles and processes of the Mammalian Immune System (MIS).
Bayesian Detection Rate (BDR): A Bayesian representation of the Positive Predictive Value (PPV), that is, the probability that an intrusion has occurred given that the IDS raises an alarm.
CAPTCHA: A test that distinguishes human users from machines by posing a simple challenge that a human can solve but a machine might not.
Caret: An R package (short for Classification And Regression Training) providing a set of functions that streamline the process of creating predictive models.
Detection Rate (DR): The ratio of the number of correctly detected attacks to the total number of attacks.
e1071: An R package providing functions for latent class analysis, short-time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering and the naive Bayes classifier.
False Positive Rate (FPR): The ratio of the number of normal instances detected as attacks to the total number of normal instances.
honeyCAPTCHA: An enhanced intrusion detection framework that makes access to a system easy for genuine users while leaving bots with a cognitive CAPTCHA whose solution is fruitless.
Mammalian Immune System (MIS): A collection of organs, tissues, cells and enzymes united under one goal: to protect the mammalian body.
Negative Selection Algorithm (NSA): One of the main algorithms in AIS, inspired by the negative selection process of the adaptive immune system.
OCR: Optical Character Recognition; in a CAPTCHA, recognition of characters by a human is used as a test to grant access to a given resource.
IDPS: Intrusion Detection and Prevention Systems; intrusion detection and prevention are two broad terms describing application security practices used to mitigate attacks and block new threats.
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
Computer security is a field in IT that focuses on protecting both computer hardware and software resources. The Internet, as a major tool in IT, needs to be secured because of its enormous impact on our daily lives; hence, Internet security has become an appealing area of research. Artificial Intelligence has undoubtedly brought huge progress in our pursuit of technological advancement, but its abuse has turned it into a threat to security. In the past, cyber-attacks were of concern mainly to office holders and governments; today they are of general concern, as they can trigger war and political instability (Joseph, 2018). No particular attribute makes anyone a victim of cyber-crime other than being on the Internet. Legitimate Internet users can be attacked by web bots in many ways, including social engineering, malvertising, ransomware, phishing and spy phishing, malware and SQL injection. In attacking users, bots cause severe harm, ranging from loss of the victim's files and loss of control of the computer to hardware destruction and possibly loss of the victim's life. About 4.5 million identities were reported stolen in 2017, and cyber-criminals were expected to keep targeting identities and stealing the credentials of Internet users in 2018 (Joseph, 2018).
Attacks, cyber-crime, intruders and fraudsters, as they may variously be called, are generally regarded as Internet threats, which comprise all anomalous acts against computer security. Some of the known threats, as listed by Public Safety Canada (2017), are as follows:
i. Botnets: collections of software bots that create an army of infected computers (known as zombies) controlled by the originator.
ii. Distributed denial of service (DDoS) attack: the flooding of a network with useless information by zombie computers in order to sabotage a website.
iii. Hacking: the process of gaining unauthorized access to a computer.
iv. Malware: malicious software that infects a victim's computer, for example as scareware that intimidates the victim, or that reformats the victim's hard drive, leading to loss of files, or steals information.
v. Pharming: a fraud in which a spoofed legitimate URL redirects a legitimate user to an illegitimate website, luring the victim into giving vital information to scammers.
vi. Phishing: fake emails, text messages or websites created by scammers to lure victims into giving out their information.
vii. Ransomware: a type of malware that restricts a victim's access to his/her computer or to vital files on it, displaying a message that demands payment before the restriction is removed.
viii. Spam: annoying junk email that burdens communication service providers and businesses with filtering electronic messages, and that can be used to phish a victim's information without his/her consent.
ix. Spoofing: fake emails and websites created to look like real ones, used in conjunction with phishing techniques to exploit a victim's information.
x. Spyware or adware: software that collects a victim's information without his/her consent. It often comes as a free download and is installed automatically, with or without the victim's consent.
xi. Trojan horse: malicious software embedded in or disguised as legitimate software. It is an executable file that installs itself and runs automatically once downloaded. A Trojan horse can delete a victim's files, use the victim's computer to hack other computers, watch the victim through his/her webcam, log the victim's keystrokes (such as a credit card number entered in an online purchase), and record usernames, passwords and other personal information.
xii. Viruses: malicious computer programs, often sent as an email attachment or a download, intended to infect a victim's computer as well as the computers of everyone in the victim's contact list. Merely visiting a site can start an automatic download of a virus.
xiii. Worms: a common threat to computers and the Internet as a whole. A worm, unlike a virus, works on its own without attaching itself to files or programs. It lives in the computer's memory, does not damage or alter the hard drive, and propagates by sending itself to other computers in a network, whether within a company or across the Internet itself.
xiv. WPA2 handshake vulnerabilities: the key reinstallation attack (KRACK) vulnerability allows a malicious actor to read encrypted network traffic on a Wi-Fi Protected Access II (WPA2) router and send traffic back to the network. KRACK can affect both personal (home users and small businesses) and enterprise networks.
Measures or techniques used to protect computer and Internet resources against these attacks include:
i. Authentication: ensures that users and computers are who they claim to be by establishing proof of identity. This is usually accomplished using one or a combination of something you are (a biometric characteristic, e.g. a voice pattern, handwriting or a fingerprint), something you know (a secret, e.g. a password, Personal Identification Number (PIN) or cryptographic key) or something you have (a token, e.g. a credit card or a smart card).
ii. Encryption: the process of encoding a message or information in such a way that only authorized parties can access it. Encryption does not itself prevent interference, but denies the intelligible content to a would-be interceptor.
iii. Firewall: a network security system that monitors and controls all incoming and outgoing network traffic based on a defined set of security rules.
iv. Intrusion Detection System (IDS): a system used to identify intrusions. Intrusion detection techniques have traditionally been classified into two methodologies: anomaly detection and misuse detection.
v. Intrusion Prevention System (IPS): an advanced IDS that is capable of preventing a known, detected intrusion. One prominent IPS is CAPTCHA.
Other techniques are used to complement, or even substitute for, the well-known techniques listed above. Mohammad and MohammadReza (2015) group them into three categories: interactive methods, administrative methods and cheating bots.
i. Interactive methods provide mechanisms to distinguish human users from bots by requesting an action that bots usually cannot perform, such as motion-based (mouse) operations, or by asking for human-only assets such as a social security number or SMS/email verification.
ii. Administrative methods involve high-level supervision tasks such as detecting spam content or preventing bot attacks. The use of third-party services for content analysis and filtering is one of the reliable tools administrators use to detect spammers. Techniques in this category include centralized sign-on, limiting accounts, logging, server-side validation and response time.
iii. Cheating bots are reverse techniques that identify bots by luring them into performing an action that human users could not do. Techniques in this category include honeypots, switching form fields and confirmation pages.
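As a hedged illustration of the honeypot idea in the cheating-bots category, the following Python sketch flags a form submission whose hidden decoy field has been filled in. The field name `website`, the function name and the sample submissions are assumptions made for this example only; they are not taken from any of the cited systems.

```python
# Hypothetical name of the decoy input. It is assumed to be hidden from human
# visitors (for example with CSS or JavaScript), so only automated form
# fillers are expected to put a value in it.
DECOY_FIELD = "website"


def is_probable_bot(form: dict) -> bool:
    """Return True when the hidden decoy field carries any non-blank value."""
    return bool(str(form.get(DECOY_FIELD, "")).strip())


# Illustrative submissions (invented data).
human = {"username": "ada", "password": "secret", "website": ""}
bot = {"username": "x", "password": "y", "website": "http://spam.example"}

print(is_probable_bot(human))  # False
print(is_probable_bot(bot))    # True
```

A check of this kind costs a genuine user nothing, because the field is never shown to them; its weakness is that a bot which renders the page or skips hidden inputs will pass it.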
1.2 Problem Statement
Despite efforts by researchers to balance the tradeoff between security and usability in IDPS, the problem of developing a single user-friendly and secure framework for detecting and preventing all spambots persists. The attempt by Souley and Abubakar (2018) to develop a CAPTCHA-BASED IDS left challenges that include:
i. Presenting the CAPTCHA in the user interface. CAPTCHAs are so dreaded that people are frustrated at the sight of them (Michael, 2017).
ii. The instruction “do not type anything in this textbox” in Souley and Abubakar (2018) is ambiguous to everyday CAPTCHA users; a human may therefore ignore it and fall victim to misclassification.
iii. Bots that skip the CAPTCHA test can easily gain access to the system.
iv. Their system only captures intelligent bots, i.e. those capable of breaking the CAPTCHA, and neglects bots that do not attempt the CAPTCHA, which are also a threat to system security.
v. Moreover, their system has no fallback that lets genuine users overturn a wrong decision when the system misclassifies them as bots.
Our work focuses on solving these problems by:
i. Eliminating the CAPTCHA from the human interface and replacing it with a decoy field that detects bots and redirects them all to a honeyCAPTCHA, where their behavior can be observed. Hence, our CAPTCHA will be visible only to bots.
ii. Eliminating the instruction, to avoid confusing users.
iii. Capturing all bots' IP addresses and separating them into intelligent and non-intelligent bots according to their attempts at solving the deceitful CAPTCHA.
iv. Using the response time technique and the classical Negative Selection Algorithm to provide a fallback opportunity for misclassified humans.
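The triage and fallback ideas in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the category labels, function names, sample response times and the self radius are all hypothetical, and the detector scheme is a simplified one-dimensional, real-valued form of negative selection rather than the classical algorithm. Candidates are taken from a fixed grid so that the example is deterministic, whereas negative selection normally generates them at random.

```python
from typing import List, Sequence, Tuple

Detector = Tuple[float, float]  # (center, radius), both in seconds


def triage(decoy_filled: bool, solved_captcha: bool) -> str:
    """Route a visitor using the decoy field and the bot-only CAPTCHA (labels are hypothetical)."""
    if not decoy_filled:
        return "human"
    return "intelligent_bot" if solved_captcha else "non_intelligent_bot"


def train_detectors(self_times: Sequence[float],
                    candidates: Sequence[float],
                    self_radius: float) -> List[Detector]:
    """Censoring phase: discard every candidate lying within self_radius of a self sample."""
    detectors: List[Detector] = []
    for c in candidates:
        gap = min(abs(c - s) for s in self_times) - self_radius
        if gap > 0:
            # The detector reaches up to the edge of the self region, not into it.
            detectors.append((c, gap))
    return detectors


def is_non_self(response_time: float, detectors: Sequence[Detector]) -> bool:
    """Monitoring phase: a sample covered by any detector is treated as non-self (bot-like)."""
    return any(abs(response_time - c) < r for c, r in detectors)


# Assumed form-filling times (seconds) of known genuine users: the "self" set.
human_times = [6.0, 8.0, 12.0]
detectors = train_detectors(human_times, [float(t) for t in range(13)], self_radius=1.5)

print(triage(decoy_filled=True, solved_captcha=True))  # intelligent_bot
print(is_non_self(1.0, detectors))  # True: stays classified as a bot
print(is_non_self(7.0, detectors))  # False: eligible for the human fallback
```

In this sketch a visitor flagged as a bot is reinstated only when its response time is not covered by any detector, that is, when it resembles the recorded human timings. Anything unlike those timings is treated as non-self, so the choice of self samples and radius determines how forgiving the fallback is.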
1.3 Motivation
This research is motivated by the lingering tradeoff between security and usability (Mohammad & MohammadReza, 2015) in Intrusion Detection and Prevention Systems (IDPS), which has turned the situation into a cat-and-mouse game. Hackers are getting faster whilst defenders are treading water: according to the latest edition of Verizon's benchmark survey of security breaches, cybercriminals are well ahead of defenders without having to try anything new (John, 2016). Honeypot techniques have mainly been employed to study new threats arising from the advancement of artificial intelligence, but using a honeypot alone leaves most systems vulnerable to attack. Researchers have shown that combining honeypots with other known techniques such as firewalls, CAPTCHAs and other IDS enables robust security and improves the efficiency of IDPS (Parita et al., 2016). Most existing IDPS designs leave this problem unsolved.
1.4 Aim and Objectives
This research aims to design an enhanced intrusion detection framework by improving the CAPTCHA-BASED IDS.
The objectives of the research are to:
i. Develop honeyCAPTCHA, an enhanced intrusion detection framework
ii. Implement the developed framework
iii. Evaluate the developed framework
1.5 Research Methodology
a) Designing the honeyCAPTCHA intrusion detection framework, which consists of:
i. A login form with a decoy field hidden using JavaScript to detect bots.
ii. A cognitive CAPTCHA that receives the detected bots.
iii. A dummy page that receives intelligent bots that pass the CAPTCHA test, by:
a. Setting a time limit of 5 seconds for filling a Compose mail form on the dummy page.
b. Creating a fallback opportunity for genuine users classified as bots, using a classical Negative Selection Algorithm that considers the time limit set in the Compose mail form.
b) Implementation, which consists of:
i. Hosting the framework on a site similar to the one used by Souley and Abubakar (2018), so as to justify the data collection.
ii. Retrieving all visitors' IP addresses with the IP address retrieval algorithm of Souley and Abubakar (2018).
c) Evaluation of robustness and usability, based on the visitors' IP addresses collected from the two systems, using the following metrics:
i. Robustness: measured by False Positive Rate (FPR) and Detection Rate (DR).
ii. Usability: measured by Bayesian Detection Rate (BDR) (Gu et al., 2006), computed using the caret and e1071 packages in the R language.
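To make the metrics named in step c) concrete, the following Python sketch computes DR, FPR and BDR from confusion-matrix counts. It only illustrates the standard formulas and does not reproduce the R analysis used in this work; the function names and the counts are invented for the example and are not results from either system. BDR is obtained from Bayes' theorem as P(I|A) = P(I)·DR / (P(I)·DR + (1 − P(I))·FPR), where P(I) is the base rate of intrusions, here assumed to be the share of attacks in the sample.

```python
def detection_rate(tp: int, fn: int) -> float:
    """DR: correctly detected attacks divided by all attacks."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """FPR: normal instances flagged as attacks divided by all normal instances."""
    return fp / (fp + tn)


def bayesian_detection_rate(dr: float, fpr: float, base_rate: float) -> float:
    """BDR: probability of an intrusion given an alarm, by Bayes' theorem."""
    alarm_probability = base_rate * dr + (1.0 - base_rate) * fpr
    return base_rate * dr / alarm_probability


# Invented counts for illustration only (not data from this study).
tp, fn, fp, tn = 30, 10, 5, 55

dr = detection_rate(tp, fn)
fpr = false_positive_rate(fp, tn)
base_rate = (tp + fn) / (tp + fn + fp + tn)  # assumed: share of attacks in the sample
bdr = bayesian_detection_rate(dr, fpr, base_rate)

print(f"DR={dr:.3f} FPR={fpr:.3f} BDR={bdr:.3f}")  # DR=0.750 FPR=0.083 BDR=0.857
```

When the base rate is taken from the same sample, BDR equals the positive predictive value TP / (TP + FP), which matches the definition of BDR given in the definition of terms.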