Table of Contents
CHAPTER 1: INTRODUCTION
BACKGROUND
PROBLEM STATEMENT
OBJECTIVES
SIGNIFICANCE OF THE STUDY
ASSUMPTIONS
JUSTIFICATION
CHAPTER 2: LITERATURE REVIEW
a. Signature based detection.
b. Anomaly based detection.
c. Continuous System Health Monitoring
USER ENTITY BEHAVIOR ANALYTICS (UEBA)
MACHINE LEARNING TECHNIQUES.
Classification Techniques
Clustering Techniques
EXISTING SYSTEMS
Graylog
Kibana
CONCEPTUAL FRAMEWORK
RESEARCH GAP
CHAPTER 3: METHODOLOGY
INTRODUCTION
RESEARCH DESIGN
STUDY POPULATION AND SAMPLING METHODS
DATA COLLECTION
PRETESTING (VALIDITY AND RELIABILITY)
DATA ANALYSIS METHOD AND THE MODEL
A. Log ingestion
1. Message
2. Timestamp
3. Log level
Source information
B. Log Aggregation
C. Normalization
D. Correlation
E. Analysis
LSTM
ETHICAL CONSIDERATION
EXPECTED RESULTS/OUTPUTS
RESOURCES
CHAPTER 4: RESULT AND DISCUSSION
4.1 DESIGN AND ANALYSIS
Feature extraction
LSTM
4.2 IMPLEMENTATION
RESULTS
4.3 DISCUSSION AND CONCLUSION
Limitations
REFERENCES
CHAPTER 1
INTRODUCTION
BACKGROUND
Continuous improvements in communication and network technologies have brought major benefits to organizations and to everyday life. For example, advances in cloud infrastructure and distributed computing have removed geographical boundaries and made far more achievable, but they have also made it possible for cyber-attacks to originate from any part of the world. Because such a wide range of security anomalies can now be initiated from anywhere, intrusion detection has become a very difficult task in cybersecurity.
Accordingly, cyber defence techniques must be i) increasingly intuitive, ii) more adjustable, and iii) robust enough to detect and eliminate threats automatically. To meet these needs, corporations are using Artificial Intelligence methods to monitor and counter cyber-criminal activity (Wiafe et al., 2020). This highlights the growing importance of AI techniques in cybersecurity.
Recent cybersecurity research also indicates that email and internet browser activity are the most difficult to protect. According to these reports, almost half (49%) of all security incidents are caused by a lack of end-user compliance (Arash et al., 2018). In an era of such emerging security anomalies, an intelligent early-warning tool is key. One key gap in most existing security systems is that security teams focus on keeping their systems secure without taking into account the experience of the end users who use them. As a result, some users are unable to uphold correctly the standards set by these teams, which sometimes leaves loopholes that attackers can use to breach the systems. One technique for countering such attacks is to use intelligent behavior-based security systems.
Using behavior-based artificial intelligence to profile normal user behavior will help raise an alarm when security anomalies take place. User and Entity Behavior Analytics (UEBA), a term coined by Gartner, is one such technology: it learns the network usage patterns of end users and then applies machine learning algorithms to detect security threats from those learnt patterns. Research by Digital Guardian reveals that, in 2020, organizations without such security automation tools experience costs $3.58 billion higher than those with security automation, which shows how expensive a data breach is. The proposed model will use a multimodal UEBA approach to create a security profile of end-user patterns using a Convolutional Neural Network (CNN), which will help detect security anomalies.
In this research, we shall use log data from the University of Nairobi’s (UoN’s) servers and the United States Computer Emergency Readiness Team (CERT) data set. We shall aggregate the log data from these different sources using Fluentd, a log aggregator that gathers logs from all nodes and forwards them to a centralized storage facility. One of its key advantages is that it combines low memory requirements with high throughput, which reduces system utilization. We shall then normalize all the data in the centralized storage to a “normal form” to improve data integrity and reduce redundancy. This will ensure that every record reads and appears the same way.
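As a rough illustration of the normalization step described above, the sketch below maps records with differently named fields onto one common schema built from the four log fields listed in the methodology outline (message, timestamp, log level, source information). The alias names (`ts`, `msg`, `host`, and so on), the helper functions, and the choice of UTC ISO 8601 timestamps are assumptions made for illustration only; they are not taken from the UoN or CERT data, and this is not the Fluentd configuration used in the study.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source-specific field names to the common schema.
FIELD_ALIASES = {
    "timestamp": ("timestamp", "time", "ts"),
    "level": ("level", "severity", "log_level"),
    "message": ("message", "msg"),
    "source": ("source", "host", "hostname"),
}


def to_utc_iso(value):
    """Convert epoch seconds or an ISO 8601 string to a UTC ISO 8601 string."""
    if isinstance(value, (int, float)):
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        moment = datetime.fromisoformat(value)
        if moment.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).isoformat()


def normalize(record):
    """Return a record holding the four common fields, whatever the input names."""
    normalized = {}
    for field, aliases in FIELD_ALIASES.items():
        # Take the first alias present in the record; None if none is present.
        normalized[field] = next((record[a] for a in aliases if a in record), None)
    if normalized["timestamp"] is not None:
        normalized["timestamp"] = to_utc_iso(normalized["timestamp"])
    if normalized["level"] is not None:
        normalized["level"] = str(normalized["level"]).upper()
    return normalized
```

Calling `normalize` on a web-server style record and on a mail-server style record would then yield dictionaries with identical keys, so downstream steps do not need to know which source a record came from.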
Once data pre-processing is complete, the first part of our analysis is log correlation, which looks for patterns in log events that are not evident in the separate log files. This connects the dots across related yet heterogeneous data. We shall then feed the correlated data to our deep learning CNN model. The model will be an automatic log classification system that predicts the event category to which each collected log belongs and assigns a score to each prediction. This will help detect any anomaly that does not conform to the user patterns profiled by the model.
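One simple way to picture log correlation is to group events from different sources that belong to the same user and occur close together in time. The sketch below does only that; it is a minimal illustration, not the correlation method of this study. The event keys (`user`, `timestamp` in epoch seconds, `source`), the function names, and the five-minute window are all hypothetical.

```python
from collections import defaultdict


def spans_sources(group):
    """True when the events in a group come from more than one source."""
    return len({event["source"] for event in group}) > 1


def correlate(events, window_seconds=300):
    """Group each user's events into bursts and keep the multi-source bursts.

    Consecutive events of one user are placed in the same group when they
    are at most `window_seconds` apart.
    """
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user"]].append(event)

    correlated = []
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e["timestamp"])
        group = [user_events[0]]
        for event in user_events[1:]:
            if event["timestamp"] - group[-1]["timestamp"] <= window_seconds:
                group.append(event)
            else:
                if spans_sources(group):
                    correlated.append(group)
                group = [event]
        if spans_sources(group):
            correlated.append(group)
    return correlated
```

A group returned by `correlate` links, for example, a VPN login and a mail-server access by the same user within the window, a relationship that is not visible when either log file is read on its own.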
We shall then visualize the learnt data using D3 charts, which will make security events easier to monitor and data anomalies in the network infrastructure easier to identify visually.
PROBLEM STATEMENT
Security threats are continually evolving, which makes them nearly impractical to identify using traditional cybersecurity controls. A recent security breach report found that the median time between a cyber attack’s intrusion and its detection is about 14 days (Statista, 2021). In addition, most cyber attacks take only a few minutes, with 68% of them going undiscovered and only 3% discovered as they are happening. This has led to an increased use of machine learning in security systems to reveal cyber-criminal patterns and to expose such activities as they happen. By 2019, the growing adoption of machine learning and advanced analytics had helped reduce cyber security threats by 20%, as seen below.
OBJECTIVES
Objectives of this research:
1. To investigate how logs from multiple devices and applications could be aggregated in one normalized centralized storage.
2. To determine user characteristics important in cyber security.
3. To tie the user characteristics to the collected logs and extract these characteristics from the logs.
4. To develop a predictive classification model that can profile the normal usage patterns of different end users from the logs.
5. To evaluate the model and create dashboards for it.
Research Questions:
1. How can we aggregate logs from multiple sources into a normalized, centralized data store?
2. What are the common user characteristics that we can use to profile end users?
3. How can we retrieve user characteristics from logs and use them to create a security profile of each user?
SIGNIFICANCE OF THE STUDY
This project will utilize deep learning techniques, in particular Long Short-Term Memory (LSTM) networks, to create a behavior profile of each user.
An LSTM is a type of recurrent neural network, typically trained in a supervised manner, that is widely applied in natural language processing and speech recognition. An LSTM is trained on the usual sequences of events and then uses past events to forecast the next state of the sequence. A difference between the predicted sequence and the actual sequence is taken as an indication of a security threat in the system.
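The scoring idea described above (learn normal sequences, predict the next event, and treat a mismatch between prediction and observation as a sign of an anomaly) can be sketched without a deep learning framework. The example below deliberately swaps the LSTM for a much simpler stand-in, a table of event-to-event transition counts, so that it stays short and self-contained; it illustrates the prediction-error principle only and is not the model built in this study. The event names and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(sequences):
    """Count how often each event follows each other event in normal logs."""
    transitions = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent follower of `current`, or None if unseen."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def anomaly_score(transitions, sequence):
    """Fraction of steps where the observed event differs from the prediction."""
    steps = list(zip(sequence, sequence[1:]))
    if not steps:
        return 0.0
    misses = sum(
        1 for current, actual in steps
        if predict_next(transitions, current) != actual
    )
    return misses / len(steps)


# Hypothetical sequences of normal user activity used to build the profile.
normal_sessions = [
    ["login", "read_mail", "logout"],
    ["login", "read_mail", "send_mail", "logout"],
    ["login", "read_mail", "logout"],
]
profile = train(normal_sessions)
```

With this profile, a session that follows the learnt pattern scores 0.0, while a session containing an unseen step such as a hypothetical `export_db` event scores higher. An LSTM plays the role of `predict_next` in the same scheme, but conditions its prediction on a longer history than the single previous event.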
This study will provide evidence that monitoring the behavior of users and entities enables us to detect most forms of traditional threats that cannot be detected using signature-based techniques such as antiviruses and firewalls. We monitor how our users normally behave, and any deviation from that normal behavior is flagged as an anomaly for the cyber security team to investigate further. This will help security threats to be discovered quickly as they happen, reducing the losses associated with them.
ASSUMPTIONS
1. Users tend to have an internet usage behavioral pattern.
2. Users will use the same device to perform work/school work during the entire time of this research.
JUSTIFICATION
All the existing log analysis tools that use UEBA are commercial, and their pricing is relatively high. Creating an open source UEBA tool will be a valuable addition to the open source community. The ability to customize the model dynamically is also lacking in all current UEBA tools. In this research, we shall make it possible for users and organizations to customize the model according to their organizational structures, which will make our tool more dynamic.