ABSTRACT
The urgency and time-bound nature of cybersecurity briefings pose a challenge to the few cybersecurity professionals who have to read and summarize vast amounts of cybersecurity reports from several sources (personal communication, October 26, 2021). This paper demonstrates a solution based on Long Short Term Memory (LSTM) that automates the generation of briefs from various cybersecurity report sources, and further assesses ROUGE, the standard metric for summary evaluation. This was achieved using the CRISP-DM methodology and natural language processing techniques. After training and testing, the model outperformed other summarizers such as lexRank. The abstractive technique is considered relatively strong and dynamic because the sentences that form a summary are generated based on their semantic meaning. On assessing various ROUGE variants, it was clear that evaluating different kinds of summaries requires different ROUGE metrics. For instance, ROUGE-1 and ROUGE-2 may be useful when evaluating extractive summaries.
Keywords: Cybersecurity Briefing, Recurrent Neural Network, Long Short Term Memory, Abstractive Summary, Extractive Summary, ROUGE, ROUGE-AR.
TABLE OF CONTENTS
Abstract
List of Figures
List of Tables
List of Abbreviations
CHAPTER ONE
INTRODUCTION
1.1 Background Study
1.2 Problem of Research
1.3 Problem Definition
1.4 Research Questions
1.5 Objectives
1.6 Scope
1.7 Significance
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2 Related Work
2.3 Research Gap
2.4 System Design Architecture
CHAPTER THREE
METHODOLOGY
3.1 Introduction
3.2 Methodology Overview
3.3 Approach
CHAPTER FOUR
RESULTS AND DISCUSSIONS
4.1 Introduction
4.2 Results
4.2.1 Model Output
4.2.2 Discussion on the results and accuracy of the model
4.3 Achievements
4.4 Limitations
CHAPTER FIVE
CONCLUSION
4.6 Future Work
4.7 Acknowledgment
5 REFERENCES
Appendices
Appendix 1: Code Snippets
Appendix 2: User Interface
Appendix 3: Questionnaire to Gather Cybersecurity Analysts’ Views on the Quality of Model Generated Briefs
LIST OF FIGURES
Figure 1: Automated Cybersecurity Briefing Design
Figure 2: CRISP-DM Methodology. Source: (Huber et al., 2019)
Figure 3: Plot of Data length distribution
Figure 4: LSTM for both encoding and decoding. Source: (Samurainote, 2019)
Figure 5: LSTM Cell State Operation. Source: (Pluralsight, 2020)
Figure 6: Modeling Point of Optimal results
Figure 7: The Long Short Term Memory (LSTM), for encoding during training phase. Source: (Samurainote, 2019)
Figure 8: The Long Short Term Memory (LSTM), for decoding during inference phase. Source: (Samurainote, 2019)
Figure 9: Brief generation from a lengthy text
Figure 10: Brief generation from files/reports
Figure 11: Brief generation from a web blog
Figure 12: Measuring Model performance
Figure 13: Validation of the briefs generated by the industry experts
Figure 14: Validation of the briefs generated by the industry experts
Figure 15: User interface for generating a brief from lengthy text
Figure 16: Generating a brief from website URLs
Figure 17: Comparing the abstractive and extractive summaries
Figure 18: Survey questionnaire to collect cybersecurity analyst views on the quality of the model generated brief
Figure 19: Question one to gather analyst views on the importance of automated cyber briefing
Figure 20: Question two aimed at gauging the impact of automated cyber briefing on the strategic decision making process
Figure 21: Question three aimed at gathering cyber analysts’ opinion on the quality of one of the generated cybersecurity briefs
Figure 22: Question five aimed at gauging the quality of one of the cybersecurity briefs by cyber analysts
LIST OF TABLES
Table 1: Dataset format used for training and testing the model
Table 2: Unclean and cleaned data
Table 3: Resources Utilized
Table 4: Accuracy of the abstractive based trained Model based on the generated Summary Y (Automated Cybersecurity Briefing Model, method 3)
Table 5: Accuracy of Summary Y evaluation Using Method 2 (extractive based)
Table 6: Comparing the accuracy of the cybersecurity briefing model with other existing related solutions (method 1)
LIST OF ABBREVIATIONS
ROUGE - Recall-Oriented Understudy for Gisting Evaluation
RNN - Recurrent Neural Network
LSTM - Long Short Term Memory
CRISP-DM - Cross-Industry Standard Process for Data Mining
ACB - Automated Cybersecurity Briefing
ICT - Information Communication Technology
ROUGE-AR - Recall-Oriented Understudy for Gisting Evaluation - Anaphor Resolution
GOOGLE COLAB - Google Colaboratory
CHAPTER ONE
INTRODUCTION
1.1 Background Study
In today’s computerized world, new cyber threats and risks emerge every minute, amounting to a large number propagated within the global cyberspace in a single day. The growing number of devices connecting to the internet widens the cyber threat landscape, thereby increasing the chances of successful attacks. Quick responses are needed, supported by cybersecurity briefs that aid in making effective and well-informed strategic decisions (personal communication, August 10, 2021). Cybercrime has now become a major business risk for both organizations and nation states globally. The need for automated summary generation through text summarization in industries such as security is becoming inevitable due to the volume of reports received daily that require briefing.
The cybersecurity information that resides in many online sources such as cybersecurity vendor bulletins, peer forum posts, cyber threat information sharing platforms, cybersecurity blogs and various databases forms a significant portion of the information sources for cybersecurity analysis. Moreover, cybersecurity analysts depend on these documents for understanding their assets’ vulnerabilities, prioritizing patches, tracing clues during forensic efforts, and understanding emerging threats (Bridges et al., 2017).
The cybersecurity skills gap in Kenya has delayed efforts to adequately address cyber-attacks. The few cybersecurity professionals available are sometimes overworked and do not find enough time to concentrate on key areas when addressing cybersecurity challenges. A survey conducted by Tripwire in early 2020 revealed that 83% of cybersecurity professionals felt overworked (Survey: Only 39% of Orgs Have Ability to Retain Cyber Security Talent, n.d.). Cybersecurity reporting is time bound and is effective only if a report is submitted on time. A cybersecurity statistics report released in March 2021 by Forbes and Purplesec revealed that 230,000 new malware samples, 100,000 malicious websites and 10,000 new malicious files are produced daily.
Most of the affected technological/ICT solutions are shared globally, so every cybersecurity professional in any part of the world should be concerned about any cyber threat propagated in the global cyberspace. As a result, it is nearly impossible for these professionals to manually go through hundreds or thousands of reports in order to gain insights and produce briefs that can inform the next course of action.
A quick and effective response to cyber incidents should include the automation of repetitive tasks such as report summarization to generate cyber threat briefs that support and guide prompt decision making. Autonomous cybersecurity report summarization takes the burden off the security team so that they have more time to focus on addressing cyber-attacks.
1.2 Problem of Research
(i) Voluminous manual data correlation. It is neither convenient nor practical to manually summarize the thousands of cybersecurity reports generated daily and produce an effective, actionable brief within the limited time.
(ii) Delayed cybersecurity response time. Removing the burden of manually performing repetitive tasks minimizes the mean remediation time through human-machine collaboration, which leads to improved productivity, increased capacity and reduced risk.
(iii) Shortage of cybersecurity skills. Automating repetitive tasks enhances cyber defense efficiency by saving manpower, which allows faster prevention of new and unknown threats.
1.3 Problem Definition
As the cybersecurity reports produced daily grow longer and more voluminous, it is becoming harder to generate meaningful and timely cybersecurity briefs (Dr. Emily Hand, n.d.). The few cybersecurity professionals working in this industry are overwhelmed by the number of reports they have to sift through daily to identify malicious activity or a potential cyber threat that could impact their organization or state. The intention of this research was therefore to make this process more proactive and easier by automating cybersecurity briefing through the application of natural language processing techniques.
By automating this repetitive process, cybersecurity professionals will have more time to attend to other compelling tasks such as handling cyber incidents. In such emergencies, the time available to prepare cybersecurity briefs for strategic decision making is limited.
1.4 Research Questions
(i) How do you establish a baseline for evaluating the automatic text summarization process?
(ii) Can any ROUGE variants be used to evaluate both abstractive and extractive summaries?
(iii) Is ROUGE the best method for evaluating highly paraphrased summaries?
(iv) Is LSTM better than lexRank for automated cybersecurity briefing through text summarization?
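Research questions (ii) and (iii) turn on how ROUGE-N scores are computed. As a minimal sketch, and not the evaluation code used in this study, ROUGE-N recall can be taken as the number of n-grams shared by the reference and the generated summary divided by the number of n-grams in the reference. The function names and the three example sentences below are hypothetical and chosen only for illustration; whitespace tokenization is a simplifying assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: shared n-grams divided by n-grams in the reference.

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    return overlap / total


# Hypothetical sentences, for illustration only.
reference = "attackers exploited a flaw in the mail server"
extractive = "attackers exploited a flaw in the server"
paraphrase = "hackers abused a bug in the email system"

for name, summary in [("extractive", extractive), ("paraphrase", paraphrase)]:
    r1 = rouge_n_recall(reference, summary, n=1)
    r2 = rouge_n_recall(reference, summary, n=2)
    print(f"{name}: ROUGE-1={r1:.2f} ROUGE-2={r2:.2f}")
```

In this toy case the paraphrased summary shares far fewer n-grams with the reference than the near-verbatim one, so both scores fall even though the meaning is preserved, which is the concern behind question (iii).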
1.5 Objectives
1.5.1 Overall goal
To automate cybersecurity brief generation in order to support time-bound strategic decision making.
1.5.2 Specific Research Objectives
(i) To apply natural language processing techniques to generate automated cybersecurity briefs in order to speed up the strategic decision making process.
(ii) To examine the most effective ways of automating and improving cybersecurity text summarization.
(iii) To assess the variants of the standard summarization evaluation metrics and draw deductions from the assessment.
1.6 Scope
This research focuses on the summarization of reports within the cybersecurity industry. The cybersecurity report sources covered include cybersecurity vendor bulletins, peer forum posts, cyber threat information sharing platforms, and cybersecurity analytical reports.
1.7 Significance
This paper is relevant to the Smart Africa Agenda on Cybersecurity and Big Data analytics, specifically in helping governments to prevent or proactively deter crime, boost national security by protecting critical ICT infrastructures, and enhance the level of cybersecurity awareness (Smart Africa, 2018).
This research also contributes to the achievement of the Smart Africa Agenda in the Kenyan context by promoting the implementation of the Computer Misuse and Cybercrime Act 2018, Part III section 40 on reporting of cyber threats.