Abstract
Malware, also known as malicious programs or code, is one of the biggest threats in computing today. It has become very easy to develop, and thousands of new samples are produced every day. Malware also mutates easily, which makes it very difficult to control. The most accessible mitigation is anti-malware tools. However, for the reasons above, traditional signature-based malware scanning tools have proved insufficient. For this reason, the antimalware industry is constantly rethinking how to improve its detection methods (Ye, Li, Adjeroh, & Iyengar, 2017).
This research was conducted to assess and compare the performance of machine learning algorithms in the detection of malware.
TABLE OF CONTENTS
DECLARATION
Research Declaration
Declaration by Supervisor
Dedication
Acknowledgement
Abstract
Chapter One
Introduction
Background
Problem Statement
Overall Objective
Particular Objectives
Chapter Two
Literature Review
Introduction
Existing Work
Construction of Malware
Signature Based Detection
Static Analysis and Dynamic Analysis
Behaviour Based Detection
Machine Learning Detection Methods
Data Mining Techniques
Research Gap
Approach used to deal with the Research Gap
Research Questions
Conceptual Framework
Chapter Three
Research Methodology
Introduction
General Methodological Approach
Research Design
Data Acquisition
Population
Sample size
Evaluation and Validation
Reliability and Validity
Research Ethics
Chapter Four
Discussion
1. Cuckoo sandbox
I. Virtual box
II. Virtual Network for the Laboratory
2. Data Collection
3. Data Analysis
I. Cuckoo Sandbox Analysis
II. WEKA Machine Learning tool
III. Algorithms Analyzed
Results and Discussion
Chapter Five
Conclusion and Recommendation
Future work
Bibliography
Chapter One
Introduction
Background
Internet usage is growing very fast the world over. This has contributed to malware becoming one of the biggest threats in computing. Developing malware has also become much simpler, because many malicious tools and applications, including automated tools, are easily accessible on the internet. Purchasing malware has become just as accessible. Together, these factors make it easy for any interested person to become an attacker, even with very basic skills (Mouhammd Al-Kasassbeh, Mohammed, Alauthman, & Almomani, 2020).
According to the definition given by Souppaya and Scarfone, malware, or malicious programs/code, are programs inserted into an existing system or program with the intention of causing damage: by destroying data, by executing destructive or intrusive programs, or by compromising the confidentiality, integrity, or availability of the victim’s data, applications, or operating system. Malware is considered the most prevalent external threat to technology users. It is responsible for much of the damage and the service or business disruption that organizations suffer. To overcome these challenges, affected organizations have to go through extensive recovery efforts (Souppaya & Scarfone, 2013).
Although newer forms of malware do not always fit neatly into them, the classical categories of malware include the following (Souppaya & Scarfone, 2013):
(a) Viruses, which replicate by inserting copies of themselves into programs or files that already exist on a system. They are either compiled viruses, which are executed by the operating system, or interpreted viruses, which are executed by an application (for example, macro viruses and scripting viruses). Compiled viruses are of two types: file infector viruses, which attach themselves to executable programs such as .exe files, and boot sector viruses, which infect the boot sector of a machine or of removable media.
(b) Worms, which are self-contained programs that replicate themselves. They do not need a user to execute them; they execute on their own. They are either mass-mailing worms or network service worms.
(c) Trojan horses, which are self-contained programs that do not replicate. Although they appear harmless, they have a hidden malicious purpose.
(d) Malicious mobile code, which is software usually written in Java, ActiveX, JavaScript, or VBScript. It is transmitted from a remote machine on the network to the user’s machine and then executed there without the user explicitly running it.
(e) Blended attacks, which use several transmission methods at the same time, for example by combining the propagation methods of viruses and worms.
To curb increasingly pronounced cybersecurity problems, a variety of security technologies are used, among them intrusion detection systems, intrusion prevention systems, firewalls, and commercial antivirus (AV) products (Jardine, 2020). To protect themselves from these threats, the majority of legitimate users rely on anti-malware software from companies that specialize in these products. Until very recently, the main detection method in widely used solutions was signature-based malware detection. This method recognizes threats that have already been identified. A malware signature is a sequence of bytes, usually short, that is distinct for each identified malware sample. A record of these signatures is maintained so that files newly encountered by the antimalware product can be identified with a high level of accuracy (Ye, et al., 2011). However, malware development toolkits allow even inexperienced attackers to easily develop and modify malware samples. Such modification is done to evade detection, which largely undermines the fight against malware.
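The idea of matching byte-sequence signatures, and the ease with which a modified sample evades such matching, can be illustrated with a minimal sketch. The signature names and byte values below are hypothetical and chosen only for illustration; they are not real malware signatures, and commercial products use far more elaborate matching than a plain substring search.

```python
# Hypothetical signature database: name -> byte sequence (illustrative values only).
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef4d5a9000"),
    "Example.B": b"\x90\x90\xeb\xfe\xcc",
}


def scan(data: bytes) -> list:
    """Return the sorted names of all signatures that occur in data."""
    return sorted(name for name, sig in SIGNATURES.items() if sig in data)


# A "known" sample that embeds the Example.A signature.
known_sample = b"header" + SIGNATURES["Example.A"] + b"payload"

# Flip one byte inside the signature region to mimic a trivially modified variant.
mutated = bytearray(known_sample)
mutated[len(b"header") + 2] ^= 0xFF
mutated_sample = bytes(mutated)

print(scan(known_sample))    # the known sample is matched by name
print(scan(mutated_sample))  # the modified variant matches no signature
```

In this sketch a single changed byte is enough to break the match, which is why an exact signature record can only recognize samples that have already been identified.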
The signature-based method detects, and then deletes or blocks, a large amount of previously identified malware. However, many malware files are newly generated or have mutated, for example “zero-day” malware. These tend to escape detection by scanning tools that rely on traditional signature-based methods. Such variants are a key concern for the antimalware industry, which is always looking for ways to address them more effectively, because most approaches in use are based on variations of the signature-based method (Ye, Li, Adjeroh, & Iyengar, 2017).
A survey conducted by AV-Comparatives, which was voted the most trustworthy and reliable of the antivirus testing labs, received 2,818 valid responses. It found that antimalware software is the most requested desktop security solution among both home users and business/enterprise users. Worldwide, the majority of the survey participants who use the Windows operating system use commercial antivirus products to protect themselves from malware (AV-Comparatives, 2020). Because they are easy to use even for people with little technological or security know-how, antimalware products have proven indispensable in a society where almost everyone needs to use technology.
Problem Statement
Antimalware software is clearly key to preventing malware incidents. However, these solutions have so far not succeeded in stopping all malware incidents, despite the large amount of work that goes into improving the effectiveness of existing products. Because malware mutation has become very easy, the number of new malware samples is growing rapidly. This has made the fight against malware even more complicated (Ye, Li, Adjeroh, & Iyengar, 2017).
This research seeks to rate and measure the performance of existing antimalware algorithms, a key factor in helping antimalware designers make better decisions when improving existing antimalware products and designing new ones.
Overall Objective
● To evaluate the performance of selected machine learning algorithms.
Particular Objectives
1. To investigate existing machine learning algorithms that can be used for malware detection.
2. To evaluate their strengths and weaknesses.
3. To measure the performance of these algorithms in terms of accuracy, precision, recall, and speed of operation.
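The measures named in the third objective can be sketched as follows, treating malware as the positive class (label 1). This is a minimal illustration under stated assumptions: the threshold rule, the sample scores, and the labels are hypothetical, and the sketch is not the evaluation procedure used in this research.

```python
import time


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall with malware (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware flagged as malware
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign flagged as benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as malware
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def timed_predict(classify, samples):
    """Run classify on every sample and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    preds = [classify(s) for s in samples]
    return preds, time.perf_counter() - start


def toy_rule(score):
    """Hypothetical stand-in for a trained model: flag scores above 0.5."""
    return 1 if score > 0.5 else 0


# Hypothetical feature scores and ground-truth labels (1 = malware, 0 = benign).
samples = [0.9, 0.8, 0.7, 0.2, 0.6, 0.9, 0.1, 0.3, 0.2, 0.4]
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

preds, elapsed = timed_predict(toy_rule, samples)
metrics = evaluate(y_true, preds)
print(metrics, f"{elapsed:.6f} s")
```

With these toy values there are 3 true positives, 4 true negatives, 2 false positives, and 1 false negative, giving an accuracy of 0.7, a precision of 0.6, and a recall of 0.75. The elapsed time depends on the machine and is shown only to indicate how speed of operation could be recorded alongside the other measures.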
Justification
Breakthroughs in internet technology and computer networking have made high-speed shared internet access possible. Demand has risen sharply during the ongoing global pandemic, with practically everyone needing to work or attend school online. The effect of this development is a daily increase in the number of computer systems that are susceptible to, or have suffered from, malware attacks (Chakrabortya & Dey, 2016). As innovation grows, security analysts constantly compete with malware authors in the search for better detection methods. However, the detection methods proposed so far have not been sufficient, because malware keeps evolving quickly, growing in complexity, and becoming harder to recognize (Alireza & Rahil, 2018).
Despite the large amount of work going into malware research and analysis in an effort to find a solution that effectively detects, and provides complete protection against, the variety of malware in existence, a conclusive antimalware solution has not yet been reached. This research assesses the performance of a variety of existing machine learning algorithms for malware detection. The findings will help in creating more robust and efficient algorithms that overcome the weaknesses of existing ones. They will also help future researchers and security solution developers focus their work in the right direction.