Abstract
Choosing a suitable professional career path is one of the most important decisions that students must make today. The increasing number of alternative jobs and prospects in computing makes this decision more challenging. The goal of this study was to identify computing career parameters, compare the Random Forest (RF) and Naïve Bayes (NB) supervised machine learning algorithms, and then develop a prototype. These objectives were accomplished through CRISP-DM using the Kaggle repository data 4 ver1 dataset. The computing career parameters used for prediction include professional skills and abilities, CGPA, communication skills, analytical skills, being a team player, personal interest and professional experience.
The algorithms for predicting careers have been thoroughly examined. Because of their high prediction accuracy, Random Forest and Naïve Bayes were selected for the career prediction system. The model was developed using five, ten, fifteen and nineteen attributes. This study examined the F1 score, recall, precision and accuracy, expressed as percentages, of these two supervised learning approaches. Both algorithms were trained and tested on the same datasets with a varied number of attributes. The findings showed that the Random Forest algorithm outperformed the Naïve Bayes algorithm, with an accuracy of 89.885% as well as higher recall, precision and F1 score. All performance metrics increased gradually as the number of attributes grew, until they reached a point of stagnation.
A prototype based on the Random Forest method was developed and evaluated for accuracy in career prediction for computing college students. This prototype can be used by computing graduates and human resource managers to make more accurate and consistent predictions about computing careers.
Table of Contents
Declaration
Dedication
Acknowledgements
Abstract
List of Figures
List of Tables
List of Abbreviations
1.0 CHAPTER ONE: INTRODUCTION
1.1 Background Information
1.2 Statement of the problem
1.3 Objectives
1.3.1 General objective
1.3.2 Specific Objectives
1.4 Research questions
1.5 Significance of the study
1.6 Justification
1.7 Scope of the study
1.8 Limitation of the study
2.0 CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 Career in computer and information technology
2.3 Career Prediction
2.4 Prediction Models
2.5 Career Prediction Parameters
2.6 Dataset
2.7 Feature selection Techniques
2.8 Machine Learning Algorithms
2.8.1 Prediction using Naïve Bayes(NB)
2.8.2 Random Forest
2.9 Evaluation
2.10 Research gap
3.0 CHAPTER THREE: METHODOLOGY
3.1 Research design
3.2 The Proposed Model's Overall Architecture
3.3 CRISP-DM Overview
4.0 CHAPTER FOUR: RESULTS AND DISCUSSION
4.1 Introduction
4.2 Experiment setup
4.3 Model Building
4.4 Modeling Techniques and Tools Used
4.5 Performance Evaluation for Predictive Model
4.5.1 Predictive model and basic classification results using Jupyter Notebook
4.5.2 Training dataset
4.5.3 Results of interpretation of the training data set
4.6 Development and Implementation of the Proposed Prototype
4.7 Prototype Testing
CHAPTER FIVE: CONCLUSION AND RECOMMENDATIONS
5.1 Introduction
5.2 Research findings summary
5.3 Conclusion
5.4 Challenges
5.5 Limitations
5.6 Recommendations
REFERENCES
APPENDICES
List of Figures
Figure 1 Computing career classification by Wong & Kemp, 2017
Figure 2 Criteria for Feature selection techniques for Machine Learning (Brownlee, 2019)
Figure 3 Random Forest Classifier by Subudhi, Dash, & Sabut, 2019
Figure 4 Proposed Computing career classification system overall Architecture
Figure 5 CRISP-DM Steps guide
Figure 6 Preview of missing values on the dataset
Figure 7 Preview of data after dropping rows with missing values on the dataset
Figure 8 Sample Data after One hot Encoding career dataset
Figure 9 One-hot encoded data after dropping a column in each level
Figure 10 Output of Feature selection using Chi-square
Figure 11 Raw data Loaded on Jupyter Notebook
Figure 12 Sample Data after One hot Encoding career dataset
Figure 13 The performance metrics of Prediction Algorithms
Figure 14 The Pycharm of the proposed Computing career classification system
Figure 15 GUI for the proposed prototype
Figure 16 Proposed prototype prediction output
List of Tables
Table 1 Dataset parameters distributions
Table 2 Performance Metric After one-hot encoding
Table 3 Performance metrics of Prediction Algorithms
Table 4 Prototype Testing Results
List of Abbreviations
CRISP-DM - Cross Industry Standard Process for Data Mining
CSV - Comma Separated Values
NB - Naïve Bayes
FN - False Negative
FP - False Positive
CDDQ - Career Decision-Making Difficulties Questionnaire
RF - Random Forest
ANOVA - Analysis of Variance
LDA - Linear Discriminant Analysis
HTML - HyperText Markup Language
GUI - Graphical User Interface
KNN - K-Nearest Neighbor
CGPA - Cumulative Grade Point Average
1.0 CHAPTER ONE: INTRODUCTION
1.1 Background Information
Computing careers have grown in the past decade. Most Kenyan universities now offer information technology (IT), computer information systems (CIS) and computer science (CS) as degree programmes. The path to career specialization after completing these degrees is not well defined. College students undertaking these degree programmes require guidance on the careers that best suit them.
College education defines the career path of a student. According to Mualuko (2007), 47% of primary school pupils are able to join secondary level and 27% of secondary school students get access to post-secondary education (technical colleges and universities). Computing college students who take the right career path will contribute significantly to the country's economic development and to the psychosocial well-being of its citizens (Kabari & Agaba, 2019). Economically, there is an increase in the real gross domestic product per capita. Psychosocial well-being means improving the quality of life, which involves living healthier and longer by balancing emotional, social and physical components (Eiroa-Orosa, 2020). Today’s market demands computing experts in all pillars of Kenya’s Vision 2030. This has led to the creation of job opportunities in the computing field. Career guidance is required in order to produce employees who will fit into the ten key sectors of the Vision 2030 strategies. The workforce in Kenya is below the standard expected by, and aspired to under, Kenya’s Vision 2030. Career guidance should therefore be balanced and should reach the majority of computing students without any discrimination. Human potential can only be unleashed through proper career advice in the education process. Although college students are mature enough to understand their career paths, there is a discrepancy between their expectations and the jobs available in the market (Macharia, 2019).
In digital technology, it is not easy to define a specialty in computing because technology is dynamic and continues to redefine itself with new improvements. Specialties in information technology and computing can, however, be categorized according to the US Bureau of Labor Statistics into Database Administrators, Information Security Analysts, Solution Architects, Computer Systems Analysts, Computer Network Architects, Computer Programmers and Developers, Design and UX, E-Commerce Analysts, and Network Security Engineers (Wong & Kemp, 2017).
Many algorithms have been used to create career counseling models. These include Decision Tree (DT), Linear Regression (LR), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) (Gerhana, Fallah, Zulfikar, Maylawati, & Ramdhani, 2019).
The decision on one’s career is a key element in a person’s life. College students face many challenges arising from wrong career selection, which opens the door to lifelong consequences. Guidance that relies on human counselors, whose reasoning is based on personal experience, may vary from one individual to another (Kazi Afaq, Sharif, & Ahmad, 2017).
According to Subahi (2018), one of the most crucial choices a graduate must make is selecting the first suitable job in any field linked to computing. Making this decision has become difficult for graduates because of the recent surge in career routes and work prospects in computing-related sectors. Choosing unsuitable employment that does not match their talents and knowledge significantly reduces graduates' productivity in their first job. This makes it essential to have a tool that can help graduates assess their skills after earning a degree in computing, allowing them to select the best entry-level positions. In Kenya, college students are left to be guided by career counselors. Through counseling, students are advised on how to make tough decisions, plan their careers and understand their abilities. The main challenge in counseling is to engage the students in the process. According to UK research, 45% of those over 14 received inadequate or insufficient employment guidance (Macharia, 2019). Most parents and students lack adequate information and therefore choose a career based on their perception of the ideal computing career (Kazi Afaq, Sharif, & Ahmad, 2017).
1.2 Statement of the problem
Long-term career progression prediction is not well explored because of the diversity in career trajectories and career planning. Existing models cannot support current job seekers in planning their long-term career paths. These models focus on the immediate steps of career movement and are informed by theoretical guidance (Michiharu, Yunqi, Thanh, Yongfeng, & Dongwon, 2022).
A discrepancy between what computing students are capable of and what is on the ground is created when counselors advise students on careers based only on results rather than on the facts and the potential of the students (Nunsina & Situmorang, 2020).
According to Nunsina and Situmorang (2020), the algorithm used was not optimally reliable in predicting specific computing specialties because of the small number of specialty areas considered. The three major areas considered were Software engineering, multimedia and Computer Engineering network. Research by Lagman (2019) and Razaque (2017) on academic performance and careers states that existing computing career prediction models could not help computing college students select the right career because they consider only academic factors and ignore social and environmental factors.
1.3 Objectives
1.3.1 General objective
To develop a machine learning-based model for career prediction for college graduates in computing specialties.
1.3.2 Specific Objectives
1. To find fundamental career prediction parameters for college students in computing.
2. To identify the appropriate machine learning algorithm for career prediction for computing college students.
3. To develop an automated career prediction model based on the appropriate algorithm for college students in computing.
4. To evaluate the proposed career prediction model for accurate prediction.
1.4 Research questions
1. What are the fundamental career prediction parameters for college students in computing?
2. What are the appropriate machine learning algorithms for career prediction for computing college students?
3. How can an automated career prediction model based on the appropriate algorithm be developed for college students in computing?
4. How can the proposed career prediction model for college students in computing be evaluated?
1.5 Significance of the study
The project is meant to help college students select a career based on their capabilities, interests, background information and performance. It is important for computing college students to have a guide to their career paths that is based on facts. The research also contributes to industrial human resource management, where the shortlisting and engagement of computing graduates can be guided by the model developed.
1.6 Justification
The lack of an accurate, robust and efficient computing specialty model for college students has led to computing students pursuing careers that are not of interest to them, leading to psychosocial problems. This research will act as a guide for computing students, providing reliable career guidance based on different parameters.
Career prediction in the computational job market using artificial intelligence can greatly help not only job seekers, in understanding and planning their future pathways, but also recruiters, in finding talented computing employees (Michiharu, Yunqi, Thanh, Yongfeng, & Dongwon, 2022). In human resource departments, applicants for specific computer-related vacancies are shortlisted manually. The research will assist these departments in shortlisting applicants based on their performance, interests, certification, workshops and background information. As a guiding principle, the research will support good time management and the selection of competent computing employees for the vacancies.
The Random Forest and Naïve Bayes algorithms have proven accurate, robust and efficient in prediction and are therefore widely used. They provide easy and fast class prediction for both training and testing data. Where the parameters are independent and the input variables are categorical, the Naïve Bayes and Random Forest algorithms perform better than other algorithms (Gerhana et al., 2019).
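To illustrate why independent, categorical parameters suit Naïve Bayes, the following is a minimal sketch of a categorical Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is not the model built in this study: the class name `CategoricalNaiveBayes`, the three features (CGPA band, communication skill, interest) and the six training rows are hypothetical and chosen only to show how per-feature probabilities are multiplied under the independence assumption.

```python
from collections import Counter, defaultdict
import math


class CategoricalNaiveBayes:
    """Toy Naive Bayes for categorical features (illustrative only)."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[label][feature_index][value] -> number of occurrences
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen per feature, used for Laplace smoothing
        self.feature_values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[label][i][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(count / self.total)
            # Independence assumption: add one log-likelihood per feature.
            for i, value in enumerate(row):
                seen = self.value_counts[label][i][value]
                k = len(self.feature_values[i])
                score += math.log((seen + 1) / (count + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (CGPA band, communication skill, interest).
rows = [
    ("high", "good", "security"),
    ("high", "average", "databases"),
    ("medium", "good", "security"),
    ("medium", "average", "databases"),
    ("low", "good", "security"),
    ("high", "good", "databases"),
]
labels = [
    "Information Security Analyst",
    "Database Administrator",
    "Information Security Analyst",
    "Database Administrator",
    "Information Security Analyst",
    "Database Administrator",
]

model = CategoricalNaiveBayes().fit(rows, labels)
prediction = model.predict(("medium", "good", "security"))
print(prediction)
```

Because each feature contributes its own term to the score, training reduces to counting value frequencies per class, which is what makes the approach fast on categorical inputs. A Random Forest would instead combine the votes of many decision trees and is not sketched here.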
1.7 Scope of the study
The model will accept the cumulative grade point average (CGPA) of the computing student, together with background information, to predict the student's computing specialty.
All computing students from universities, colleges and tertiary institutions will be able to use the model to predict their specialty, as will industrial human resource managers when selecting and shortlisting applicants for employment in computing.
1.8 Limitation of the study
Owing to limited time and insufficient resources, the research used secondary data rather than gathering primary data from graduate college students.
This model will not be able to address all specialties in computing as technology is dynamic and changing every day.
The model will not be able to cater to other students in colleges who are undertaking courses that are not related to computing.