Predicting Phishing Vulnerabilities Using Machine Learning
Contributing USMA Research Unit(s)
Electrical Engineering and Computer Science, Cyber Research Center
This paper examines whether machine learning can predict an undergraduate student's actions upon receiving a phishing email. The machine learning models used in this work were trained on actual phishing results augmented with students' background and administrative data. The ultimate goal of this project is to better identify the members of an organization who are at risk from phishing so that they can be given targeted cyber security training. This targeted training will improve the security posture of an organization while minimizing unnecessary training and productivity loss. The results of multiple machine learning techniques demonstrate that this approach is viable, with validation accuracy ranging from 49% to 86%. Other metrics are also used to evaluate the viability of the approaches, and recall is determined to be the most important. The model with the best validation performance on these two metrics (accuracy and recall) was a Support Vector Machine (SVM), which predicted whether a cadet would be compromised upon receipt of a phishing attack with 55% accuracy while maintaining a recall score of 71%. When the trained models were applied to new data after training and validation, the Logistic Regression model had the highest performance, predicting whether a cadet would be compromised upon receipt of a phishing attack with 86% accuracy while maintaining a recall score of 16%.
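The abstract reports accuracy and recall side by side, and the two can diverge sharply when compromised users are a small share of the population. The sketch below shows how both metrics are computed from confusion-matrix counts. The function name `accuracy_and_recall` and the counts are hypothetical, chosen only for illustration; they are not taken from the paper's data.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts.

    tp: compromised users correctly flagged
    fp: safe users incorrectly flagged
    tn: safe users correctly left unflagged
    fn: compromised users the model missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total   # share of all predictions that are correct
    recall = tp / (tp + fn)        # share of compromised users that are caught
    return accuracy, recall


# Hypothetical counts (NOT from the paper): 100 users, 10 of whom were compromised.
acc, rec = accuracy_and_recall(tp=2, fp=5, tn=85, fn=8)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

With these illustrative counts the model is right about most users overall yet catches only a small fraction of the compromised ones. This is why a high accuracy figure alone may not identify the people who need targeted training, and why recall can reasonably be treated as the more important metric for that goal.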
S. Rutherford, K. Lin and R. W. Blaine, "Predicting Phishing Vulnerabilities Using Machine Learning," SoutheastCon 2022, 2022, pp. 779-786, doi: 10.1109/SoutheastCon48659.2022.9764045.