USMA Research Unit Affiliation
Systems Engineering
Date of Award
Winter 12-29-2006
Degree Type
Doctor of Philosophy (PhD)
Document Type
Doctoral Dissertation
Department
Decision Sciences and Engineering Systems
Abstract
This research proposes several methods designed to improve solutions for security classification problems. The security classification problem is an unbalanced, high-dimensional, binary classification problem of a kind that is prevalent today. The imbalance consists of a large majority of negative-class instances and a small minority of positive-class instances. Any system that needs protection from malicious activity, intruders, theft, or other types of breaches in security must address this problem. These breaches in security are considered instances of the positive class. Given numerical data that represent observations or instances which require classification, state-of-the-art machine learning algorithms can be applied. However, the unbalanced and high-dimensional structure of the data must be considered prior to applying these learning methods. High-dimensional data poses a “curse of dimensionality” which can be overcome through the analysis of subspaces. Exploration of intelligent subspace modeling and the fusion of subspace models is proposed. A detailed analysis of the one-class support vector machine is included, covering its weaknesses and proposals to overcome them. Fundamental tools for evaluating a binary classification model are the receiver operating characteristic (ROC) curve and the area under the curve (AUC). This work details the underlying statistics involved with ROC curves, contributing a comprehensive review of ROC curve construction and analysis techniques, including a novel graphic for illustrating the connection between ROC curves and classifier decision values. The major innovations of this work include synergistic classifier fusion through the analysis of ROC curves and rankings, insight into the statistical behavior of the Gaussian kernel, and novel methods for applying machine learning techniques to defend against computer intrusion.
The primary empirical vehicle for this research is computer intrusion detection data, and both host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) are addressed. Empirical studies also include military tactical scenarios.
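As a minimal illustration of the connection between classifier decision values, ROC curves, and AUC mentioned in the abstract, the following Python sketch builds ROC points by sweeping a threshold over decision values and computes AUC in two equivalent ways. It is not taken from the dissertation: the function names and the toy scores and labels are hypothetical, chosen only to mimic an unbalanced sample with 2 positive and 8 negative instances.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering a threshold over the scores."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])  # highest score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all instances tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical decision values: label 1 marks a breach (the minority class).
scores = [0.9, 0.4, 0.8, 0.7, 0.3, 0.2, 0.1, 0.35, 0.05, 0.15]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

points = roc_points(scores, labels)
print(auc_trapezoid(points), auc_rank(scores, labels))
```

For this toy data both computations give 0.875: the two positive scores outrank 14 of the 16 positive-negative pairs. Because AUC depends only on the ranking of decision values and not on class proportions, it is a common choice for unbalanced problems of the kind described above.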
USMA Research Goals Supported
Develop the Faculty Professionally, Address Important Issues Facing the Army and Nation
First Advisor
Mark J. Embrechts
Second Advisor
Boleslaw K. Szymanski
Third Advisor
Joseph G. Ecker
Publisher
Rensselaer Polytechnic Institute
Recommended Citation
Evangelista, Paul, "The Unbalanced Classification Problem: Detecting Breaches in Security" (2006). West Point ETD. 14.
https://digitalcommons.usmalibrary.org/faculty_etd/14
Included in
Numerical Analysis and Scientific Computing Commons, Operational Research Commons, Statistical Models Commons, Systems Engineering Commons