Contributing USMA Research Unit(s)

Robotics Research Center

Publication Date


Publication Title

IEEE International Symposium on Technologies for Homeland Security

Document Type

Conference Proceeding


This paper presents a near real-time, multi-stage classifier that identifies people and handguns in images and then assesses the threat level a person poses based on their body posture. The first stage is a convolutional neural network (CNN) that determines whether both a person and a handgun are present in an image. If so, a second-stage CNN estimates the pose of the person detected with the handgun. Finally, a feed-forward neural network (NN) makes the threat assessment from the joint positions in that person's skeletal pose estimate. On average, the entire pipeline requires less than 1 second of processing time on a desktop computer. The model was trained on approximately 2,000 images and achieved pistol and person detection rates of 22% and 55%, respectively. The final-stage NN correctly identified the severity of the threat with 84% accuracy. The images used to train each stage of our multi-classifier model are available online. With an expanded dataset, the accuracy of detecting people and pistols can likely be improved in the future.
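The three-stage cascade described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Detection` type, the `detect` and `estimate_pose` callables (stand-ins for the two CNN stages), the flattening of (x, y) joint coordinates into a feature vector, the one-hidden-layer network shape, and the "low"/"high" threat labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Keypoint = Tuple[float, float]  # hypothetical: (x, y) image coordinates of one joint


@dataclass
class Detection:
    # Hypothetical output of the first-stage CNN.
    person: bool
    handgun: bool


def dense(x: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    # Fully connected layer: one weight row and one bias per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, z) for z in v]


def feed_forward_threat(features, w1, b1, w2, b2, labels: List[str]) -> str:
    # Third stage (assumed shape): one hidden ReLU layer, then pick the
    # label whose output score is largest.
    hidden = relu(dense(features, w1, b1))
    scores = dense(hidden, w2, b2)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


def run_pipeline(
    image,
    detect: Callable[[object], Detection],
    estimate_pose: Callable[[object], List[Keypoint]],
    classify: Callable[[List[float]], object],
) -> Optional[object]:
    det = detect(image)
    # Stages 2 and 3 run only if both a person and a handgun were found.
    if not (det.person and det.handgun):
        return None
    joints = estimate_pose(image)
    # Flatten [(x1, y1), (x2, y2), ...] into [x1, y1, x2, y2, ...].
    features = [coord for joint in joints for coord in joint]
    return classify(features)


if __name__ == "__main__":
    # Placeholder weights for a 4-input, 2-hidden, 2-output network.
    w1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    b1 = [0.0, 0.0]
    w2 = [[1.0, 0.0], [0.0, 1.0]]
    b2 = [0.0, 0.0]
    labels = ["low", "high"]  # hypothetical threat levels
    result = run_pipeline(
        image="frame.jpg",
        detect=lambda img: Detection(person=True, handgun=True),
        estimate_pose=lambda img: [(0.0, 1.0), (1.0, 0.0)],
        classify=lambda f: feed_forward_threat(f, w1, b1, w2, b2, labels),
    )
    print(result)  # high
```

Gating the later stages on the first detector's output means the pose estimator and threat classifier run only on images that contain both a person and a handgun, which is one plausible way a cascade like this keeps average processing time low.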