Image Recognition with Machine Learning

A classical machine learning approach to facial recognition of U.S. presidents using HOG features and an SVM-based scikit-learn pipeline, demonstrating that strong results are achievable with just hundreds of training images.

Abstract

We present an image recognition approach based on a classical machine learning algorithm, Support Vector Machines (SVM), and apply it to an image library of faces of recent U.S. presidents. We demonstrate that even with a limited number of samples, on the order of hundreds, good results can be obtained without deep learning, and that very good results can be achieved when the number of samples is higher, on the order of thousands.

Key Contributions

  • Built an end-to-end scikit-learn pipeline for facial recognition using Histogram of Oriented Gradients (HOG) features, StandardScaler normalization, and Stochastic Gradient Descent classification
  • Created a custom dataset by crawling 1,000+ images of six recent U.S. presidents from Google and Bing, with automated preprocessing (face cropping, resizing to 200x200, grayscale conversion, normalization)
  • Achieved approximately 70% average accuracy across six classes, with the best single-model accuracy reaching 76%, using only a few hundred training images per class
  • Automated model selection via GridSearchCV with 10-fold cross-validation, comparing SVM, SGD, K-Nearest Neighbors, Decision Trees, and Random Forest classifiers
  • Provided a practical comparison of traditional image processing vs. deep learning trade-offs, demonstrating when classical ML is sufficient and cost-effective
  • PDF: to be hosted
  • arXiv / TechRxiv: to be added
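
The pipeline and model-selection steps listed above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual code: the class name HogTransformer, the helper build_search, the HOG cell and block sizes, the hinge loss, and the alpha grid are all assumptions chosen for the example, and the demo uses small random 64x64 arrays in place of the 200x200 face crops so that it runs quickly. It relies on skimage.feature.hog from scikit-image for the HOG features.

```python
import numpy as np
from skimage.feature import hog
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class HogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: maps grayscale images to HOG feature vectors."""

    def __init__(self, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2)):
        self.orientations = orientations
        self.pixels_per_cell = pixels_per_cell
        self.cells_per_block = cells_per_block

    def fit(self, X, y=None):
        return self  # HOG has no parameters to learn

    def transform(self, X):
        # X: array of shape (n_images, height, width), already grayscale
        return np.array([
            hog(img,
                orientations=self.orientations,
                pixels_per_cell=self.pixels_per_cell,
                cells_per_block=self.cells_per_block,
                block_norm="L2-Hys")
            for img in X
        ])


def build_search(cv=10):
    """HOG -> StandardScaler -> SGD classifier, tuned by grid search."""
    pipeline = Pipeline([
        ("hog", HogTransformer()),
        ("scale", StandardScaler()),
        # Hinge loss makes SGDClassifier a linear SVM (assumed setting)
        ("clf", SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3,
                              random_state=0)),
    ])
    # Illustrative grid only; the grid used in the paper is not given here
    param_grid = {"clf__alpha": [1e-4, 1e-3]}
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")


if __name__ == "__main__":
    # Placeholder data: 6 classes x 10 random images, NOT real faces
    rng = np.random.default_rng(0)
    X = rng.random((60, 64, 64))
    y = np.repeat(np.arange(6), 10)

    search = build_search(cv=10)
    search.fit(X, y)
    print("best parameters:", search.best_params_)
```

Wrapping the HOG step as a transformer keeps feature extraction inside the cross-validation loop, so its settings (for example hog__pixels_per_cell) can be added to the same grid. Other classifiers could be compared by listing alternative estimators under the "clf" key of the grid.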
