Justin Yu
Pet Classifier
Overview

The purpose of this final project is to classify images of cats and dogs using various machine learning techniques, evaluating their performance on a shared dataset. The project involves implementing and comparing multiple classifiers, including Closest Average (CA), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Nearest Neighbor (NN), while exploring dimensionality reduction techniques like Principal Component Analysis (PCA). By analyzing the error rates across different dimensions and classifiers, the project aims to understand the trade-offs between model complexity, feature representation, and classification accuracy.

Supervised Learning

In this project, supervised learning is implemented through multiple classification techniques that rely on labeled data indicating whether each sample is a cat or a dog. The Closest Average Classifier predicts labels by comparing data points to the mean vectors of the two classes calculated using the labeled dataset. Linear Discriminant Analysis (LDA) uses pooled covariance and mean vectors to model the data as Gaussian distributions with shared covariance, while Quadratic Discriminant Analysis (QDA) models the data with separate covariances for each class. The Nearest Neighbor Classifier predicts labels by finding the closest labeled sample in the training set. Each classifier computes predictions for both training and testing datasets, and their performance is evaluated by calculating error rates against the true labels.
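The simplest of these, the Closest Average Classifier, can be sketched as follows. This is a minimal illustration, not the project's actual code; it assumes samples are rows of a NumPy array and labels are 0 for cat and 1 for dog:

```python
import numpy as np

def closest_average_predict(X_train, y_train, X):
    """Assign each sample in X to the class whose training mean is nearest."""
    # Mean vector of each class, computed from the labeled training data
    mu_cat = X_train[y_train == 0].mean(axis=0)
    mu_dog = X_train[y_train == 1].mean(axis=0)
    # Euclidean distance from every sample to each class mean
    d_cat = np.linalg.norm(X - mu_cat, axis=1)
    d_dog = np.linalg.norm(X - mu_dog, axis=1)
    # Predict the label of the closer mean
    return (d_dog < d_cat).astype(int)

def error_rate(y_pred, y_true):
    """Fraction of predictions that disagree with the true labels."""
    return np.mean(y_pred != y_true)
```

LDA and QDA refine this idea by replacing the Euclidean distance with a Gaussian log-likelihood (shared covariance for LDA, per-class covariance for QDA), while Nearest Neighbor replaces the class means with the individual training samples.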

Unsupervised Learning

The unsupervised learning part of the code involves Principal Component Analysis (PCA), used for dimensionality reduction. PCA computes the mean and covariance matrix of the data, performs eigenvalue decomposition to identify principal components, and selects the top \( k \) components that capture the most variance. \[ \mathbf{X}^T_{\text{reduced}} = \left( \mathbf{X}^T - {\mu}_{\text{train}}^T \right) \mathbf{V}_k \] The data is then projected onto this reduced subspace, simplifying the feature set while retaining essential information. This reduced dataset is later used as input for supervised classifiers, enabling more efficient processing while preserving critical patterns in the data.
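The projection above can be sketched in a few lines of NumPy. This is an illustrative version, not the project's exact implementation; it assumes data samples are rows of `X`, so the code operates on what the equation writes as \( \mathbf{X}^T \):

```python
import numpy as np

def pca_reduce(X_train, X, k):
    """Project X onto the top-k principal components learned from X_train."""
    # Mean of the training data (the mu_train in the projection equation)
    mu = X_train.mean(axis=0)
    # Covariance matrix of the centered training data
    cov = np.cov(X_train - mu, rowvar=False)
    # Eigen-decomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # V_k: the k eigenvectors with the largest eigenvalues (most variance)
    V_k = eigvecs[:, ::-1][:, :k]
    # Center and project: (X - mu) @ V_k
    return (X - mu) @ V_k
```

The same `mu` and `V_k` learned from the training set are reused to project the test set, so both live in the same reduced subspace before being passed to the classifiers.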

Challenges

• Poor results with the selected features

• Excessively long training times

• Limited prior knowledge of neural networks

• Incompatibilities between the environment and Anaconda packages

Learned

• Various machine learning algorithms and their implementations

• More practice with Python, including libraries like TensorFlow and PyTorch

• Introduction to neural networks and their applications
