Overview
The goal of this EC418 final project was to build an agent capable of driving in SuperTuxKart (PyTux) by predicting aim points from frames and selecting actions to reduce completion time. I explored multiple RL approaches, then compared them against a strong classical baseline (PID).
What I built
A training + evaluation pipeline with reward shaping, feature extraction, rollout tracking, and multiple learning/control methods under the same environment interface.
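As a rough illustration of how the methods share one environment interface, here is a minimal sketch of the kind of controller abstraction and rollout loop such a pipeline uses. The class names, attributes, and signatures below are hypothetical stand-ins, not the project's actual API.

```python
from dataclasses import dataclass

# Hypothetical shared interface: every method (tabular Q, DQN, PID) exposes
# the same act() call, so the rollout/evaluation loop stays identical.
@dataclass
class Action:
    steer: float          # [-1, 1]
    acceleration: float   # [0, 1]
    brake: bool = False

class Controller:
    def act(self, aim_point, velocity) -> Action:
        raise NotImplementedError

def rollout(env, controller, max_frames=1000):
    """Run one episode and return the number of frames to completion."""
    state = env.reset()
    for frame in range(max_frames):
        action = controller.act(state.aim_point, state.velocity)
        state, done = env.step(action)
        if done:
            return frame
    return max_frames
```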
What I measured
Completion frames/time, stability in sharp turns, collision behavior, and convergence reliability under limited state features.
Reinforcement Learning
I began with tabular Q-Learning and then moved to linear function approximation due to the large state space.
\[ Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] \]
The agent used an epsilon-greedy exploration strategy with decay. Rewards were shaped to encourage progress (aim alignment), safe driving, and target speed, while penalizing collisions and off-track behavior. Tabular methods struggled with convergence and required excessive rollouts.
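The sketch below illustrates this setup under assumptions of my own: a linear Q-function over a small feature vector, epsilon-greedy selection with decay, and a stand-in shaped reward. Feature dimensions, action bins, and reward weights are illustrative, not the project's exact values.

```python
import numpy as np

N_ACTIONS = 5          # e.g. discretized steering bins (assumed)
N_FEATURES = 8         # e.g. aim-point offset, speed, etc. (assumed)

class LinearQ:
    """Q-learning with linear function approximation and epsilon-greedy decay."""
    def __init__(self, alpha=0.01, gamma=0.99, eps=1.0, eps_decay=0.995):
        self.w = np.zeros((N_ACTIONS, N_FEATURES))
        self.alpha, self.gamma = alpha, gamma
        self.eps, self.eps_decay = eps, eps_decay

    def q_values(self, phi):
        return self.w @ phi                          # Q(s, a) = w_a . phi(s)

    def select_action(self, phi):
        if np.random.rand() < self.eps:              # explore
            return np.random.randint(N_ACTIONS)
        return int(np.argmax(self.q_values(phi)))    # exploit

    def update(self, phi, a, r, phi_next, done):
        target = r if done else r + self.gamma * np.max(self.q_values(phi_next))
        td_error = target - self.q_values(phi)[a]
        self.w[a] += self.alpha * td_error * phi     # gradient of linear Q is phi
        self.eps *= self.eps_decay                   # decay exploration over time

def shaped_reward(aim_x, speed, collided, off_track, target_speed=20.0):
    """Hypothetical shaping: reward aim alignment and target speed,
    penalize collisions and leaving the track."""
    r = 1.0 - abs(aim_x)                             # aim alignment, aim_x in [-1, 1]
    r -= 0.05 * abs(speed - target_speed)            # stay near the target speed
    if collided:
        r -= 5.0
    if off_track:
        r -= 2.0
    return r
```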
Neural Networks
Deep Q-Learning improved generalization by approximating the Q-function with a network, but stability remained sensitive to features and track geometry.
\[ \theta_{t+1} = \theta_t + \alpha_t \left( r + \gamma \max_{a'} Q_{\theta_t}(s', a') - Q_{\theta_t}(s, a) \right) \nabla_{\theta} Q_{\theta_t}(s, a) \]
Smaller convolutional kernels improved sharp-turn handling. ReLU activations increased consistency, while batch norm + dropout prevented shortcut attempts but didn't dramatically reduce completion times compared to the TD baselines.
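For concreteness, here is a minimal PyTorch sketch of a Q-network in that style (small 3x3 kernels, ReLU, batch norm, dropout). The layer widths and discrete action count are assumptions rather than the project's exact architecture.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small CNN mapping a frame to one Q-value per discrete action (illustrative)."""
    def __init__(self, n_actions=5, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # pool to a fixed-size feature vector
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.2),
            nn.Linear(64, n_actions),      # one Q-value per discrete action
        )

    def forward(self, frame):
        # frame: (B, 3, H, W) image tensor -> (B, n_actions) Q-values
        return self.head(self.features(frame))
```

Training would then regress these Q-values against the TD target from the update rule above, using a standard optimizer such as Adam.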
PID Controller
A classical PID baseline outperformed the RL approaches under the selected features, achieving smooth steering and reliable progress through turns.
\[ u(t) = K_p e(t) + K_i \int_{0}^{t} e(\tau) d\tau + K_d \frac{d}{dt} e(t) \]
The steering PID minimized lateral error (aim point vs. track center), and the speed PID regulated velocity. Rescue logic handled stuck/off-track states, with the integral term clamped to prevent windup.
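A minimal sketch of that control scheme, assuming illustrative gains, clamping limits, and a stuck-frame threshold that are not taken from the project code:

```python
class PID:
    """Discrete PID with the integral clamped to prevent windup."""
    def __init__(self, kp, ki, kd, i_limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.i_limit = i_limit                 # clamp magnitude of the integral
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        self.integral = max(-self.i_limit, min(self.i_limit, self.integral))
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def control(aim_x, speed, dt, steer_pid, speed_pid,
            target_speed=20.0, stuck_frames=0):
    """One control step: steer toward the aim point, track a target speed,
    and trigger rescue if the kart has been stuck (thresholds are illustrative)."""
    steer = max(-1.0, min(1.0, steer_pid.step(aim_x, dt)))            # lateral error
    accel = max(0.0, min(1.0, speed_pid.step(target_speed - speed, dt)))
    if stuck_frames > 30:                   # e.g. ~1 s with no forward progress
        steer_pid.integral = 0.0            # reset integrator during rescue
        return steer, 0.0, True             # caller handles brake/reverse
    return steer, accel, False
```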
Why PID won (in this setup)
With limited and noisy features, PID produced consistent control without requiring long training. RL needed richer observations/features to reliably learn corner cases.
Results
PID achieved the most reliable completions with the fewest frames under the feature set used. RL approaches improved gradually but suffered from convergence instability and sensitivity to track conditions.
Best performing
PID Controller — robust, consistent, minimal tuning overhead.
Most promising direction
Vision + richer features (track geometry, curvature cues, better state encoding).
Reflection
Challenges
• High training time / many rollouts
• Feature selection limited performance
• Neural net tuning + environment dependency issues
What I learned
• RL algorithm tradeoffs in real systems
• Reward shaping + exploration design
• CNN tuning effects on behavior
by Justin Yu