Reinforcement Learning Course Overview

Foundations of Sequential Decision Making

This course provides a structured introduction to Reinforcement Learning (RL), a branch of machine learning where agents learn to make decisions by interacting with an environment. The course is designed to build understanding step by step, starting with core concepts, moving into mathematical foundations, and then applying those concepts through algorithms and methods.

Unit 1: Foundations of Reinforcement Learning

Introduces the core ideas behind RL:

What reinforcement learning is
Agent, environment, states, actions, rewards
The interaction loop
Exploration vs. exploitation

This unit builds the intuition needed before diving into mathematical concepts.
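
As a rough illustration of the interaction loop, here is a minimal Python sketch. The CoinFlipEnv class, its reset/step methods, and the run_episode helper are hypothetical names invented for this example, not part of the course materials or any library. The agent picks an action, the environment returns the next state and a reward, and the loop repeats until the episode ends.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.7, horizon=10, seed=0):
        self.p_heads = p_heads
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the state is simply the current step index

    def step(self, action):
        # action: 0 = guess tails, 1 = guess heads
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode of the agent-environment loop; return the total reward."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
    return total_reward


policy_rng = random.Random(1)
random_policy = lambda state: policy_rng.choice([0, 1])  # pure exploration
total = run_episode(CoinFlipEnv(), random_policy)
print(total)
```

A policy that always guesses heads (lambda state: 1) would be pure exploitation of the coin's bias, which is one way to see the exploration vs. exploitation trade-off in this toy setting.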

Unit 2: Markov Decision Processes (MDPs)

Focuses on the formal framework used to model RL problems:

Definition of Markov Decision Processes
The Markov Property
States, actions, and rewards
Transition probabilities
Discount factor (γ)
Policies
Introduction to value functions

This unit prepares you for the mathematical definitions used in later units.
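
To make the MDP ingredients concrete, the sketch below stores a small MDP as a plain Python dictionary and computes a discounted return. The two-state example, the state and action names, and the reward values are all made up for illustration. The layout P[state][action] -> list of (probability, next_state, reward) is one common convention, not necessarily the one used in the course.

```python
# Hypothetical two-state MDP.
# P[state][action] -> list of (probability, next_state, reward)
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}
gamma = 0.9  # discount factor

# A deterministic policy maps each state to one action.
policy = {"cool": "fast", "warm": "slow"}


def transitions_are_valid(P):
    """Check that every transition distribution sums to 1."""
    return all(
        abs(sum(prob for prob, _, _ in outcomes) - 1.0) < 1e-12
        for actions in P.values()
        for outcomes in actions.values()
    )


def discounted_return(rewards, gamma):
    """Compute r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a reward sequence."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(transitions_are_valid(P))
print(discounted_return([1.0, 1.0, 1.0], gamma))  # 1 + 0.9 + 0.81, about 2.71
```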

Unit 3: Mathematical Foundations of Reinforcement Learning

Covers the formal equations that define RL:

State Value Function V^π(s)
Action-Value Function Q^π(s, a)
Policy Improvement
Value Iteration formula

Includes Python examples to show how these formulas are implemented in practice, bridging theory and application.
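
For reference, the quantities listed above are commonly written as follows. The notation is the standard textbook form and is an assumption here: P(s' | s, a) for transition probabilities, R(s, a, s') for rewards, and π(a | s) for the policy. The course's own symbols may differ.

```latex
% Bellman expectation equation for the state value function
V^\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
           \left[ R(s, a, s') + \gamma V^\pi(s') \right]

% Bellman expectation equation for the action-value function
Q^\pi(s, a) = \sum_{s'} P(s' \mid s, a)
              \left[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') Q^\pi(s', a') \right]

% Greedy policy improvement
\pi'(s) = \arg\max_{a} Q^\pi(s, a)

% Value iteration update
V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V_k(s') \right]
```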

Unit 4: Dynamic Programming Methods

Applies the mathematical concepts to structured solution methods:

Policy Evaluation
Policy Improvement
Policy Iteration
Value Iteration (algorithmic perspective)
Convergence properties

Focuses on understanding the process and algorithms, without repeating implementation details.
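
As a compact picture of how policy evaluation and policy improvement alternate, here is a minimal policy iteration sketch. It assumes a tiny hypothetical MDP stored as P[s][a] -> list of (probability, next_state, reward). The function names are illustrative only and do not come from the course.

```python
# Hypothetical 2-state, 2-action MDP.
# P[s][a] -> list of (probability, next_state, reward)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively estimate V for a fixed deterministic policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary start: first listed action
    while True:
        V = policy_evaluation(P, policy, gamma)
        improved = {
            s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P
        }
        if improved == policy:
            return policy, V
        policy = improved


policy, V = policy_iteration(P, GAMMA)
print(policy)
print(V)
```

In this toy MDP the loop stops after two evaluation passes, because the second improvement step leaves the policy unchanged. That stability check is the usual stopping criterion for policy iteration.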

Projects:

[COMING SOON] Policy Iteration Experiment → Apply iterative policy evaluation and improvement on a gridworld and track convergence.
Value Iteration Simulation → Implement value iteration and visualize how state values converge over time.

Unit 5: Model-Free Reinforcement Learning

Introduces learning without a known environment model:

Monte Carlo methods
Temporal Difference (TD) learning
Differences between model-based and model-free learning

This unit highlights how agents can learn purely from experience.
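
The difference between Monte Carlo and TD learning shows up in the update target. The sketch below applies both updates to one recorded episode. The episode, the state names, and the step size are hypothetical values chosen for illustration. Neither function uses transition probabilities, which is what makes them model-free.

```python
GAMMA = 1.0  # no discounting, to keep the arithmetic easy to follow
ALPHA = 0.5  # step size

# One hypothetical recorded episode as (state, reward, next_state);
# next_state None marks termination.
episode = [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, None)]


def mc_update(V, episode, alpha, gamma):
    """Every-visit Monte Carlo: move V(s) toward the full observed return G."""
    g = 0.0
    for state, reward, _ in reversed(episode):
        g = reward + gamma * g
        V[state] += alpha * (g - V[state])
    return V


def td0_update(V, episode, alpha, gamma):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else V[next_state]
        V[state] += alpha * (reward + gamma * next_value - V[state])
    return V


v_mc = mc_update({"A": 0.0, "B": 0.0, "C": 0.0}, episode, ALPHA, GAMMA)
v_td = td0_update({"A": 0.0, "B": 0.0, "C": 0.0}, episode, ALPHA, GAMMA)
print(v_mc)  # {'A': 0.5, 'B': 0.5, 'C': 0.5}
print(v_td)  # {'A': 0.0, 'B': 0.0, 'C': 0.5}
```

After a single episode, Monte Carlo has credited every visited state with the final reward, while TD(0) has only updated the state next to the terminal transition. TD(0) needs further episodes to propagate that value backward, but it can update after every step instead of waiting for the episode to end.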

Projects:

[COMING SOON] Monte Carlo Agent → Train an agent using sampled episodes and compare returns.
[COMING SOON] Temporal Difference Learning → Implement TD(0) and compare learning speed with Monte Carlo.

Unit 6: Q-Learning and Advanced Methods

Covers widely used RL algorithms:

Q-Learning
Off-policy vs on-policy learning
Exploration strategies (ε-greedy)

This unit provides practical insight into Q-Learning, one of the foundational RL algorithms.
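
A minimal tabular Q-Learning sketch with ε-greedy exploration is shown below. The five-state corridor environment, the hyperparameter values, and the helper names are assumptions made for this example. A gridworld version would follow the same pattern with a larger Q-table.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal (hypothetical corridor)
ACTIONS = [0, 1]  # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1


def step(state, action):
    """Deterministic corridor: reward 1.0 only on reaching the right end."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, rng):
    """Explore with probability EPSILON, otherwise act greedily (random tie-break)."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(Q, state, rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: uses the greedy value of the next state,
            # regardless of which action the behavior policy takes next.
            target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
    return Q


Q = train()
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The max over next-state actions in the target is what makes Q-Learning off-policy. An on-policy method such as SARSA would instead use the value of the action actually selected in the next state.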

Projects:

[COMING SOON] Q-Learning in Gridworld → Learn a policy using a Q-table and ε-greedy exploration.
[COMING SOON] Exploration Strategies Experiment → Compare ε-greedy, softmax, and decaying ε strategies.

Summary

By progressing through these units, you will build a working understanding of reinforcement learning, from core concepts and mathematical foundations to practical algorithms and their implementation. The included projects give you hands-on experience with dynamic programming methods, model-free learning, and Q-Learning, so you can see how agents learn optimal behavior in different environments. This combination of theory and practice prepares you to apply reinforcement learning to more complex, real-world problems.