Reinforcement Learning Course Overview
Foundations of Sequential Decision Making
This course provides a structured introduction to Reinforcement Learning (RL), a branch of machine learning where agents learn to make decisions by interacting with an environment. The course is designed to build understanding step by step, starting with core concepts, moving into mathematical foundations, and then applying those concepts through algorithms and methods.
Unit 1: Foundations of Reinforcement Learning
Introduces the core ideas behind RL:
- What reinforcement learning is
- Agent, environment, states, actions, rewards
- The interaction loop
- Exploration vs. exploitation
This unit builds the intuition needed before diving into mathematical concepts.
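The interaction loop listed above can be sketched in a few lines of Python. This is a minimal illustration, not course material: `CorridorEnv`, its `reset`/`step` methods, and `run_episode` are hypothetical names invented here, and the five-cell corridor with a single reward at the goal is an assumed toy environment.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..4, start at 0, goal at 4."""
    GOAL = 4

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = left, 1 = right; the wall at cell 0 blocks further left moves
        move = 1 if action == 1 else -1
        self.state = min(self.GOAL, max(0, self.state + move))
        done = self.state == self.GOAL
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=10_000):
    """Run one agent-environment loop and return (final_state, total_reward, steps)."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment returns state and reward
        total_reward += reward
        steps += 1
        if done:
            break
    return state, total_reward, steps


rng = random.Random(0)
final_state, total_reward, steps = run_episode(CorridorEnv(), lambda s: rng.randrange(2))
```

Here the agent acts at random, which is pure exploration; swapping in `lambda s: 1` gives an agent that always moves right and reaches the goal in four steps.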
Unit 2: Markov Decision Processes (MDPs)
Focuses on the formal framework used to model RL problems:
- Definition of Markov Decision Processes
- The Markov Property
- States, actions, and rewards
- Transition probabilities
- Discount factor (γ)
- Policies
- Introduction to value functions
This unit prepares you for the mathematical definitions used in later units.
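As a small illustration of what the discount factor γ does, the sketch below computes the discounted return of a reward sequence, G = r_0 + γ·r_1 + γ²·r_2 + .... The function name `discounted_return` is an assumption made for this example.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th reward is weighted by gamma**k."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With γ = 0 only the immediate reward counts; as γ approaches 1, later rewards weigh almost as much as early ones.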
Unit 3: Mathematical Foundations of Reinforcement Learning
Covers the formal equations that define RL:
- State Value Function V^π(s)
- Action-Value Function Q^π(s,a)
- Policy Improvement
- Value Iteration formula
Includes Python examples to show how these formulas are implemented in practice, bridging theory and application.
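For orientation, the quantities in this unit are usually written as follows. The notation is an assumption here and the course may use different symbols: P(s' | s, a) for transition probabilities, R(s, a, s') for rewards, and γ for the discount factor.

```latex
\begin{align*}
V^\pi(s)   &= \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right] \\
Q^\pi(s,a) &= \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s,\ A_t = a\right] \\
V^\pi(s)   &= \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma V^\pi(s')\bigr] \\
\pi'(s)    &= \arg\max_{a} Q^\pi(s,a) \\
V_{k+1}(s) &= \max_{a} \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma V_k(s')\bigr]
\end{align*}
```

The first two lines define the state-value and action-value functions, the third is the Bellman expectation equation for V^π, the fourth is greedy policy improvement, and the last is the value iteration update.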
Unit 4: Dynamic Programming Methods
Applies the mathematical concepts to structured solution methods:
- Policy Evaluation
- Policy Improvement
- Policy Iteration
- Value Iteration (algorithmic perspective)
- Convergence properties
Focuses on understanding the process and algorithms, without repeating implementation details.
Projects:
- [COMING SOON] Policy Iteration Experiment → Apply iterative policy evaluation and improvement on a gridworld and track convergence.
- Value Iteration Simulation → Implement value iteration and visualize how state values converge over time.
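A possible starting point for the Value Iteration Simulation is sketched below. It is not the course's reference solution: the layout of `P` (state → action → list of `(probability, next_state, reward)` tuples), the three-state chain, and the in-place sweep order are all assumptions chosen for this example.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Return (V, history); history holds a snapshot of V after every sweep."""
    V = {s: 0.0 for s in P}
    history = []
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:          # terminal state: value stays 0
                continue
            # Bellman optimality backup: best expected one-step return
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best              # in-place update within the sweep
        history.append(dict(V))
        if delta < tol:
            return V, history


# Hypothetical 3-state chain: state 2 is terminal, moving right into it pays 1.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}
V, history = value_iteration(P)
```

On this chain the values settle at V(1) = 1 and V(0) = 0.9 after three sweeps, and `history` can be plotted to see the convergence the project asks you to visualize.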
Unit 5: Model-Free Reinforcement Learning
Introduces learning without a known environment model:
- Monte Carlo methods
- Temporal Difference (TD) learning
- Differences between model-based and model-free learning
This unit highlights how agents can learn purely from experience.
Projects:
- [COMING SOON] Monte Carlo Agent → Train an agent using sampled episodes and compare returns.
- [COMING SOON] Temporal Difference Learning → Implement TD(0) and compare learning speed with Monte Carlo.
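To make the contrast between the two projects concrete, here is a minimal sketch of both update rules applied to the same episode. The episode format `(state, reward, next_state)` and the function names are assumptions for illustration, and the Monte Carlo variant shown is the constant step-size, every-visit form.

```python
def td0_episode(V, episode, alpha=0.5, gamma=1.0):
    """TD(0): update each state toward r + gamma * V(next_state) as the step occurs."""
    for s, r, s_next in episode:
        bootstrap = 0.0 if s_next is None else V[s_next]  # None marks termination
        V[s] += alpha * (r + gamma * bootstrap - V[s])
    return V


def mc_episode(V, episode, alpha=0.5, gamma=1.0):
    """Monte Carlo: update each state toward the full return observed after it."""
    g = 0.0
    for s, r, _ in reversed(episode):
        g = r + gamma * g
        V[s] += alpha * (g - V[s])
    return V


# Hypothetical two-step episode: A -> B -> terminal, with reward 1 on the last step.
episode = [("A", 0.0, "B"), ("B", 1.0, None)]
v_td = td0_episode({"A": 0.0, "B": 0.0}, episode)
v_mc = mc_episode({"A": 0.0, "B": 0.0}, episode)
print(v_td)  # {'A': 0.0, 'B': 0.5}
print(v_mc)  # {'A': 0.5, 'B': 0.5}
```

After one episode, Monte Carlo has already credited state A with the final reward, while TD(0) has not, because A was updated from the still-zero estimate of B. TD(0) propagates the reward back to A over later episodes.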
Unit 6: Q-Learning and Advanced Methods
Covers widely used RL algorithms:
- Q-Learning
- Off-policy vs on-policy learning
- Exploration strategies (ε-greedy)
This unit provides practical insight into Q-Learning, one of the foundational RL algorithms.
Projects:
- [COMING SOON] Q-Learning in Gridworld → Learn a policy using a Q-table and ε-greedy exploration.
- [COMING SOON] Exploration Strategies Experiment → Compare ε-greedy, softmax, and decaying ε strategies.
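The sketch below shows roughly what tabular Q-Learning with ε-greedy exploration can look like. A four-state chain stands in for the gridworld to keep it short; the environment, the function names, and the hyperparameter values are assumptions for this example, not the project specification.

```python
import random

N_STATES = 4       # states 0..3; state 3 is the terminal goal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def step(state, action):
    """Deterministic chain: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train_q_learning(n_episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: one row per state
    for _ in range(n_episodes):
        state = 0
        while state != N_STATES - 1:
            action = epsilon_greedy(Q[state], epsilon, rng)
            next_state, reward = step(state, action)
            # Off-policy target: bootstrap from the best action in the next state,
            # regardless of which action the behavior policy takes there.
            target = reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q


Q = train_q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy` selects "right" in every non-terminal state. Replacing `epsilon_greedy` with another selection rule is one way to set up the exploration-strategy comparison.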
Summary
By progressing through these units, you will build an understanding of reinforcement learning that runs from core concepts and mathematical foundations to practical algorithms and implementations. The projects give hands-on experience with dynamic programming methods, model-free learning, and Q-Learning, showing how agents learn optimal behavior in different environments. This combination of theory and practice prepares you to apply reinforcement learning to more complex, real-world problems.