Reinforcement Learning
Foundations of Sequential Decision Making
Module Overview
Reinforcement Learning (RL) studies how agents learn to make decisions by interacting with an environment to achieve goals. Agents receive feedback through rewards and adapt their behavior to maximize long-term outcomes. This course covers the core mathematical principles and algorithms behind RL, focusing on how value functions are computed, how optimal policies emerge, and how agents balance exploration with exploitation. Projects provide hands-on experience across different types of RL environments, including spatial, stochastic, and risk-sensitive domains.
Introduction to Reinforcement Learning
- Core concepts: Agent, Environment, Reward, Policy
- Episodic vs Continuing Tasks
- Return and Discount Factor
- The Markov Property
- Markov Decision Processes (MDPs)
- Exploration vs Exploitation
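The concepts above can be made concrete with a small sketch. The environment and policy here are illustrative stand-ins (not a specific library's API); the `discounted_return` function implements the standard definition of the return G_t = r_0 + γ·r_1 + γ²·r_2 + … for an episodic task.

```python
import random

def discounted_return(rewards, gamma=0.9):
    """Discounted return: G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    Computed backwards so each reward is multiplied by gamma once."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A toy two-action environment: action 1 pays reward 1, action 0 pays 0.
def step(action):
    return 1.0 if action == 1 else 0.0

# A random (uniform) policy interacting for one short episode.
random.seed(0)
rewards = [step(random.choice([0, 1])) for _ in range(5)]
print(discounted_return(rewards, gamma=0.9))

# A deterministic check: three rewards of 1 with gamma = 0.5
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

Note how the discount factor γ < 1 makes later rewards count less, which is what keeps the return finite in continuing tasks.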
Mathematical Foundations
- State-value function and Action-value function
- Bellman Expectation Equation
- Bellman Optimality Equation
- Defining the Optimal Policy
- The Contraction Property
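For reference, the key equations in this unit (in standard MDP notation, with policy π, transition model p, and discount γ) are:

```latex
% State-value function under policy pi (Bellman expectation equation):
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma V^{\pi}(s') \bigr]

% Bellman optimality equation for the optimal state-value function:
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma V^{*}(s') \bigr]

% Contraction property: the Bellman backup T is a gamma-contraction
% in the sup norm, which guarantees convergence to a unique fixed point:
\lVert TV - TU \rVert_{\infty} \le \gamma \, \lVert V - U \rVert_{\infty}
```

The contraction property is what justifies the iterative algorithms in the next unit: repeatedly applying the Bellman backup converges to the unique fixed point regardless of the starting value function.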
Dynamic Programming (Model-Based RL)
- Evaluating a Policy
- Improving and Iterating Policies
- Value Iteration
- Grid World Examples
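A minimal value-iteration sketch, using a hypothetical 1×4 grid world (states 0..3, state 3 terminal with reward +1 on entry, zero reward otherwise, deterministic moves). The grid, reward scheme, and γ = 0.9 are illustrative choices, not taken from the course materials.

```python
GAMMA = 0.9
N = 4               # states 0..3; state 3 is terminal
ACTIONS = (-1, +1)  # move left, move right

def step(s, a):
    """Deterministic transition: clamp to the grid; entering the
    terminal state 3 pays +1, every other transition pays 0."""
    s2 = min(max(s + a, 0), N - 1)
    r = 1.0 if s2 == N - 1 and s != N - 1 else 0.0
    return s2, r

def value_iteration(theta=1e-8):
    """Sweep the Bellman optimality backup until the largest update
    falls below theta; the contraction property guarantees this halts."""
    V = [0.0] * N
    while True:
        delta = 0.0
        for s in range(N - 1):  # terminal state's value stays 0
            best = max(r + GAMMA * V[s2]
                       for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

V = value_iteration()
print([round(v, 2) for v in V])  # [0.81, 0.9, 1.0, 0.0]
```

The optimal policy here is "move right" everywhere, and the values decay by a factor of γ per step away from the goal: V(2) = 1, V(1) = 0.9, V(0) = 0.81. Policy iteration reaches the same fixed point by alternating full policy evaluation with greedy policy improvement.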