Reinforcement Learning

Foundations of Sequential Decision Making

Module Overview

Reinforcement Learning (RL) studies how agents learn to make decisions by interacting with an environment to achieve goals. Agents receive feedback through rewards and adapt their behavior to maximize long-term outcomes. This course covers the core mathematical principles and algorithms behind RL, focusing on how value functions are computed, how optimal policies emerge, and how agents balance exploration with exploitation. Projects provide hands-on experience in different types of RL environments, including spatial, stochastic, and risk-sensitive domains.

Introduction to Reinforcement Learning

  • Core concepts: Agent, Environment, Reward, Policy
  • Episodic vs Continuing Tasks
  • Return and Discount Factor
  • The Markov Property
  • Markov Decision Processes (MDPs)
  • Exploration vs Exploitation
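The concepts above can be sketched as a minimal agent-environment loop on a toy episodic task. The two-state environment, its rewards, and the policies used here are illustrative assumptions, not part of the course material; the loop only shows how rewards accumulate into a discounted return.

```python
GAMMA = 0.9  # discount factor

def step(state, action):
    """Hypothetical environment: from state 0, action 1 ends the episode
    with reward 1; action 0 stays in state 0 with reward 0."""
    if state == 0 and action == 1:
        return None, 1.0  # None marks the terminal state
    return 0, 0.0

def run_episode(policy, max_steps=10):
    """Run one episode and compute the return G = sum_t gamma^t * r_t."""
    state, rewards = 0, []
    for _ in range(max_steps):
        action = policy(state)
        state, reward = step(state, action)
        rewards.append(reward)
        if state is None:  # episodic task: stop at the terminal state
            break
    return sum(GAMMA**t * r for t, r in enumerate(rewards))
```

For example, the greedy policy `lambda s: 1` terminates immediately and earns a return of 1.0, while `lambda s: 0` never reaches the terminal state within the step limit and earns 0.0.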

Mathematical Foundations

  • State-value function and Action-value function
  • Bellman Expectation Equation
  • Bellman Optimality Equation
  • Defining the Optimal Policy
  • The Contraction Property
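The value functions and Bellman equations listed above take their standard forms for a finite MDP with discount factor $\gamma$ (symbols follow the usual convention: $\pi$ for the policy, $p$ for the transition dynamics):

```latex
% State-value function: expected discounted return from state s under policy pi
v_\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t R_{t+1} \,\middle|\, S_0 = s\right]

% Bellman expectation equation
v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]

% Bellman optimality equation
v_*(s) = \max_a \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]
```

The contraction property refers to the fact that, for $\gamma < 1$, the Bellman optimality operator is a $\gamma$-contraction in the max norm, which guarantees a unique fixed point $v_*$ and underlies the convergence of the dynamic programming methods in the next section.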

Dynamic Programming (Model-Based RL)

  • Evaluating a Policy
  • Improving and Iterating Policies
  • Value Iteration
  • Grid World Examples
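Value iteration on a grid world can be sketched as follows. The environment here is a hypothetical 1x4 corridor (states 0 to 3, state 3 terminal with reward +1 on entry, deterministic left/right actions); the layout and rewards are illustrative assumptions, not a specific course assignment.

```python
GAMMA = 0.9   # discount factor
THETA = 1e-8  # convergence threshold on the value change per sweep

N_STATES = 4
TERMINAL = 3

def transitions(s, a):
    """Deterministic dynamics: a = -1 (left) or +1 (right), clipped at walls.
    Entering the terminal state yields reward 1; all other steps yield 0."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == TERMINAL and s != TERMINAL else 0.0
    return s2, reward

def value_iteration():
    """Sweep the Bellman optimality backup until values stop changing."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # terminal state keeps value 0
            best = max(
                r + GAMMA * V[s2]
                for s2, r in (transitions(s, a) for a in (-1, 1))
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < THETA:
            return V
```

This sketch updates values in place (a Gauss-Seidel-style sweep), which converges to the same fixed point as the two-array version; the resulting values are 1.0 one step from the goal, then 0.9 and 0.81 as the discount compounds with distance.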