Welcome
  • Introduction
  • Course Outline and Big Picture
  • Where to get the Code
  • How to Succeed in this Course
  • Warmup
Return of the Multi-Armed Bandit
  • Section Introduction: The Explore-Exploit Dilemma
  • Applications of the Explore-Exploit Dilemma
  • Epsilon-Greedy Theory
  • Calculating a Sample Mean (pt 1)
  • Epsilon-Greedy Beginner's Exercise Prompt
  • Designing Your Bandit Program
  • Epsilon-Greedy in Code
  • Comparing Different Epsilons
  • Optimistic Initial Values Theory
  • Optimistic Initial Values Beginner's Exercise Prompt
  • Optimistic Initial Values Code
  • UCB1 Theory
  • UCB1 Beginner's Exercise Prompt
  • UCB1 Code
  • Bayesian Bandits / Thompson Sampling Theory (pt 1)
  • Bayesian Bandits / Thompson Sampling Theory (pt 2)
  • Thompson Sampling Beginner's Exercise Prompt
  • Thompson Sampling Code
  • Thompson Sampling With Gaussian Reward Theory
  • Thompson Sampling With Gaussian Reward Code
  • Why don't we just use a library?
  • Nonstationary Bandits
  • Bandit Summary, Real Data, and Online Learning
  • (Optional) Alternative Bandit Designs
  • Suggestion Box
High Level Overview of Reinforcement Learning
  • What is Reinforcement Learning?
  • From Bandits to Full Reinforcement Learning
Markov Decision Processes
  • MDP Section Introduction
  • Gridworld
  • Choosing Rewards
  • The Markov Property
  • Markov Decision Processes (MDPs)
  • Future Rewards
  • Value Functions
  • The Bellman Equation (pt 1)
  • The Bellman Equation (pt 2)
  • The Bellman Equation (pt 3)
  • Bellman Examples
  • Optimal Policy and Optimal Value Function (pt 1)
  • Optimal Policy and Optimal Value Function (pt 2)
  • MDP Summary
Dynamic Programming
  • Dynamic Programming Section Introduction
  • Iterative Policy Evaluation
  • Designing Your RL Program
  • Gridworld in Code
  • Iterative Policy Evaluation in Code
  • Windy Gridworld in Code
  • Iterative Policy Evaluation for Windy Gridworld in Code
  • Policy Improvement
  • Policy Iteration
  • Policy Iteration in Code
  • Policy Iteration in Windy Gridworld
  • Value Iteration
  • Value Iteration in Code
  • Dynamic Programming Summary
Monte Carlo
  • Monte Carlo Intro
  • Monte Carlo Policy Evaluation
  • Monte Carlo Policy Evaluation in Code
  • Monte Carlo Control
  • Monte Carlo Control in Code
  • Monte Carlo Control without Exploring Starts
  • Monte Carlo Control without Exploring Starts in Code
  • Monte Carlo Summary
Temporal Difference Learning
  • Temporal Difference Introduction
  • TD(0) Prediction
  • TD(0) Prediction in Code
  • SARSA
  • SARSA in Code
  • Q Learning
  • Q Learning in Code
  • TD Learning Section Summary
Approximation Methods
  • Approximation Methods Section Introduction
  • Linear Models for Reinforcement Learning
  • Feature Engineering
  • Approximation Methods for Prediction
  • Approximation Methods for Prediction Code
  • Approximation Methods for Control
  • Approximation Methods for Control Code
  • CartPole
  • CartPole Code
  • Approximation Methods Exercise
  • Approximation Methods Section Summary
Interlude: Common Beginner Questions
  • This Course vs. RL Book: What's the Difference?
Stock Trading Project with Reinforcement Learning
  • Beginners, halt! Stop here if you skipped ahead
  • Stock Trading Project Section Introduction