Reinforcement Learning: An Interactive Guide

Master RL concepts through visualizations, interactive examples, and hands-on simulations. Based on Sutton and Barto's classic textbook, Reinforcement Learning: An Introduction.

Interactive Visualizations

See algorithms in action with animated diagrams and real-time simulations.

Hands-on Examples

Play games, adjust parameters, and experiment with different strategies.

LaTeX Equations

All mathematical formulas rendered beautifully with clear explanations.

Chapters

Chapter 1: Introduction (The Reinforcement Learning Problem)

Learn the fundamentals of RL: what it is, how it differs from other ML paradigms, and the key elements that make up any RL system.
Chapter 2: Multi-armed Bandits (Evaluative Feedback & Action Selection)

Explore the simplest form of RL: the k-armed bandit problem. Learn action-value methods, exploration strategies, and gradient-based approaches.
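As a taste of the action-value methods this chapter covers, here is a minimal ε-greedy bandit loop with incremental sample-average updates. This is a sketch, not the guide's own code; the arm means, step count, and ε below are made-up illustration values.

```python
import random

def run_bandit(true_means, steps=1000, eps=0.1, seed=0):
    """Sample-average epsilon-greedy on a k-armed Gaussian bandit.

    `true_means`, `steps`, and `eps` are illustrative choices.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k          # action-value estimates Q(a)
    n = [0] * k            # pull counts N(a)
    for _ in range(steps):
        if rng.random() < eps:                    # explore: random arm
            a = rng.randrange(k)
        else:                                     # exploit: greedy arm
            a = max(range(k), key=lambda i: q[i])
        r = rng.gauss(true_means[a], 1.0)         # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                 # incremental sample average
    return q, n

q, n = run_bandit([0.1, 0.8, 0.3])
```

With enough steps, the estimates in `q` approach the true arm means, and most pulls concentrate on the best arm.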
Chapter 3: Finite Markov Decision Processes (The MDP Framework)

Formalize the RL problem using MDPs. Learn about states, actions, rewards, policies, and value functions.
Chapter 4: Dynamic Programming (Planning with a Known Model)

Solve MDPs when the environment model is known. Learn policy evaluation, policy improvement, and value iteration.
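To preview value iteration, here is a compact sketch that sweeps Bellman optimality backups until the value function stops changing. The two-state MDP at the bottom is a hypothetical example invented for illustration, not one from the chapter.

```python
def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-8):
    """Value iteration on a small known MDP.

    P[s][a] -> list of (prob, next_state); R[s][a] -> expected reward.
    gamma and theta are illustrative choices.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup: best one-step lookahead value
            v = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in actions)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

# Hypothetical 2-state MDP: in either state you may 'stay' or 'move'.
states, actions = ["A", "B"], ["stay", "move"]
P = {"A": {"stay": [(1.0, "A")], "move": [(1.0, "B")]},
     "B": {"stay": [(1.0, "B")], "move": [(1.0, "A")]}}
R = {"A": {"stay": 0.0, "move": 1.0},
     "B": {"stay": 2.0, "move": 0.0}}
V = value_iteration(states, actions, P, R)
```

For this toy MDP the fixed point is easy to verify by hand: staying in B forever earns 2 per step, so V(B) = 2/(1-0.9) = 20, and moving from A gives V(A) = 1 + 0.9·20 = 19.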
Chapter 5: Monte Carlo Methods (Learning from Complete Episodes)

Learn from experience without a model. Monte Carlo methods average sample returns to estimate value functions.
Chapter 6: Temporal-Difference Learning (Bootstrapping Without a Model)

Combine the strengths of Monte Carlo and dynamic programming: TD methods learn from incomplete episodes by bootstrapping from current estimates.
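To see bootstrapping in miniature, here is TD(0) prediction on the classic five-state random walk (states A through E, start in the middle, reward +1 only on the right exit). The step size and episode count are illustrative choices, not the chapter's.

```python
import random

def td0_random_walk(episodes=200, alpha=0.1, seed=1):
    """TD(0) prediction on the 5-state random walk (a sketch)."""
    rng = random.Random(seed)
    V = [0.5] * 5           # value estimates for states A..E
    for _ in range(episodes):
        s = 2               # start in the middle (state C)
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 == -1:            # left terminal, reward 0
                V[s] += alpha * (0.0 - V[s])
                break
            if s2 == 5:             # right terminal, reward +1
                V[s] += alpha * (1.0 - V[s])
                break
            # TD(0): bootstrap from the current estimate of the next state
            V[s] += alpha * (0.0 + V[s2] - V[s])
            s = s2
    return V

V = td0_random_walk()
```

The true values are 1/6, 2/6, ..., 5/6; the learned estimates drift toward them without ever waiting for a full episode's return.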
Chapter 7: n-step Bootstrapping (Unifying TD and Monte Carlo)

Bridge TD(0) and MC with n-step methods. Control the bias-variance tradeoff by choosing how many steps to look ahead.
Chapter 8: Planning and Learning with Tabular Methods (Integrating Model-Based and Model-Free RL)

Learn how to combine model-based planning with model-free learning. Explore Dyna, prioritized sweeping, MCTS, and more.
Chapter 9: On-policy Prediction with Approximation (Function Approximation for Large State Spaces)

Scale RL to large state spaces using function approximation. Learn SGD methods, linear approximation, feature construction, and neural networks.
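The core move in this chapter is replacing a value table with a parameterized function. A one-step sketch of semi-gradient TD(0) with a linear approximator v̂(s, w) = w·φ(s) might look like this; the feature map, step size, and discount below are placeholders for illustration.

```python
def semi_gradient_td0(phi, w, s, r, s2, alpha=0.01, gamma=0.9,
                      terminal=False):
    """One semi-gradient TD(0) update for linear v_hat(s, w) = w . phi(s).

    A sketch: phi is any feature function, alpha/gamma are illustrative.
    """
    x = phi(s)
    v = sum(wi * xi for wi, xi in zip(w, x))
    v2 = 0.0 if terminal else sum(wi * xi for wi, xi in zip(w, phi(s2)))
    delta = r + gamma * v2 - v          # TD error
    # Semi-gradient: only v_hat(s) is differentiated, not the bootstrap target
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]
```

The "semi" in semi-gradient is visible in the last line: the update follows the gradient of v̂(s, w) alone, treating the bootstrapped target r + γ·v̂(s', w) as a constant.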
Chapter 10: On-policy Control with Approximation (Learning Optimal Policies with Function Approximation)

Extend function approximation to control. Learn semi-gradient Sarsa, the average reward setting, and differential value functions.
Chapter 11: Off-policy Methods with Approximation (Learning from Different Policies)

Explore the challenges of off-policy learning with function approximation. Understand the deadly triad, Bellman error, and stable algorithms like Gradient-TD.
12

Eligibility Traces

Available

Unifying TD and Monte Carlo Methods

Discover eligibility traces, a mechanism that bridges TD and Monte Carlo methods. Learn about λ-returns, TD(λ), True Online TD(λ), and how traces enable efficient credit assignment.

Start Chapter
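The mechanics of a trace are small enough to sketch in one function: each TD error is broadcast backward to every recently visited state, weighted by a decaying trace. This is a tabular accumulating-traces sketch with made-up α, γ, and λ defaults, not code from the chapter.

```python
def td_lambda_update(V, traces, s, r, s2, alpha=0.1, gamma=1.0,
                     lam=0.9, terminal=False):
    """One tabular TD(lambda) step with accumulating traces (a sketch).

    V and traces are dicts keyed by state; alpha/gamma/lam are illustrative.
    """
    delta = r + (0.0 if terminal else gamma * V.get(s2, 0.0)) - V[s]
    traces[s] = traces.get(s, 0.0) + 1.0      # accumulate trace for s
    for x in list(traces):
        # credit every traced state in proportion to its eligibility
        V[x] = V.get(x, 0.0) + alpha * delta * traces[x]
        traces[x] *= gamma * lam              # decay all traces
    return V, traces
```

With λ = 0 the traces vanish immediately and this reduces to TD(0); with λ = 1 and no discounting it approaches a Monte Carlo update, which is exactly the bridge this chapter formalizes via the λ-return.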
Chapter 13: Policy Gradient Methods (Learning Parameterized Policies Directly)

Move beyond action-value methods to learn policies directly. Explore the policy gradient theorem, REINFORCE, actor-critic methods, and continuous action spaces.
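To make "learning a policy directly" concrete, here is REINFORCE on a one-step, two-action problem with a softmax policy over preferences. The reward scheme, step size, and horizon are invented for illustration; the gradient of log π under a softmax is 1[a=b] - π(b), which is the only nontrivial line.

```python
import math, random

def reinforce_bandit(steps=2000, alpha=0.1, seed=0):
    """REINFORCE with a softmax policy on a 2-action, one-step task.

    A sketch: action 1 pays +1, action 0 pays 0, so the policy
    should shift its probability mass toward action 1.
    """
    rng = random.Random(seed)
    h = [0.0, 0.0]                                # action preferences
    for _ in range(steps):
        z = [math.exp(x) for x in h]
        pi = [x / sum(z) for x in z]              # softmax policy
        a = 0 if rng.random() < pi[0] else 1      # sample an action
        r = 1.0 if a == 1 else 0.0                # deterministic reward
        # REINFORCE update: h += alpha * r * grad log pi(a)
        for b in range(2):
            h[b] += alpha * r * ((1.0 if b == a else 0.0) - pi[b])
    z = [math.exp(x) for x in h]
    return [x / sum(z) for x in z]

pi = reinforce_bandit()
```

No action values are ever estimated; the preferences themselves are pushed in the direction that makes rewarded actions more probable, which is the essence of the policy gradient approach.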
Chapter 14: Psychology (RL and Animal Learning)

Explore the deep connections between reinforcement learning and psychology. Understand how RL algorithms correspond to classical and instrumental conditioning, the TD model, cognitive maps, and habitual vs. goal-directed behavior.
Chapter 15: Neuroscience (RL and the Brain)

Discover the remarkable connections between reinforcement learning and neuroscience. Learn how dopamine neurons encode TD errors, the neural actor-critic architecture, and how RL provides a computational framework for understanding the brain's reward system.
Chapter 16: Applications and Case Studies (RL Success Stories)

Explore landmark applications of reinforcement learning: from TD-Gammon and Samuel's checkers player to DeepMind's Atari-playing DQN and AlphaGo. See how RL has achieved superhuman performance in games and real-world systems.
Chapter 17: Frontiers (The Future of RL)

Explore the cutting edge of reinforcement learning: general value functions, temporal abstraction via options, partial observability, reward design challenges, and the role of RL in the future of artificial intelligence.