AI Safety Fundamentals

Summer 2026 Curriculum

An 8-week introductory reading group covering the current trajectory of AI, evidence for misalignment, threat models, technical safety approaches, and the AI policy landscape. Participants meet weekly in small sections facilitated by experienced TAs. Apart from the self-paced Week 0 material, no work is assigned outside of weekly meetings.

0

Introduction to Machine Learning

Self-paced ML fundamentals: neural networks, transformers, and backpropagation.

1

Trends & Timelines

Scaling drivers, capability trends, and time-horizon forecasting toward AGI.

2

Outer Alignment

Reward misspecification, specification gaming, RLHF, and the gap between intended and operationalized goals.

3

Inner Alignment

Deception, reward tampering, alignment faking, and goal misgeneralization.

4 Coming Soon

Threat Models

Instrumental convergence, power-seeking, bioterrorism, cyberwarfare, and gradual disempowerment.

5 Coming Soon

Control & Scalable Oversight

AI control framework, resampling, monitoring, weak-to-strong generalization, and debate.

6 Coming Soon

Interpretability & Evals

Attribution graphs, linear probes, capability and propensity evals, and alignment auditing.

7 Coming Soon

AI Governance & Liability

Tort law, compute governance, US export controls on China, and the regulator's toolbox.

8 Coming Soon

Research & Careers in Safety

Empirical research workflow, active alignment agendas, and career paths in AI safety.

Looking for a previous cohort? View the Spring 2026 curriculum archive.