AI Safety Fundamentals
Spring 2026 Curriculum
An 8-week introductory reading group covering the current trajectory of AI, evidence for misalignment, threat models, technical safety approaches, and the AI policy landscape. Participants meet weekly in small sections facilitated by experienced TAs. No work is assigned outside the weekly meetings.
Week 0: Introduction to Machine Learning
Self-paced ML fundamentals: neural networks, transformers, and backpropagation.
Week 1: Transformative AI and the Current Trajectory
Scaling drivers, capability trends, and time-horizon forecasting toward AGI.
Week 2: Outer Alignment
Reward misspecification, specification gaming, RLHF, and the gap between intended and operationalized goals.
Week 3: Inner Alignment
Deception, reward tampering, alignment faking, and goal misgeneralization.
Week 4: Threat Models
Instrumental convergence, power-seeking, bioterrorism, cyberwarfare, and gradual disempowerment.
Week 5: Control and Scalable Oversight
AI control techniques, weak-to-strong generalization, debate, and iterated amplification.
Week 6: Mechanistic Interpretability and Evals
Circuits, sparse autoencoders, feature visualization, and evaluation methodologies.
Week 7: International Policy and Liability
Export controls, compute governance, model weight security, and AI liability frameworks.
Week 8: Policy and Careers in Alignment
The AI regulation toolbox, career paths in alignment research, and next steps.