AI Safety Fundamentals
Spring 2026 Curriculum
An 8-week introductory reading group covering the current trajectory of AI, evidence for misalignment, threat models, technical safety approaches, and the AI policy landscape. Participants meet weekly in small sections facilitated by experienced TAs; no work is assigned outside of the weekly meetings.
Introduction to Machine Learning
Self-paced ML fundamentals: neural networks, transformers, and backpropagation.
Transformative AI and Current Trajectory
Scaling drivers, capability trends, and forecasting progress toward AGI via task time horizons.
Outer Alignment
Reward misspecification, specification gaming, RLHF, and the gap between intended and operationalized goals.
Inner Alignment
Deception, reward tampering, alignment faking, and goal misgeneralization.
Threat Models
Instrumental convergence, power-seeking, bioterrorism, cyberwarfare, and gradual disempowerment.
Control & Scalable Oversight
AI control framework, resampling, monitoring, weak-to-strong generalization, and debate.
Interpretability & Evals
Attribution graphs, linear probes, capability and propensity evals, and alignment auditing.
AI Governance & Liability
Tort law, compute governance, US export controls on China, and the regulator's toolbox.
Careers in AI Safety
Paths into AI safety work, opportunities for MAIA members, and next steps.