AI Safety Fundamentals
Spring 2026 Curriculum
An 8-week introductory reading group covering the current trajectory of AI, evidence for misalignment, threat models, technical safety approaches, and the AI policy landscape. Participants meet weekly in small sections facilitated by experienced TAs. No work is assigned outside the weekly meetings.
Week 0: Introduction to Machine Learning
Self-paced ML fundamentals: neural networks, transformers, and backpropagation.
Week 1: Transformative AI and the Current Trajectory
Scaling drivers, capability trends, and time-horizon forecasting toward AGI.
Week 2: Outer Alignment
Reward misspecification, specification gaming, RLHF, and the gap between intended and operationalized goals.
Week 3: Inner Alignment
Deception, reward tampering, alignment faking, and goal misgeneralization.
Week 4: Threat Models
Instrumental convergence, power-seeking, bioterrorism, cyberwarfare, and gradual disempowerment.
Week 5: Control and Scalable Oversight
AI control techniques, weak-to-strong generalization, debate, and iterated amplification.
Week 6: Mechanistic Interpretability and Evals
Circuits, sparse autoencoders, feature visualization, and evaluation methodologies.
Week 7: International Policy and Liability
Export controls, compute governance, model weight security, and AI liability frameworks.
Week 8: Policy and Careers in Alignment
The AI regulation toolbox, career paths in alignment research, and next steps.