For each paper, by 12 AM (midnight) the evening before lecture:
Q1: What are the differences between Data Parallelism, Model Parallelism, and Pipeline Parallelism in distributed training?