09c7d77d rapid fire:
- memory coalescing in CUDA + why it matters
- optimizing a matmul kernel (tiling, shared mem, bank conflicts)
- comm optimization in multi-GPU training (allreduce, ring vs tree)

if you haven't done CUDA before, don't apply for this team.
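
for the matmul topic, here's a rough sketch of what a tiled shared-memory kernel can look like. it's a minimal illustration, not a tuned kernel: the name `matmul_tiled`, the 32x32 tile size, the row-major layout, and the test sizes in `main` are all assumptions made for this example.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;  // assumed tile size: one warp covers one tile row

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// C (MxN) = A (MxK) * B (KxN), all row-major.
__global__ void matmul_tiled(const float* A, const float* B, float* C,
                             int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Coalesced global loads: threads in a warp share threadIdx.y and
        // differ in threadIdx.x, so they touch consecutive addresses.
        As[threadIdx.y][threadIdx.x] =
            (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k) {
            // As[ty][k]: whole warp reads the same word (broadcast).
            // Bs[k][tx]: stride-1 across the warp, so 32 distinct banks.
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // don't overwrite tiles while others still read
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

int main() {
    const int M = 100, N = 70, K = 130;  // deliberately not multiples of TILE
    std::vector<float> A(M * K), B(K * N), C(M * N), ref(M * N, 0.0f);
    for (auto& x : A) x = (rand() % 100) / 100.0f;
    for (auto& x : B) x = (rand() % 100) / 100.0f;

    // CPU reference for checking the kernel.
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                ref[i * N + j] += A[i * K + k] * B[k * N + j];

    float *dA, *dB, *dC;
    CUDA_CHECK(cudaMalloc(&dA, A.size() * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&dB, B.size() * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&dC, C.size() * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(dA, A.data(), A.size() * sizeof(float),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, B.data(), B.size() * sizeof(float),
                          cudaMemcpyHostToDevice));

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matmul_tiled<<<grid, block>>>(dA, dB, dC, M, N, K);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(C.data(), dC, C.size() * sizeof(float),
                          cudaMemcpyDeviceToHost));

    float max_err = 0.0f;
    for (size_t i = 0; i < C.size(); ++i)
        max_err = fmaxf(max_err, fabsf(C[i] - ref[i]));
    printf("max abs error vs CPU reference: %g\n", max_err);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return max_err < 1e-3f ? 0 : 1;
}
```

should build with something like `nvcc matmul_tiled.cu -o matmul_tiled` (file name is made up). worth noticing: this particular access pattern is already free of bank conflicts. the usual padding trick (`[TILE][TILE + 1]`) matters when a warp walks down a shared-memory column, e.g. in a transpose or when a tile is stored transposed.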