Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization