Take Figure 1, for example. More generally, I am trying to figure out whether people typically train transformers with respect to epochs or iterations (where one iteration is one batch).
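
To make the two units concrete, here is a minimal sketch of how I understand them to convert into each other, assuming a fixed dataset size and a fixed batch size. The function names and the numbers in the usage lines are made up for illustration and do not come from any particular library or paper.

```python
import math


def iterations_per_epoch(num_examples, batch_size, drop_last=False):
    """Number of batches (iterations) needed to see the dataset once.

    drop_last is a hypothetical flag: if True, a final partial batch
    is discarded; otherwise it counts as one more iteration.
    """
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


def epochs_to_iterations(epochs, num_examples, batch_size, drop_last=False):
    """Convert a whole number of epochs into a number of iterations."""
    return epochs * iterations_per_epoch(num_examples, batch_size, drop_last)


def iterations_to_epochs(iterations, num_examples, batch_size, drop_last=False):
    """Convert an iteration count into a (possibly fractional) epoch count."""
    return iterations / iterations_per_epoch(num_examples, batch_size, drop_last)


# Illustrative numbers only: 1000 examples, batch size 32.
steps = epochs_to_iterations(3, num_examples=1000, batch_size=32)
fraction = iterations_to_epochs(48, num_examples=1000, batch_size=32)
print(steps, fraction)
```

So a plot with iterations on the x-axis can be read in epochs only if the dataset size and batch size are also known.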