When we run a distillation pipeline on a huge dataset, there is a chance that the process fails during the inference phase because of network issues or similar errors. In such cases we need a restart mechanism that resumes from where we left off. When you used the 1M-instruction dataset, didn't you run into any issues like this?
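For reference, here is a minimal sketch of the kind of resume mechanism I mean. It assumes each sample has a unique `id` and that results are appended to a JSONL file. `run_inference`, `generate`, and the file layout are hypothetical and not taken from any particular distillation pipeline. On restart it skips every `id` already written, so a crash loses at most the sample that was in flight.

```python
import json
import os
from typing import Callable, Iterable


def load_done_ids(output_path: str) -> set:
    """Collect IDs already written by a previous (possibly crashed) run."""
    done = set()
    if not os.path.exists(output_path):
        return done
    with open(output_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError):
                # A crash mid-write can leave a truncated line; treat it as not done
                continue
    return done


def _ensure_trailing_newline(path: str) -> None:
    """Terminate a truncated last line so new records start on a fresh line."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb+") as f:
            f.seek(-1, os.SEEK_END)
            if f.read(1) != b"\n":
                f.seek(0, os.SEEK_END)
                f.write(b"\n")


def run_inference(samples: Iterable[dict],
                  generate: Callable[[str], str],
                  output_path: str) -> int:
    """Hypothetical resumable loop: returns how many samples were newly processed."""
    done = load_done_ids(output_path)
    _ensure_trailing_newline(output_path)
    processed = 0
    with open(output_path, "a", encoding="utf-8") as out:
        for sample in samples:
            if sample["id"] in done:
                continue  # finished in an earlier run, skip it
            # generate() stands in for the (network-bound) teacher model call
            response = generate(sample["instruction"])
            out.write(json.dumps({"id": sample["id"], "response": response}) + "\n")
            out.flush()  # persist each result immediately
            processed += 1
    return processed
```

If a network error kills the run, calling `run_inference` again with the same `samples` and `output_path` picks up with the first unfinished `id`. For very large datasets you might batch writes instead of flushing per sample, at the cost of redoing up to one batch after a crash.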