Is the architecture training different policies for different tasks? If not how are we specifying the task at test time?
Is the architecture training different policies for different tasks? If not how are we specifying the task at test time?