Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiprocessingSampler is broken for tf #1494

Open
ryanjulian opened this issue Jun 2, 2020 · 3 comments
Open

MultiprocessingSampler is broken for tf #1494

ryanjulian opened this issue Jun 2, 2020 · 3 comments
Assignees
Labels
backport-to-2020.06 Backport this PR to release-2020.06 bug Something isn't working
Milestone

Comments

@ryanjulian
Copy link
Member

Edit examples/tf/trpo_swimmer/ray_sampler.py to use MultiprocessingSampler and you will get:

ValueError: Variable GaussianMLPPolicy/GaussianMLPModel/dist_params/mean_network/hidden_0/kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?
@ryanjulian ryanjulian added bug Something isn't working backport-to-2019.10 Backport this PR to release-2019.10 labels Jun 2, 2020
@ryanjulian ryanjulian added this to the v2020.06.0 milestone Jun 9, 2020
@krzentner
Copy link
Contributor

I don't think I can solve this issue this week. Let's just not allow TF + MultiprocessingSampler for now.

@ryanjulian ryanjulian modified the milestones: v2020.06.0, v2020.09rc1 Jun 15, 2020
@ryanjulian ryanjulian added the backport-to-2020.06 Backport this PR to release-2020.06 label Jun 30, 2020
@ryanjulian ryanjulian modified the milestones: v2020.09rc1, v2020.09rc2 Jun 30, 2020
@ryanjulian ryanjulian assigned avnishn and unassigned avnishn and krzentner Jun 30, 2020
@ryanjulian ryanjulian mentioned this issue Jul 5, 2020
5 tasks
@ryanjulian ryanjulian assigned krzentner and unassigned krzentner Jul 22, 2020
@ryanjulian ryanjulian modified the milestones: v2020.09rc2, v2020.09rc3 Aug 3, 2020
@yeukfu yeukfu self-assigned this Aug 18, 2020
@yeukfu
Copy link
Contributor

yeukfu commented Aug 19, 2020

@krzentner @ryanjulian. multiprocessing uses os.fork() to start a process by default in Linux. This causes problems when we use TF, as fork will copy the tf.Graph() into sub process. There are two methods to address the issue:

  • Start the sub processes by spawn instead of fork, which will create a fresh sub process. But using spawn needs to add the followings into the launcher.
from multiprocessing import freeze_support

if __name__ == '__main__':
    freeze_support()     # https://docs.python.org/3/library/multiprocessing.html#multiprocessing.freeze_support

    trpo_swimmer_ray_sampler(seed=100)
  • Clear the graph in sub process at the beginning. I try to call tf.compat.v1.reset_default_graph() but failed because this cannot be called in a nested graph.

@krzentner
Copy link
Contributor

Yeah, unfortunately there's no way to "clear the graph." If you ever create a TF graph in a process, you can never fork in that process again. Modifying the launcher is also really unpleasant, especially because neither click nor clize know how to handle this case automatically (even though they reasonably should be able to).

There is another option though: fork before the graph is created. This is tricky though. Optimally, we would only fork one process when MultiprocessingSampler is imported, then use it to create additional processes when needed. However, this is complicated by the MultiprocessingSampler currently using the mp.Process objects directly (not through a mp.Queue) when shutting down. Unfortunately, due to some unfortunate design mistakes in multiprocessing, it's impossible to transmit the mp.Process object to another process, so that logic would need to be partially re-written to make this option work.

@yeukfu yeukfu assigned krzentner and unassigned yeukfu Aug 20, 2020
@ryanjulian ryanjulian removed the backport-to-2019.10 Backport this PR to release-2019.10 label Aug 27, 2020
@ryanjulian ryanjulian modified the milestones: v2020.09rc4, v2020.09rc5 Sep 2, 2020
@ryanjulian ryanjulian modified the milestones: v2020.09rc5, v2020.10.0rc6 Sep 18, 2020
@ryanjulian ryanjulian modified the milestones: v2020.10.0rc7, v2020.10.0 Oct 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-2020.06 Backport this PR to release-2020.06 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants