JARs should be self-contained and not rely on external virtualenvs [or use Storm hooks and get rid of SSH] #99
If we switch to putting everything in the JAR, wouldn't that imply compiling the venv on the user's machine, which may be different from the deployment target?
Yes, but isn't it probably a good idea for people to be developing in a VM that's the same as the deployment target anyway?
This seems like a great idea to me. As long as it's configurable for anyone that ends up with compilation issues, this would be a nice win for simplicity IMO.
I actually just thought of an interesting way we might be able to make this work. Apparently Storm supports hooks that get triggered on certain events. As part of that, each hook can have a […]. From what I can tell, we would just need to make sure that the virtualenv only gets built once […]. @amontalenti @msukmanowsky Any thoughts on this?
The hooks thing seems interesting, but I think I'd like to try pinging the Apache team to support a topology-level hook, as opposed to us doing some flock stuff to avoid race conditions from multiple components all executing the same code. No gotchas with this approach for other package managers like Conda, right? I haven't used Conda yet.
It would have to be more than a topology-level thing, since it would have to run on every machine the topology is running on. One way to avoid locking and race conditions would be to make each shell component run in its own independent environment. If we were using conda, we wouldn't even need to store multiple copies of everything that way, because conda hardlinks package files into each environment from its cache (and has its own locking mechanism to make sure two conda commands aren't messing with the package index at the same time).
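The space saving from hardlinking can be shown with a toy sketch (a minimal illustration of the filesystem mechanism, not conda's actual implementation):

```python
import os
import tempfile

# Toy illustration of how hardlinking shares file contents between
# "environments": both directory entries point at the same inode,
# so the package payload is stored on disk only once.
cache = tempfile.mkdtemp(prefix="pkg-cache-")
env = tempfile.mkdtemp(prefix="env-")

pkg = os.path.join(cache, "libfoo.so")
with open(pkg, "wb") as f:
    f.write(b"\x00" * 1024)  # stand-in for a compiled artifact

linked = os.path.join(env, "libfoo.so")
os.link(pkg, linked)  # hardlink into the "environment"

# Same inode, so the second "copy" uses no extra disk space.
assert os.stat(pkg).st_ino == os.stat(linked).st_ino
assert os.stat(pkg).st_nlink == 2
```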
This is important for me as well; we're using streamparse to deploy machine learning models. Compiling numerical and machine learning libraries during deployment is very painful and takes a lot of time :)
+1 for this based on the mailing list request I put in and feedback from @rduplain
This depends on #84 as a prerequisite.
Where does this currently stand?
This is mostly waiting on pex-tool/pex#316 being merged. Once that's set, we'll transition from primarily using virtualenvs to using PEX (see #212). With a PEX, we can ship everything we need inside the JAR. There will need to be a little bit of work done to get around the issue that executable permissions are lost when you create a JAR, but the main hold-up is PEX not supporting manylinux wheels. Without those, you couldn't really deploy to a Linux machine from OS X or Windows if one of your project's dependencies needed to be compiled.
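On the executable-permissions point: a JAR is just a zip, and Python's `zipfile` drops Unix mode bits on extraction, but the original mode survives in the archive's metadata and can be restored by hand. A minimal sketch (the function name is hypothetical, not streamparse API):

```python
import os
import zipfile

def extract_with_permissions(jar_path, dest_dir):
    """Extract a zip/JAR and restore Unix file permissions.

    zipfile.ZipFile.extract() discards mode bits, so an executable
    PEX packed inside a JAR comes out non-executable. The original
    mode is stored in the upper 16 bits of each entry's external_attr.
    """
    with zipfile.ZipFile(jar_path) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest_dir)
            mode = info.external_attr >> 16
            if mode:
                os.chmod(extracted, mode)
```

The `external_attr >> 16` trick only works for archives written on a Unix-like host (which is the case for entries added from an executable file), so a real implementation would want a sane fallback mode.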
@dan-blanchard can I clarify that the current `sparse jar` command does not bundle the venv like Pyleus did?
Correct. We currently update the venvs on the workers directly via fabric. The hope is that someday we will be able to switch to using PEX instead, but we're waiting on a wheel support PR being merged there.
If you're interested, my workaround for this is using Docker. It's a bit ugly, but bear with me; see here: https://github.com/Richard-Mathie/storm_swarm. Basically, the idea is to deploy the Storm workers as a Docker service using Docker Swarm, mounting a volume for the virtual environments to live in. You can then have another service which builds the virtual environment for those Storm workers to run from. If you deploy the services as […], building is done using pip and a […]. I am looking forward to the day, though, when I just have to submit a JAR to the […].
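For reference, the shape of that setup is roughly a Swarm stack like the following (service names, images, paths, and the volume name are illustrative guesses, not taken from the linked repo):

```yaml
version: "3"
services:
  storm-worker:
    image: storm            # official Storm image; tag as appropriate
    command: storm supervisor
    volumes:
      - venvs:/opt/virtualenvs   # shared home for the virtualenvs
    deploy:
      mode: global               # one worker per swarm node
  venv-builder:
    image: python:2.7
    command: sh -c "pip install virtualenv &&
                    virtualenv /opt/virtualenvs/topology &&
                    /opt/virtualenvs/topology/bin/pip install -r /requirements.txt"
    volumes:
      - venvs:/opt/virtualenvs
volumes:
  venvs:
```

The key design point is the shared named volume: the builder service compiles the heavy dependencies once, and every worker on the node mounts the result instead of rebuilding it.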
I've been looking over the Pyleus code a little, and one thing that they do that makes deployment simpler is that they create the entire virtualenv inside the JAR instead of having it reside on the servers. They don't require SSH at all for anything, because they require people to have Storm installed somewhere on their path, and then they just use the `storm` command directly and specify the host and port for nimbus to it. I end up installing `storm` with streamparse anyway so that I can run `storm ui`, and I don't think I'm alone there. If we switched to putting everything in the JAR, then we wouldn't have to worry about anything with SSH anymore and could hopefully get rid of our dependency on fabric, since that's not Python 3 compatible (as I keep mentioning 😄).
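For comparison, a Pyleus-style submission amounts to a plain `storm` invocation pointed at nimbus; something like the following (JAR name, main class, and hostname are placeholders, and the `nimbus.host` key applies to the Storm versions of that era — newer releases use `nimbus.seeds`):

```sh
# Submit a self-contained topology JAR straight to nimbus over Thrift;
# no SSH access to the cluster machines is needed.
storm jar my_topology.jar com.example.MyTopology \
    -c nimbus.host=nimbus.example.com \
    -c nimbus.thrift.port=6627
```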