
JARs should be self-contained and not rely on external virtualenvs [or use Storm hooks and get rid of SSH] #99

Open
dan-blanchard opened this issue Jan 22, 2015 · 14 comments

Comments

@dan-blanchard
Member

I've been looking over the Pyleus code a little, and one thing they do that makes deployment simpler is that they create the entire virtualenv inside the JAR instead of having it reside on the servers. They don't require SSH at all, because they require people to have Storm installed somewhere on their PATH, and then they just use the storm command directly and pass it the host and port for Nimbus. I end up installing Storm with streamparse anyway so that I can run storm ui, and I don't think I'm alone there.

If we switched to putting everything in the JAR, we wouldn't have to worry about SSH at all anymore and could hopefully get rid of our dependency on fabric, since that's not Python 3 compatible (as I keep mentioning 😄).
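As a rough illustration of the Pyleus-style, SSH-free flow described above (not streamparse's actual code), the submission step amounts to shelling out to the storm CLI and pointing it at Nimbus via config overrides. The JAR path, main class, and exact config keys below are assumptions:

```python
# Hypothetical sketch of SSH-free submission: invoke the `storm` CLI directly
# and pass the Nimbus location as config overrides. The JAR path, main class,
# and the nimbus.host / nimbus.thrift.port keys are illustrative assumptions.
import subprocess

def submit_jar(jar_path, main_class, nimbus_host, nimbus_port=6627):
    subprocess.check_call([
        "storm",
        "-c", "nimbus.host={}".format(nimbus_host),
        "-c", "nimbus.thrift.port={}".format(nimbus_port),
        "jar", jar_path, main_class,
    ])

if __name__ == "__main__":
    submit_jar("_build/wordcount.jar", "com.example.WordCountTopology", "nimbus.example.com")
```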

@codywilbourn
Contributor

If we switch to putting everything in the JAR, wouldn't that imply building the venv on the user's machine, which may be different from the deployment target?

@dan-blanchard
Member Author

Yes, but isn't it probably a good idea for people to be developing in a VM that's the same as the deployment target anyway?

@coffenbacher

This seems like a great idea to me. As long as it's configurable for anyone who ends up with compilation issues, this would be a nice win for simplicity IMO.

@dan-blanchard
Member Author

I actually just thought of an interesting way we might be able to make this work. Apparently Storm supports hooks that get triggered on certain events. As part of that, each hook can have a prepare method that is called when the TopologyContext is put together, which happens before the actual ShellBolt and ShellSpout prepare methods are called. It would take a tiny bit of JVM code, but we could implement a hook whose only purpose is to build the virtualenv from a requirements file we put in the JAR. That way you're always building on the same architecture, and we wouldn't be bloating the JAR size.

From what I can tell, we would just need to make sure that the virtualenv only gets built once, because TopologyContext.addHook gets called for every component in the topology if we use the topology.auto.task.hooks config setting, which is the simplest way to add hooks.

@amontalenti @msukmanowsky Any thoughts on this?
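The hook itself would be a small piece of JVM code, but here is a rough Python sketch of the build step such a hook could kick off on each worker, with all paths assumed and a file lock so the environment is only built once per machine even though the hook is registered for every component:

```python
# Hypothetical sketch of the venv-building work a Storm hook could trigger on
# each worker. Paths are assumptions; fcntl.flock serializes concurrent calls
# so only the first component on a machine actually builds the environment.
import fcntl
import os
import subprocess

def ensure_virtualenv(resources_dir, venv_dir):
    lock_path = venv_dir + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until any in-progress build finishes
        try:
            if not os.path.exists(os.path.join(venv_dir, "bin", "python")):
                subprocess.check_call(["virtualenv", venv_dir])
                subprocess.check_call([
                    os.path.join(venv_dir, "bin", "pip"), "install",
                    "-r", os.path.join(resources_dir, "requirements.txt"),
                ])
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```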

@msukmanowsky
Contributor

The hooks thing seems interesting, but I think I'd like to try pinging the Apache team about supporting a topology-level hook, as opposed to us doing some flock work to avoid race conditions from multiple components all executing the same code.

No gotchas with this approach for other package managers like conda, right? I haven't used conda yet.

@dan-blanchard
Member Author

It would have to be more than a topology-level thing, since it would have to run on every machine the topology is running on. One way to avoid locks and race conditions would be to make each shell component run in its own independent environment.

If we were using conda, we wouldn't even need to store multiple copies of everything that way, because conda hardlinks packages from its package cache into each environment (and has its own locking mechanism to make sure two conda commands aren't messing with the package index at the same time).
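As a sketch of the one-environment-per-component idea, assuming conda is available on the workers and with made-up component names and paths, each shell component could get its own prefix while conda's hardlinking from its package cache keeps the extra disk cost small:

```python
# Hypothetical sketch: one conda environment per shell component. conda
# hardlinks packages from its package cache into each environment, so many
# environments with the same dependencies add little extra disk usage.
import os
import subprocess

def ensure_component_env(component_name, requirements_file, envs_root="/opt/storm/envs"):
    prefix = os.path.join(envs_root, component_name)
    if not os.path.exists(os.path.join(prefix, "bin", "python")):
        subprocess.check_call(["conda", "create", "--yes", "--prefix", prefix, "python", "pip"])
        subprocess.check_call([
            os.path.join(prefix, "bin", "pip"), "install", "-r", requirements_file,
        ])
    return prefix
```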

@sixers

sixers commented Apr 25, 2015

This is important for me as well; we're using streamparse to deploy machine learning models. Compiling numerical and machine learning libraries during deployment is very painful and takes a lot of time :)

@westover

+1 for this based on the mailing list request I put in and feedback from @rduplain

@rduplain rduplain added the ready label Aug 27, 2015
@rduplain rduplain changed the title JARs should be self-contained and not rely on external virtualenvs JARs should be self-contained and not rely on external virtualenvs [or use Storm hooks] Sep 1, 2015
@rduplain
Contributor

This depends on #84 as a prerequisite.

@rduplain rduplain changed the title JARs should be self-contained and not rely on external virtualenvs [or use Storm hooks] JARs should be self-contained and not rely on external virtualenvs [or use Storm hooks and get rid of SSH] Oct 16, 2015
@rduplain rduplain removed the ready label Oct 26, 2015
@dan-blanchard dan-blanchard modified the milestones: v3.0, v3.1 Jul 27, 2016
@dan-blanchard dan-blanchard modified the milestones: v3.1, v3.2 Sep 1, 2016
@westover

What's the status on this?

@dan-blanchard
Member Author

This is mostly waiting on pex-tool/pex#316 being merged. Once that's in, we'll transition from primarily using virtualenvs to using PEX (see #212). With a PEX, we can ship everything we need inside the JAR. There will need to be a little bit of work to get around the fact that executable permissions are lost when you create a JAR, but the main holdup is PEX not supporting manylinux wheels. Without those, you can't really deploy to a Linux machine from OS X or Windows if one of your project's dependencies needs to be compiled.
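As an illustration of the permissions wrinkle mentioned above (not the actual streamparse implementation), once the .pex is pulled back out of the JAR on a worker, the executable bit that was lost at packaging time has to be restored by hand. The file names here are assumptions:

```python
# Hypothetical sketch: extract a .pex shipped inside the topology JAR and
# restore the executable bit that gets lost when the file is packed into the
# JAR. The archive member name and destination are illustrative.
import os
import stat
import zipfile

def extract_pex(jar_path, pex_name="resources/topology.pex", dest_dir="."):
    with zipfile.ZipFile(jar_path) as jar:
        extracted = jar.extract(pex_name, dest_dir)
    mode = os.stat(extracted).st_mode
    os.chmod(extracted, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return extracted
```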

@westover

@dan-blanchard can I clarify that the current `sparse jar` command does not bundle the venv like Pyleus did?

@dan-blanchard
Member Author

dan-blanchard commented Sep 15, 2017 via email

@Richard-Mathie

If you're interested, my workaround for this is using Docker. It's a bit ugly, but bear with me.

See here: https://github.com/Richard-Mathie/storm_swarm

Basically the idea is to deploy the Storm workers as a Docker service using Docker swarm, mounting a volume for the virtual environments to live in. You can then have another service that builds the virtual environment for those Storm workers to run from.

If you deploy the services as global, any nodes you add to the swarm automatically get added to the Storm cluster, and your venvs get built.

Building is done using pip and a requirements.txt file, which is distributed to the nodes using a Docker secret (though they have configs now as well). Change requirements? Then update the Docker secret, which will trigger a restart of the storm_venv service. Finally, I have to disable SSH in streamparse and put dummy entries into the nodes list so that it can set the number of workers to deploy to.

I'm looking forward to the day, though, when I just have to submit a JAR to Nimbus.
