(*System) Shutdown signature && bigmachine.Machine unique IDs #13

Open
DazWilkin opened this issue Oct 29, 2019 · 5 comments

Comments

@DazWilkin

The signature of Start is:

(*System) Start(ctx context.Context, count int) ([]*bigmachine.Machine, error)

Whereas (its converse) Shutdown is:

(*System) Shutdown()

It feels as though it would be more consistent if Shutdown's signature also took a context.Context and a []*bigmachine.Machine, and returned an error.

Even then, the bigmachine.Machine type does not include a unique ID for the machine (beyond an IP address, which is often not used as a key). Would it make sense to add one?

I'm not retaining the list of machines created by Start in the GCE implementation, so when asked to Shutdown I must first enumerate all the instances that (I think) have been created (I'm doing this by tag, but could potentially use IP) and then make a call to delete them.

@mariusae
Collaborator

(*System) Shutdown() shuts down the system implementation itself (e.g., to serialize its internal state); it does not shut down individual machines.

A couple of things:

  1. The way machine shutdown works is not to explicitly shut them down, but rather to arrange for them to die once keepalives are no longer maintained. This has some nice "end-to-end" properties: in particular, it doesn't matter how the driver dies (gracefully, unexpectedly, or due to a network partition), the machine will eventually shut itself down, since a dead driver no longer maintains keepalives. This is done by the Supervisor service.

  2. About names. Yes, currently bigmachine names each machine by its Addr. This is also how machines can talk to each other (via Dial). While this works, it is also slightly problematic: Addrs can be recycled. For example, a machine could die, and another could come up on the same address. I plan to also generate what is effectively a GUID, so that a machine's name becomes its address concatenated with that GUID.

@DazWilkin
Author

Thank you.

I misunderstood.

I'll review the EC2 implementation as I'm unclear how to "arrange for them to die" on GCE.

@mariusae
Collaborator

On EC2, the way we do this is to set the instance shutdown behavior to "terminate", and then we instruct systemd to shut down the OS when the process fails.

@DazWilkin
Author

I may have to be more explicit about this.

I don't think there's a way for GCE instances to delete themselves on shutdown or failure.

@mariusae
Collaborator

mariusae commented Nov 1, 2019

Okay, maybe there's a way to invoke the GCE API from the command line once the process exits?


2 participants