Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[communicator] add proxy_command support to connection block #36643

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mattlqx
Copy link

@mattlqx mattlqx commented Mar 5, 2025

I have a Terraform use-case where I am using an AWS EC2 Instance Connect Endpoint to access instances in different AWS accounts that do not have direct network access from the network location in which Terraform runs so that SSH connections can be established for use on remote-exec provisioner blocks. AWS's connection guide offers a few different methods:

  1. Using a pipe to aws ec2-instance-connect open-tunnel via an ssh -o ProxyCommand= argument.
  2. Running aws ec2-instance-connect open-tunnel --local-port as a background process to be used as a TCP proxy.
  3. Using aws ec2-instance-connect ssh to open an interactive ssh session.

Of these, using an out-of-the-box Terraform I attempted to get number 2 to work. It was hacky, using terraform_data with local-exec provisioners to control starting and stopping of the process in a detached tmux. I had some success with it but am running into issues with handling Terraform resource provisioners leaving the processes hanging. It would simply be much cleaner to implement the ability to use a ProxyCommand-like attribute on the Terraform ssh connections.

This PR implements a proxy_command attribute on the connection block that will pipe SSH communication through an exec'd process. This enables the ability to use aws ec2-instance-connect as described in number 1. No workarounds are then required to use Terraform with instances that are only reachable through EC2 Instance Connect or potentially any number of other proxy methods.

Target Release

1.12.x

CHANGELOG entry

  • This change is user-facing and I added a changelog entry.
  • This change is not user-facing.

@crw
Copy link
Contributor

crw commented Mar 5, 2025

Thanks for this submission! I will raise it in triage.

@crw
Copy link
Contributor

crw commented Mar 12, 2025

In triage we were not sure if this was necessary due to ProxyJump (-J) being a more modern version of ProxyCommand. However, subsequent investigation revealed that ProxyJump, in not allowing for running arbitrary commands, does not solve the problem at hand which requires running the aws cli.

That said, this being related to enabling a provisioner falls under the general guidance around changes to Provisioners found in the Contributing guide: https://github.com/hashicorp/terraform/blob/main/.github/CONTRIBUTING.md#provisioners. I will raise this in triage again to see if this is something we would be willing to review in the short term. Thanks again for the submission!

@mattlqx
Copy link
Author

mattlqx commented Mar 13, 2025

Thanks for the update. Can you help explain a little more what the recommendation would be for preparing/communicating with an instance created by Terraform? It seems like in the docs you're pushed towards userdata, but the lack of direct feedback to the Terraform run via that mechanism isn't desirable. While I understand the desire to not use these general provisioners, they are a core part of Terraform's flexibility when dealing with operating systems on the resources being created (at minimum to kickstart or ensure operation of other management systems in real time).

I'm trying to envision what the alternative would be... a provider that we implement a custom communication method (in this case EC2 Instance Connect) to perform a set of commands like the provisioners would take anyway? I appreciate any more direction here.

@crw
Copy link
Contributor

crw commented Mar 13, 2025

a provider that we implement a custom communication method (in this case EC2 Instance Connect) to perform a set of commands like the provisioners would take anyway? I

Possibly? I agree that is the direction we seem to be pushing users when these types of use cases come up. When we review this again (next Tuesday) I will also raise this question.

Copy link
Member

@jbardin jbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the proposal still needs to go though triage, a few things caught my eye on a quick read through so I'm dropping notes in here for whomever ends up reviewing the proposal.

@crw crw added enhancement waiting-response An issue/pull request is waiting for a response from the community provisioners labels Mar 19, 2025
@crw
Copy link
Contributor

crw commented Mar 19, 2025

Hi @mattlqx, we'd be happy to consider this change pending the above code review requests. Thanks!

@jbardin
Copy link
Member

jbardin commented Mar 19, 2025

Something else to consider, could the AWS provider create a ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources after all.

@mattlqx
Copy link
Author

mattlqx commented Mar 21, 2025

Hi @mattlqx, we'd be happy to consider this change pending the above code review requests. Thanks!

Thanks, I'll start addressing the feedback today.

@mattlqx
Copy link
Author

mattlqx commented Mar 21, 2025

Something else to consider, could the AWS provider create a ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources after all.

I'm not familiar with how that would work at a provider level. The command does need to be specified per-host however as the destination is determined by the arguments you execute it with.

@crw
Copy link
Contributor

crw commented Mar 21, 2025

@mattlqx I suspect GitHub notifications doesn't always notify me on certain PR actions, so give me a mention when this is ready for review again. Thanks!

@mattlqx
Copy link
Author

mattlqx commented Mar 26, 2025

@crw @jbardin I think I addressed the feedback appropriately. I removed the template string and process management parts. I was trying to compensate for bad behavior in the aws tool detailed in aws/aws-cli#9344, so was a bit overzealous with trying to force kill remaining child processes and that shouldn't be terraform's responsibility. This PR works fine with a patched aws that fixes its underlying issue (or nc or anything else).

@radeksimko
Copy link
Member

Something else to consider, could the AWS provider create a ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources after all.
I'm not familiar with how that would work at a provider level. The command does need to be specified per-host however as the destination is determined by the arguments you execute it with.

I think what @jbardin was proposing is an ephemeral resource such as

ephemeral "aws_ec2_instance_connect_tunnel" "example" {
  instance_id = aws_instance.example.id
}

resource "aws_instance" "example" {
  // ...
  connection {
      type        = "ssh"
      user        = "ubuntu"
      host        = ephemeral.aws_ec2_instance_connect_tunnel.example.host
      port        = ephemeral.aws_ec2_instance_connect_tunnel.example.port
      private_key = file("~/.ssh/id_rsa")
  }
}

It does imply that the aws_instance needs to be created first, so that an ID is available and can be passed to the ephemeral resource, which in turn opens the tunnel and provides the tunnel details, such as port number etc. back to the provisioner.

I don't know how exactly provisioners are handled in the graph and it is very likely that this would indeed represent a cycle.

That said I think it is worth prototyping it and confirming if that is the case - it may be an argument for some further changes in Core that would enable ephemeral resources to be used like this - e.g. have provisioners represented as a separate node in the graph from the resource itself.

@crw crw removed the waiting-response An issue/pull request is waiting for a response from the community label Apr 3, 2025
Copy link
Contributor

github-actions bot commented Apr 3, 2025

Changelog Warning

Currently this PR would target a v1.13 release. Please add a changelog entry for in the .changes/v1.13 folder, or discuss which release you'd like to target with your reviewer. If you believe this change does not need a changelog entry, please add the 'no-changelog-needed' label.

@mattlqx
Copy link
Author

mattlqx commented Apr 5, 2025

Something else to consider, could the AWS provider create a ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources after all.
I'm not familiar with how that would work at a provider level. The command does need to be specified per-host however as the destination is determined by the arguments you execute it with.

I think what @jbardin was proposing is an ephemeral resource such as

ephemeral "aws_ec2_instance_connect_tunnel" "example" {
  instance_id = aws_instance.example.id
}

resource "aws_instance" "example" {
  // ...
  connection {
      type        = "ssh"
      user        = "ubuntu"
      host        = ephemeral.aws_ec2_instance_connect_tunnel.example.host
      port        = ephemeral.aws_ec2_instance_connect_tunnel.example.port
      private_key = file("~/.ssh/id_rsa")
  }
}

It does imply that the aws_instance needs to be created first, so that an ID is available and can be passed to the ephemeral resource, which in turn opens the tunnel and provides the tunnel details, such as port number etc. back to the provisioner.

I don't know how exactly provisioners are handled in the graph and it is very likely that this would indeed represent a cycle.

That said I think it is worth prototyping it and confirming if that is the case - it may be an argument for some further changes in Core that would enable ephemeral resources to be used like this - e.g. have provisioners represented as a separate node in the graph from the resource itself.

It's an interesting possibility. There's no reason why it wouldn't work but I'm not convinced it's better than piping to an arbitrary command on connection. In the proposed ephemeral resource, it would be spawning a background process (or at least a long running thread) with a listening port and it would live for as long as the run. Contrasting to the usage I'm proposing in this PR, the process is started on-demand per connection and torn down, and not requiring an open listening port. Assigning listening ports may get pretty interesting if you have 100 or 1000 instances. There's also a max-connections value that needs to be configured appropriately when run in this mode or else it could fail connections prematurely. I do prefer the simplicity of piping and avoiding that all together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants