[communicator] add `proxy_command` support to connection block #36643

mattlqx · 2025-03-05T15:29:05Z

I have a Terraform use case in which I use an AWS EC2 Instance Connect Endpoint to reach instances in different AWS accounts. Those instances have no direct network access from the location where Terraform runs, so the endpoint is needed to establish SSH connections for remote-exec provisioner blocks. AWS's connection guide offers a few different methods:

1. Using a pipe to aws ec2-instance-connect open-tunnel via an ssh -o ProxyCommand= argument.
2. Running aws ec2-instance-connect open-tunnel --local-port as a background process to be used as a TCP proxy.
3. Using aws ec2-instance-connect ssh to open an interactive ssh session.

Of these, I attempted to get number 2 to work with out-of-the-box Terraform. It was hacky: I used terraform_data with local-exec provisioners to start and stop the process in a detached tmux session. I had some success with it, but I keep running into issues with Terraform resource provisioners leaving the processes hanging. It would be much cleaner to support a ProxyCommand-like attribute on Terraform's SSH connections.

This PR implements a proxy_command attribute on the connection block that pipes SSH communication through an exec'd process. This makes it possible to use aws ec2-instance-connect open-tunnel as described in number 1, so no workarounds are required to use Terraform with instances that are reachable only through EC2 Instance Connect, or potentially through any number of other proxy methods.
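
As a rough illustration of the intended usage, a connection block might look like the sketch below. This is only a sketch under assumptions: it assumes proxy_command accepts a single command string, and the resource name, user, key path, and remote-exec command are placeholders. The attribute's actual type and syntax are defined by the PR's diff, which is not reproduced here. The --instance-id flag is the one shown in AWS's guide for open-tunnel.

```hcl
# Hypothetical usage of the proposed attribute. The string form of
# proxy_command is an assumption, not taken from the PR diff.
resource "aws_instance" "example" {
  # ... (ami, instance_type, and other arguments omitted)

  connection {
    type        = "ssh"
    user        = "ubuntu"
    host        = self.private_ip
    private_key = file("~/.ssh/id_rsa")

    # SSH traffic is piped through this process's stdin/stdout instead of
    # a direct TCP connection to the host. The command is built per
    # instance because the tunnel destination is set by its arguments.
    proxy_command = "aws ec2-instance-connect open-tunnel --instance-id ${self.id}"
  }

  provisioner "remote-exec" {
    inline = ["echo connected"]
  }
}
```

Because the command is only run when a connection is opened, nothing needs to listen on a local port between connections.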

Target Release

1.12.x

CHANGELOG entry

This change is user-facing and I added a changelog entry.
This change is not user-facing.

crw · 2025-03-05T17:09:30Z

Thanks for this submission! I will raise it in triage.

crw · 2025-03-12T21:58:15Z

In triage we were not sure if this was necessary due to ProxyJump (-J) being a more modern version of ProxyCommand. However, subsequent investigation revealed that ProxyJump, in not allowing for running arbitrary commands, does not solve the problem at hand which requires running the aws cli.

That said, this being related to enabling a provisioner falls under the general guidance around changes to Provisioners found in the Contributing guide: https://github.com/hashicorp/terraform/blob/main/.github/CONTRIBUTING.md#provisioners. I will raise this in triage again to see if this is something we would be willing to review in the short term. Thanks again for the submission!

mattlqx · 2025-03-13T14:55:00Z

Thanks for the update. Can you help explain a little more what the recommendation would be for preparing/communicating with an instance created by Terraform? It seems like in the docs you're pushed towards userdata, but the lack of direct feedback to the Terraform run via that mechanism isn't desirable. While I understand the desire to not use these general provisioners, they are a core part of Terraform's flexibility when dealing with operating systems on the resources being created (at minimum to kickstart or ensure operation of other management systems in real time).

I'm trying to envision what the alternative would be... a provider that we implement a custom communication method (in this case EC2 Instance Connect) to perform a set of commands like the provisioners would take anyway? I appreciate any more direction here.

crw · 2025-03-13T18:40:21Z

> a provider that we implement a custom communication method (in this case EC2 Instance Connect) to perform a set of commands like the provisioners would take anyway?

Possibly? I agree that is the direction we seem to be pushing users when these types of use cases come up. When we review this again (next Tuesday) I will also raise this question.

jbardin

While the proposal still needs to go through triage, a few things caught my eye on a quick read-through, so I'm dropping notes in here for whoever ends up reviewing the proposal.

internal/communicator/ssh/communicator.go

crw · 2025-03-19T17:52:30Z

Hi @mattlqx, we'd be happy to consider this change pending the above code review requests. Thanks!

jbardin · 2025-03-19T21:31:47Z

Something else to consider: could the AWS provider create an ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources, after all.

mattlqx · 2025-03-21T13:07:38Z

> Hi @mattlqx, we'd be happy to consider this change pending the above code review requests. Thanks!

Thanks, I'll start addressing the feedback today.

mattlqx · 2025-03-21T13:09:54Z

> Something else to consider: could the AWS provider create an ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources, after all.

I'm not familiar with how that would work at a provider level. The command does need to be specified per host, however, as the destination is determined by the arguments you execute it with.

crw · 2025-03-21T17:07:51Z

@mattlqx I suspect GitHub doesn't always notify me on certain PR actions, so give me a mention when this is ready for review again. Thanks!

mattlqx · 2025-03-26T12:04:44Z

@crw @jbardin I think I addressed the feedback appropriately. I removed the template string and process management parts. I was trying to compensate for bad behavior in the aws tool, detailed in aws/aws-cli#9344, and was a bit overzealous in trying to force-kill remaining child processes; that shouldn't be Terraform's responsibility. This PR works fine with a patched aws that fixes its underlying issue (or with nc or anything else).

radeksimko · 2025-04-03T09:59:48Z

> > Something else to consider: could the AWS provider create an ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources, after all.
>
> I'm not familiar with how that would work at a provider level. The command does need to be specified per host, however, as the destination is determined by the arguments you execute it with.

I think what @jbardin was proposing is an ephemeral resource such as

ephemeral "aws_ec2_instance_connect_tunnel" "example" {
  instance_id = aws_instance.example.id
}

resource "aws_instance" "example" {
  // ...
  connection {
    type        = "ssh"
    user        = "ubuntu"
    host        = ephemeral.aws_ec2_instance_connect_tunnel.example.host
    port        = ephemeral.aws_ec2_instance_connect_tunnel.example.port
    private_key = file("~/.ssh/id_rsa")
  }
}

It does imply that the aws_instance needs to be created first, so that an ID is available and can be passed to the ephemeral resource, which in turn opens the tunnel and provides the tunnel details, such as port number etc. back to the provisioner.

I don't know how exactly provisioners are handled in the graph and it is very likely that this would indeed represent a cycle.

That said I think it is worth prototyping it and confirming if that is the case - it may be an argument for some further changes in Core that would enable ephemeral resources to be used like this - e.g. have provisioners represented as a separate node in the graph from the resource itself.

github-actions · 2025-04-03T20:13:55Z

Changelog Warning

Currently this PR would target a v1.13 release. Please add a changelog entry for it in the .changes/v1.13 folder, or discuss which release you'd like to target with your reviewer. If you believe this change does not need a changelog entry, please add the 'no-changelog-needed' label.

mattlqx · 2025-04-05T11:41:25Z

> > > Something else to consider: could the AWS provider create an ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources, after all.
> >
> > I'm not familiar with how that would work at a provider level. The command does need to be specified per host, however, as the destination is determined by the arguments you execute it with.
>
> I think what @jbardin was proposing is an ephemeral resource such as
>
> ephemeral "aws_ec2_instance_connect_tunnel" "example" {
>   instance_id = aws_instance.example.id
> }
>
> resource "aws_instance" "example" {
>   // ...
>   connection {
>     type        = "ssh"
>     user        = "ubuntu"
>     host        = ephemeral.aws_ec2_instance_connect_tunnel.example.host
>     port        = ephemeral.aws_ec2_instance_connect_tunnel.example.port
>     private_key = file("~/.ssh/id_rsa")
>   }
> }
>
> It does imply that the aws_instance needs to be created first, so that an ID is available and can be passed to the ephemeral resource, which in turn opens the tunnel and provides the tunnel details, such as port number etc. back to the provisioner.
>
> I don't know how exactly provisioners are handled in the graph and it is very likely that this would indeed represent a cycle.
>
> That said I think it is worth prototyping it and confirming if that is the case - it may be an argument for some further changes in Core that would enable ephemeral resources to be used like this - e.g. have provisioners represented as a separate node in the graph from the resource itself.

It's an interesting possibility. There's no reason why it wouldn't work, but I'm not convinced it's better than piping to an arbitrary command on connection. The proposed ephemeral resource would spawn a background process (or at least a long-running thread) with a listening port, and it would live for as long as the run. In contrast, with the usage I'm proposing in this PR, the process is started on demand per connection and torn down afterward, and it does not require an open listening port. Assigning listening ports may get pretty interesting if you have 100 or 1000 instances. There's also a max-connections value that needs to be configured appropriately when run in this mode, or else it could fail connections prematurely. I do prefer the simplicity of piping and avoiding that altogether.

[communicator] add proxy_command support to connection block

0a847e4

mattlqx requested review from a team as code owners March 5, 2025 15:29

This was referenced Mar 5, 2025

ec2-instance-connect open-tunnel doesn't exit after pipe is closed aws/aws-cli#9344

Open

[ec2-instance-connect] add more cleanup to websockets aws/aws-cli#9346

Open

jbardin reviewed Mar 14, 2025

crw added the enhancement, waiting-response, and provisioners labels Mar 19, 2025

address feedback, remove template string and process management

087c885

remove process group

3363c14

crw removed the waiting-response label Apr 3, 2025
