Skip to content

runners: Add scaleCycle lambda to reuse runners #6892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

seemethere
Copy link
Member

@seemethere seemethere commented Jul 7, 2025

This lambda will attempt to reuse runners that have finished jobs that are sitting idle. Plan is to have this run in AWS on a cron.

The functionality within this lambda will eventually replace the tryReuseRunner function in scale-up.ts.

Note

This does not implement terraform yet, I'm trying to break up these PRs into smaller pieces to make it easier to review

Signed-off-by: Eli Uriegas [email protected]

[ghstack-poisoned]
@seemethere
Copy link
Member Author

seemethere commented Jul 7, 2025

seemethere added a commit that referenced this pull request Jul 7, 2025
This lambda will attempt to reuse runners that have
finished jobs that are sitting idle. Plan is to have this run in AWS on
a cron.

The functionality within this lambda will eventually replace the
tryReuseRunner function in scale-up.ts.

Signed-off-by: Eli Uriegas <[email protected]>
ghstack-source-id: bc13e21
ghstack-comment-id: 3046547816
Pull-Request: #6892
Copy link

vercel bot commented Jul 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Updated (UTC)
torchci ⬜️ Ignored (Inspect) Visit Preview Jul 7, 2025 9:20pm

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 7, 2025
[ghstack-poisoned]
seemethere added a commit that referenced this pull request Jul 7, 2025
This lambda will attempt to reuse runners that have
finished jobs that are sitting idle. Plan is to have this run in AWS on
a cron.

The functionality within this lambda will eventually replace the
tryReuseRunner function in scale-up.ts.

Signed-off-by: Eli Uriegas <[email protected]>
ghstack-source-id: debc427
ghstack-comment-id: 3046547816
Pull-Request: #6892
Signed-off-by: Eli Uriegas <[email protected]>
@seemethere seemethere closed this Jul 7, 2025
@seemethere seemethere reopened this Jul 7, 2025
@seemethere seemethere requested a review from a team July 7, 2025 22:21
console.info(`Reusing runner ${runner.instanceId} for ${getRepoKey(repo)}`);
}

await tryReuseRunner(runnerInputParameters, metrics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a little too high level of a function to be invoking here.

In this loop you've already identified a candidate list of runners that are probably ready to be scaled down.

tryReuseRunner however looses context on what those candidates were and on each loop it'll starts from scratch again to find a potential runner it can scale down, starting with asking EC2 to list all runners of this runner type.

What do you think about adding a refactor to do something like the following:

  1. Extract the logic the logic to determine if a runner is safe to be shut down into one function, use that to determine the exact list of runners we want to refresh here
  2. Extract the logic to do the scale down (plus validations) into a separate function that takes in an explicit runner id. Pass the runners you find here into that list

Goal is to not end up looping over the same instances over and over again during a single ScaleCycle invocation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants