[Queue Time Histogram] Add Job Queue Time Lambda #6435

yangw-dev · 2025-03-18T03:59:05Z

Description

Add an AWS Lambda to generate an in-queue job histogram. Steps:

add logic to generate a snapshot of in-queue jobs [This PR]
add logic to generate the histogram
support multi-threading for backfilling

The snapshot data we generate includes:
{ queue_s, repo, workflow_name, job_name, htm_url, machine_type, time, runner_labels }

runner_labels includes the machine_type plus other categories such as linux, dynamic, etc.
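
As a rough illustration of the record shape listed above, here is a minimal sketch. The field names come from the description; the field types, the `QueuedJobSnapshot` class, and the `make_snapshot_record` helper are assumptions made for illustration and are not taken from the PR's code.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class QueuedJobSnapshot:
    """One in-queue job at snapshot time (types are assumed, not from the PR)."""

    queue_s: int          # seconds the job has been queued (assumed unit)
    repo: str
    workflow_name: str
    job_name: str
    htm_url: str          # spelled as in the PR description
    machine_type: str
    time: int             # snapshot time; epoch seconds is an assumption
    runner_labels: List[str] = field(default_factory=list)


def make_snapshot_record(job: QueuedJobSnapshot) -> dict:
    """Return a plain dict, making sure runner_labels contains machine_type."""
    record = asdict(job)  # asdict copies the list, so `job` is not mutated
    if job.machine_type not in record["runner_labels"]:
        record["runner_labels"].append(job.machine_type)
    return record
```

Keeping machine_type inside runner_labels means a consumer can filter on a single array field for both the concrete runner and its broader categories.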

Design Doc: doc

Working result in S3 (ran locally):

s3 link

aws/lambda/oss_ci_job_queue_time/lambda_function.py

aws/lambda/oss_ci_job_queue_time/Makefile

aws/lambda/oss_ci_job_queue_time/lambda_function.py

jeanschmidt

Are you able to also create some sort of breakdowns/aggregations?

Like Ephemeral vs NonEphemeral, meta-owned/non-meta-owned, pet vs dynamic, Linux/Mac/Windows, etc?

yangw-dev · 2025-03-18T22:42:46Z

> Are you able to also create some sort of breakdowns/aggregations?
>
> Like Ephemeral vs NonEphemeral, meta-owned/non-meta-owned, pet vs dynamic, Linux/Mac/Windows, etc?

Added. I copied some of the file handling code from ci-pct.py.

yangw-dev · 2025-03-19T04:03:18Z

> Are you able to also create some sort of breakdowns/aggregations?
>
> Like Ephemeral vs NonEphemeral, meta-owned/non-meta-owned, pet vs dynamic, Linux/Mac/Windows, etc?

Added; it will be stored in an array field named runner_labels.

jeanschmidt · 2025-03-19T16:53:25Z

aws/lambda/oss_ci_job_queue_time/lambda_function.py

+    arguments = parse_args()
+
+    # update environment variables for input parameters
+    os.environ["CLICKHOUSE_ENDPOINT"] = arguments.clickhouse_endpoint


Overwriting the original environment value with CLI arguments hides it, which is not a great engineering practice.

It makes more sense to have the argparse arguments default to the values in os.environ. You could also use a dedicated Config class if you prefer.

kk, I can make it different!
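
A minimal sketch of the suggestion above, in which the CLI argument defaults to the environment variable and `os.environ` is never written to. The `--clickhouse-endpoint` flag spelling and the empty-string fallback are assumptions; only `parse_args` and `CLICKHOUSE_ENDPOINT` appear in the diff.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments, falling back to environment variables."""
    parser = argparse.ArgumentParser(description="Queue time snapshot (sketch)")
    parser.add_argument(
        "--clickhouse-endpoint",
        # The environment is read when parse_args() runs, not at import time.
        default=os.environ.get("CLICKHOUSE_ENDPOINT", ""),
        help="ClickHouse endpoint; defaults to $CLICKHOUSE_ENDPOINT",
    )
    return parser.parse_args(argv)
```

With this shape, `parse_args([])` returns whatever the environment provides, an explicit `--clickhouse-endpoint` value wins for that run only, and the process environment stays untouched either way.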

aws/lambda/oss_ci_job_queue_time/lambda_function.py

jeanschmidt · 2025-03-19T17:05:36Z

aws/lambda/oss_ci_job_queue_time/lambda_function.py

+                runner_labels["other"].add(machine_type)
+
+
+def create_runner_labels(


Can this piece of code be reused, so we can avoid having to update it in multiple places?

Just today Nikita is planning to move macos-m2-15 from Apple ownership to ours, so we'll need to update the metrics and aggregations...

If we create many places to update, we're setting ourselves up to make mistakes.

That is a bit tricky to do in a lambda. I think I can do it in a BE PR; right now my focus is getting this one in.
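
One way to read the reuse request above is a single lookup table that every consumer imports, so an ownership or category change is made in one place. This is only a sketch: the table contents, the machine type names, and the `labels_for` helper are hypothetical placeholders and do not reflect the real `create_runner_labels` logic.

```python
from typing import Dict, FrozenSet, List

# Hypothetical single source of truth; the entries are placeholders.
_CATEGORIES_BY_MACHINE_TYPE: Dict[str, FrozenSet[str]] = {
    "example.linux.runner": frozenset({"linux", "dynamic"}),
    "example.macos.runner": frozenset({"macos", "pet"}),
}


def labels_for(machine_type: str) -> List[str]:
    """Return the sorted labels for a machine type, including the type itself.

    Unknown machine types fall into an "other" category.
    """
    categories = _CATEGORIES_BY_MACHINE_TYPE.get(machine_type, frozenset({"other"}))
    return sorted(categories | {machine_type})
```

Changing a runner's category then means editing one table entry rather than several call sites.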

aws/lambda/oss_ci_job_queue_time/lambda_function.py

jeanschmidt

I believe that before we move forward with more details, we need to fix a few design details. Making sure this script is idempotent is critical for any processing pipeline. Please reach out to discuss if you want to :)
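
To make the idempotency point concrete, here is a minimal sketch of one common approach: derive the output key deterministically from the snapshot time so that re-running the same interval overwrites the earlier result instead of adding a duplicate. The key layout, the 30-minute interval, and the in-memory `store` standing in for S3 are all assumptions; the PR does not specify them.

```python
from datetime import datetime, timezone
from typing import Dict


def snapshot_key(repo: str, snapshot_time: datetime, interval_minutes: int = 30) -> str:
    """Build a deterministic key by flooring the time to its interval.

    Expects a timezone-aware datetime and an interval that divides 60.
    """
    t = snapshot_time.astimezone(timezone.utc)
    floored_minute = (t.minute // interval_minutes) * interval_minutes
    bucket = t.replace(minute=floored_minute, second=0, microsecond=0)
    return f"{repo}/{bucket:%Y-%m-%dT%H-%M}.json"


def write_snapshot(store: Dict[str, str], key: str, payload: str) -> None:
    """Write (or overwrite) a snapshot; `store` stands in for an object store."""
    store[key] = payload
```

Running the job twice for the same interval produces the same key, so the second write replaces the first and downstream consumers see exactly one snapshot per interval.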

jeanschmidt

Approving now, as we agreed to work on the Python script in the next iteration to avoid making this PR too big.

yangw-dev · 2025-03-19T23:13:07Z

fixed some nits, moving to next pr

add time

1da9088

facebook-github-bot added the CLA Signed label Mar 18, 2025

yangw-dev changed the title from "add time" to "[queue time] Add Job Queue Time Lambda" Mar 18, 2025

yangw-dev requested review from jeanschmidt and clee2000 March 18, 2025 03:59

yangw-dev added 2 commits March 17, 2025 21:10

rename function for consistency

a774afb

replace mock patch

32f77b8

yangw-dev marked this pull request as ready for review March 18, 2025 04:11

yangw-dev added 8 commits March 17, 2025 21:25

replace mock patch

8fcc0e7

replace mock patch

871a646

replace mock patch

c2267bc

fix test

35289aa

fix test

52dcbce

fix test

b6022f3

fix test

b1064c7

fix test version

7a1b5aa

clee2000 reviewed Mar 18, 2025

aws/lambda/oss_ci_job_queue_time/lambda_function.py

jeanschmidt reviewed Mar 18, 2025

aws/lambda/oss_ci_job_queue_time/Makefile

jeanschmidt reviewed Mar 18, 2025

aws/lambda/oss_ci_job_queue_time/lambda_function.py (outdated)

jeanschmidt reviewed Mar 18, 2025

aws/lambda/oss_ci_job_queue_time/lambda_function.py (outdated)

jeanschmidt reviewed Mar 18, 2025

aws/lambda/oss_ci_job_queue_time/lambda_function.py (outdated)

jeanschmidt requested changes Mar 18, 2025

fix test version

3baa920

yangw-dev requested a review from clee2000 March 18, 2025 20:54

yangw-dev added 3 commits March 18, 2025 20:05

fix sync

ca24376

fix sync

c64762f

fix sync

2814b4b

vercel bot deployed to Preview March 19, 2025 07:35

yangw-dev added 8 commits March 19, 2025 02:07

typo

2f47298

typo

a6b8113

typo

aa1d08c

typo

b8a1086

typo

e91f959

typo

6b3b889

reform code

4d8440f

comment

3698aa6

jeanschmidt reviewed Mar 19, 2025