Git Job Error Handling #6517
},
highlight: hasL && hasR,
};
// TODO (huydhn): Fix the passing of tensor_parallel_size to the benchmark
Oops, I haven't removed this yet. Let me clean this up
export function computeGeomean(data: LLMsBenchmarkData[], metricName: string) {
const metricValues: { [key: string]: number[] } = {};
const returnedGeomean: LLMsBenchmarkData[] = [];
const processJobLevelFailureRows = (
Do you think this processJobLevelFailureRows could be done in https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_llms/query.sql by returning 2 new columns, failure_type and failure_report? If it's doable without blowing up the complexity of the SQL query, it seems like an easier way to implement this change, as SQL syntax is more concise.
Oops, I actually modified the query to include failure_type; added.
But this function is still needed: it basically maps the GIT_JOB failure row into multiple rows.
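To illustrate what "maps the GIT_JOB failure row into multiple rows" could look like, here is a minimal sketch. It is not the PR's processJobLevelFailureRows: the row shape, the expandJobLevelFailure name, and the idea of expanding along a list of expected metrics are all assumptions made for illustration.

```typescript
// Hypothetical, trimmed-down row shape; the real LLMsBenchmarkData has more fields.
interface BenchmarkRow {
  model: string;
  backend: string;
  device: string;
  metric: string;
  actual: number;
  failure_type?: string;
}

// Assumption: a job-level failure arrives as a single row with no metric,
// and the UI wants one failed row per metric it would normally display.
function expandJobLevelFailure(
  row: BenchmarkRow,
  expectedMetrics: string[]
): BenchmarkRow[] {
  if (row.failure_type !== "GIT_JOB") {
    // Ordinary rows pass through unchanged.
    return [row];
  }
  // Copy the failure row once per expected metric.
  return expectedMetrics.map((metric) => ({ ...row, metric }));
}

// The metric names below are placeholders, not taken from the PR.
const expanded = expandJobLevelFailure(
  {
    model: "edsr",
    backend: "qnn_q8",
    device: "samsung_galaxy_s22",
    metric: "",
    actual: 0,
    failure_type: "GIT_JOB",
  },
  ["metric_a", "metric_b"]
);
console.log(expanded.length);
```

Doing the expansion client-side keeps the SQL query returning a single failure row per job, which matches the point above that the query only gained the failure_type column.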
if (!val) {
return false;
}
if (jobLevelFailureConfig["content"].includes(val) && isJobLevelFailure) {
Umm, do you know what the default value returned by filter is here? I'm trying to skim through https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/filter but haven't found it yet. I'm trying to figure out the behavior when a new device is added to the list:
apple_iphone_15,
samsung_galaxy_s22,
samsung_galaxy_s24,
google_pixel_8_pro,
where jobLevelFailureConfig["content"].includes(val) would return false.
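On the filter question: Array.prototype.filter keeps an element only when the callback returns a truthy value, so a callback that falls through without an explicit return yields undefined and the element is dropped. A minimal sketch, assuming the predicate has no final return after the two if blocks shown above; keepRow and the shortened device list are illustrative, not the PR's code.

```typescript
// Subset of the configured devices, for illustration only.
const configuredDevices = ["apple_iphone_15", "samsung_galaxy_s22"];

// Mirrors the shape of the predicate in the diff. The last statement makes
// explicit what an implicit fall-through would produce: undefined.
function keepRow(
  val: string | undefined,
  isJobLevelFailure: boolean
): boolean | undefined {
  if (!val) {
    return false;
  }
  if (configuredDevices.includes(val) && isJobLevelFailure) {
    return true;
  }
  return undefined; // falsy, so filter() discards the element
}

// "google_pixel_8_pro" stands in for a device missing from the config.
const devices: (string | undefined)[] = [
  "apple_iphone_15",
  "google_pixel_8_pro",
  undefined,
];
const kept = devices.filter((d) => keepRow(d, true));
console.log(kept);
```

Under that assumption, a device that is not in the configured list is silently excluded, which is why the config has to be maintained when devices are added.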
job_level_failure: {
key_name: "device",
content: [
"apple_iphone_15",
@huydhn Do you mean the default value here?
We may need to maintain the config here.
This reverts commit d82e073.
Accidentally merged it into the dependent branch; this is the same as Fix bug for the failure handling #6517
Overview
Add logic to handle Git job failures:
the handling only triggers when a configuration for the repo exists
the UI renders it as normal failure handling
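The gating described above can be sketched as follows. This is only an illustration under stated assumptions: the REPO_CONFIGS registry, the "example/repo" key, and shouldHandleJobLevelFailure are hypothetical names; only the job_level_failure shape (key_name, content) and the device names come from this PR's conversation.

```typescript
interface JobLevelFailureConfig {
  key_name: string; // row field to inspect, e.g. "device"
  content: string[]; // values for which job-level failures are handled
}

// Hypothetical per-repo registry; "example/repo" is a placeholder key.
const REPO_CONFIGS: Record<
  string,
  { job_level_failure?: JobLevelFailureConfig }
> = {
  "example/repo": {
    job_level_failure: {
      key_name: "device",
      content: [
        "apple_iphone_15",
        "samsung_galaxy_s22",
        "samsung_galaxy_s24",
        "google_pixel_8_pro",
      ],
    },
  },
};

// Returns true only when the repo has a config and the row's value is listed.
function shouldHandleJobLevelFailure(
  repo: string,
  row: Record<string, string>
): boolean {
  const config = REPO_CONFIGS[repo]?.job_level_failure;
  if (!config) {
    return false; // no configuration: the handling never triggers
  }
  const val = row[config.key_name];
  return val !== undefined && config.content.includes(val);
}

console.log(
  shouldHandleJobLevelFailure("example/repo", { device: "samsung_galaxy_s22" })
);
```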
Demo
The demo PR: #6516
It mimics a GIT_JOB failure with:
model: "edsr", backend: "qnn_q8", mode: "inference", device: "samsung_galaxy_s22"
Demo link with fake data on Vercel
UI snapshot
Others [BE]