Eval maintenance by gspencergoog · Pull Request #712 · google/A2UI

gspencergoog · 2026-02-25T00:34:34Z

Description

This updates the evals to include the latest Gemini models, cleans up a number of comments, and improves the failure detection logic: when one "oneOf" case for components fails, the output now prints only the component that actually failed, instead of also printing every other component that fails simply because it is a different component. It also updates to the latest genkit and applies the same changes to the v0_10 directory evals. Finally, it fixes a problem in the catalog where the JSON schema format was specified as dateTime when it should have been date-time to match the JSON Schema spec.

Summary of Changes

This pull request focuses on essential maintenance and improvements for the evaluation infrastructure. It updates the underlying AI models to leverage the latest Gemini capabilities, refines the validation process for more accurate error identification, and ensures all dependencies are current. These changes collectively enhance the reliability, security, and diagnostic capabilities of the evaluation system.

Highlights

Model Updates: Evaluations have been updated to incorporate the latest Gemini models, including 'gemini-3-flash' and 'gemini-3.1-pro', replacing older versions and adjusting their rate limits.
Dependency Management: Numerous dependencies across both v0_10 and v0_9 evaluation environments have been updated to their latest versions, including @genkit-ai/* packages, ajv, prettier, esbuild, and others, ensuring up-to-date and secure development.
Improved Validation Logic: The validation process now includes ajv-formats for better schema adherence and features enhanced failure detection for 'oneOf' cases, providing more precise error reporting by focusing on the specific failing component rather than all possible failures.
JSON Schema Correction: A critical fix was applied to the basic_catalog.json schema, changing the format for date-time fields from dateTime to the correct date-time to align with the JSON schema specification.
Codebase Cleanup: Various comments and redundant code snippets have been removed or clarified across several TypeScript files, improving code readability and maintainability.
Evaluation Summary Enhancements: The evaluation summary now provides more detailed success metrics, including separate percentages for schema validation success and overall evaluation success, offering a clearer picture of model performance.

Changelog

specification/v0_10/eval/package.json
- Updated package name from 'a2ui_0_10_eval_llm' to 'a2ui_0_9_eval_llm'.
- Updated 'evalGemini' script to use 'gemini-3-flash' model.
- Updated various devDependencies and dependencies to newer versions.
specification/v0_10/eval/pnpm-lock.yaml
- Updated numerous package resolutions and integrity hashes to reflect new dependency versions.
specification/v0_10/eval/src/ai.ts
- Adjusted formatting for googleAI model configuration.
specification/v0_10/eval/src/analysis_flow.ts
- Adjusted formatting for z.object schema definition.
- Adjusted formatting for the ai.defineFlow call.
specification/v0_10/eval/src/evaluation_flow.ts
- Removed a blank line at the beginning of the file.
- Adjusted formatting for z.object schema definitions.
- Removed comments related to estimated tokens and model config lookup.
- Clarified comments regarding eval model config lookup and safe defaults.
- Adjusted formatting for the ai.defineFlow call.
specification/v0_10/eval/src/evaluator.ts
- Adjusted formatting for constructor parameters.
- Adjusted formatting for filter conditions.
- Adjusted formatting for logger info messages.
- Adjusted formatting for map function call.
- Adjusted formatting for setTimeout call.
- Adjusted formatting for yaml.dump and fs.writeFileSync calls.
specification/v0_10/eval/src/generation_flow.ts
- Adjusted formatting for rateLimiter.acquirePermit call.
- Adjusted formatting for logger error message.
- Updated comments regarding token usage reconciliation.
- Adjusted formatting for rateLimiter.recordUsage call.
- Adjusted formatting for the ai.defineFlow call.
specification/v0_10/eval/src/generator.ts
- Adjusted formatting for constructor parameters.
- Adjusted formatting for run method parameters.
- Adjusted formatting for process.stderr.write call.
- Adjusted formatting for map function call.
- Adjusted formatting for retryCount parameter.
- Adjusted formatting for saveSuccess and saveFailure method parameters.
- Adjusted formatting for model.name.replace calls.
- Adjusted formatting for fs.writeFileSync calls.
specification/v0_10/eval/src/index.ts
- Adjusted formatting for generateSummary function parameters.
- Adjusted formatting for string padding in summary table headers.
- Adjusted formatting for reduce function call.
- Adjusted formatting for filter conditions.
- Added new summary lines for 'Schema successful runs' and 'Total successful eval runs'.
- Adjusted formatting for toFixed call.
- Adjusted formatting for filter and map calls.
- Adjusted formatting for promptPrefixes.some call.
- Adjusted formatting for logger error message.
- Removed extensive comments regarding cleaning results and logger setup.
- Clarified comments for logger configuration and summary saving.
- Adjusted formatting for path.join and logger.warn calls.
- Adjusted formatting for generator.run call.
- Adjusted formatting for filter conditions.
- Adjusted formatting for issues.map call.
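
The two new summary lines report schema validation success separately from overall eval success. A minimal sketch of how such percentages could be computed, assuming each run records two booleans; the `RunResult` shape and the `summarize` and `percent` helpers are hypothetical names for illustration, not the code in index.ts:

```typescript
// Hypothetical per-run record; the real result type in index.ts may differ.
interface RunResult {
  schemaValid: boolean; // output passed JSON schema validation
  evalPassed: boolean; // output also passed the eval's own checks
}

// Format part/total as a percentage with one decimal place.
function percent(part: number, total: number): string {
  return total === 0 ? "0.0" : ((part / total) * 100).toFixed(1);
}

// Build the two summary lines named in the changelog entry above.
function summarize(results: RunResult[]): string[] {
  const total = results.length;
  const schemaOk = results.filter((r) => r.schemaValid).length;
  // A run counts as fully successful only if it passed both stages.
  const fullyOk = results.filter((r) => r.schemaValid && r.evalPassed).length;
  return [
    `Schema successful runs: ${schemaOk} / ${total} (${percent(schemaOk, total)}%)`,
    `Total successful eval runs: ${fullyOk} / ${total} (${percent(fullyOk, total)}%)`,
  ];
}
```

Keeping the two figures separate makes it possible to tell a model that emits malformed output from one that emits valid output with the wrong content.
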
specification/v0_10/eval/src/logger.ts
- Adjusted formatting for winston.format.printf calls.
- Adjusted formatting for winston.format.json call.
specification/v0_10/eval/src/models.ts
- Updated gemini-3-flash requestsPerMinute from 50 to 1000.
- Updated gemini-3-pro model to gemini-3.1-pro-preview and name to gemini-3.1-pro.
- Updated gemini-3.1-pro requestsPerMinute from 50 to 25.
- Updated gemini-2.5-flash-lite tokensPerMinute from 1200000 to 4000000.
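
For orientation, the limits listed above can be written as a small config table. This is a sketch only: the field names `name`, `model`, `requestsPerMinute`, and `tokensPerMinute` come from the changelog entries, but the `ModelLimits` type, the `UPDATED_LIMITS` constant, and the `minRequestIntervalMs` helper are hypothetical, and values the changelog does not state are left unset rather than guessed.

```typescript
// Hypothetical shape; only fields mentioned in the changelog are modelled.
interface ModelLimits {
  name: string;
  model?: string;
  requestsPerMinute?: number;
  tokensPerMinute?: number;
}

// Values taken from the changelog entries above; nothing else is filled in.
const UPDATED_LIMITS: ModelLimits[] = [
  { name: "gemini-3-flash", requestsPerMinute: 1000 },
  { name: "gemini-3.1-pro", model: "gemini-3.1-pro-preview", requestsPerMinute: 25 },
  { name: "gemini-2.5-flash-lite", tokensPerMinute: 4000000 },
];

// Spacing between requests if they were paced evenly across a minute.
// Even pacing is an assumption for illustration; the actual rateLimiter.ts
// may track usage differently.
function minRequestIntervalMs(limits: ModelLimits): number | undefined {
  return limits.requestsPerMinute === undefined
    ? undefined
    : 60000 / limits.requestsPerMinute;
}
```

Under even pacing, 1000 requests per minute allows a request every 60 ms, while 25 requests per minute allows one every 2400 ms.
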
specification/v0_10/eval/src/rateLimiter.ts
- Adjusted formatting for state.usageRecords.filter call.
- Adjusted formatting for logger verbose message.
- Adjusted formatting for acquirePermit parameters.
- Adjusted formatting for logger debug message.
- Adjusted formatting for oldestRequest.timestamp calculation.
- Adjusted formatting for record.timestamp calculation.
specification/v0_10/eval/src/validator.ts
- Added import for ajv-formats.
- Added addFormats to the AJV instance.
- Removed comment about schemas being keyed by filename.
- Removed comment about phase 2 being fast.
- Removed redundant 'AJV Validation' comment.
- Implemented detailed error handling for 'oneOf' cases in schema validation, including targeted validation for components and filtering of original errors.
- Removed comment about validating unknown functions.
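
The 'oneOf' handling described above can be illustrated without Ajv itself. The sketch below assumes error objects carrying the `instancePath` and `keyword` fields found on Ajv's ErrorObject, and assumes each component names its type in a `component` field. `SchemaError`, `ComponentValidator`, `resolvePointer`, and `narrowOneOfErrors` are hypothetical names, not the PR's actual code. When a 'oneOf' fails, the value is re-validated against the one component it claims to be, and the original errors under that path are dropped.

```typescript
// Minimal local stand-in for the fields of Ajv's ErrorObject used here.
interface SchemaError {
  instancePath: string; // JSON Pointer to the failing value, e.g. "/components/0"
  keyword: string; // failing keyword, e.g. "oneOf" or "required"
  message?: string;
}

// Hypothetical: validates one value against the schema of a single named
// component and returns errors with paths relative to that value.
type ComponentValidator = (componentType: string, value: unknown) => SchemaError[];

// Resolve a JSON Pointer such as "/components/0" against a document.
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === "") return doc;
  return pointer
    .split("/")
    .slice(1)
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"))
    .reduce<unknown>(
      (node, key) =>
        node !== null && typeof node === "object"
          ? (node as Record<string, unknown>)[key]
          : undefined,
      doc,
    );
}

function narrowOneOfErrors(
  doc: unknown,
  errors: SchemaError[],
  validateComponent: ComponentValidator,
): SchemaError[] {
  const targeted: SchemaError[] = [];
  const handledPaths: string[] = [];

  for (const error of errors) {
    if (error.keyword !== "oneOf") continue;
    const value = resolvePointer(doc, error.instancePath);
    const componentType =
      value !== null && typeof value === "object"
        ? (value as Record<string, unknown>)["component"]
        : undefined;
    // Without a declared type there is nothing to narrow to; keep the originals.
    if (typeof componentType !== "string") continue;
    handledPaths.push(error.instancePath);
    for (const e of validateComponent(componentType, value)) {
      targeted.push({ ...e, instancePath: error.instancePath + e.instancePath });
    }
  }

  // Drop every original error at or below a path that was re-validated.
  const isHandled = (path: string): boolean =>
    handledPaths.some((p) => path === p || path.startsWith(p + "/"));
  return [...errors.filter((e) => !isHandled(e.instancePath)), ...targeted];
}
```

This sketch does not deduplicate results when one failing 'oneOf' is nested inside another; a fuller version would need to handle that case.
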
specification/v0_10/json/basic_catalog.json
- Corrected JSON schema format from dateTime to date-time for date-time properties.
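
A misspelled format name such as dateTime is easy to miss, because a validator that does not check formats will accept it silently. As a dependency-free sketch, a schema can be scanned for `format` values that are not among the names defined by the JSON Schema 2020-12 validation vocabulary. The `findNonStandardFormats` helper and the `FormatIssue` type are hypothetical and not part of this PR.

```typescript
// Format names defined by the JSON Schema 2020-12 validation vocabulary.
const STANDARD_FORMATS = new Set<string>([
  "date-time", "date", "time", "duration",
  "email", "idn-email", "hostname", "idn-hostname",
  "ipv4", "ipv6", "uri", "uri-reference", "iri", "iri-reference",
  "uuid", "uri-template", "json-pointer", "relative-json-pointer", "regex",
]);

interface FormatIssue {
  path: string; // JSON Pointer to the offending "format" keyword
  format: string; // the non-standard value found there
}

// Walk a schema recursively and report string-valued "format" keywords
// whose value is not a standard format name.
// Limitation: a string-valued "format" key inside "default" or "examples"
// data would also be reported.
function findNonStandardFormats(schema: unknown, path = ""): FormatIssue[] {
  if (schema === null || typeof schema !== "object") return [];
  const issues: FormatIssue[] = [];
  for (const [key, value] of Object.entries(schema as Record<string, unknown>)) {
    const childPath = `${path}/${key}`;
    if (key === "format" && typeof value === "string") {
      if (!STANDARD_FORMATS.has(value)) issues.push({ path: childPath, format: value });
    } else {
      issues.push(...findNonStandardFormats(value, childPath));
    }
  }
  return issues;
}
```

Run against a schema that still contains `"format": "dateTime"`, the scan returns one issue pointing at that keyword; after the correction to date-time it returns an empty list.
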
specification/v0_9/eval/package.json
- Updated 'evalGemini' script to use 'gemini-3-flash' model.
- Updated various devDependencies and dependencies to newer versions.
specification/v0_9/eval/pnpm-lock.yaml
- Updated numerous package resolutions and integrity hashes to reflect new dependency versions.
specification/v0_9/eval/src/ai.ts
- Adjusted formatting for googleAI model configuration.
specification/v0_9/eval/src/analysis_flow.ts
- Adjusted formatting for z.object schema definition.
- Adjusted formatting for the ai.defineFlow call.
specification/v0_9/eval/src/evaluation_flow.ts
- Removed a blank line at the beginning of the file.
- Adjusted formatting for z.object schema definitions.
- Removed comments related to estimated tokens and model config lookup.
- Clarified comments regarding eval model config lookup and safe defaults.
- Adjusted formatting for the ai.defineFlow call.
specification/v0_9/eval/src/evaluator.ts
- Adjusted formatting for constructor parameters.
- Adjusted formatting for filter conditions.
- Adjusted formatting for logger info messages.
- Adjusted formatting for map function call.
- Adjusted formatting for setTimeout call.
- Adjusted formatting for yaml.dump and fs.writeFileSync calls.
specification/v0_9/eval/src/generation_flow.ts
- Adjusted formatting for rateLimiter.acquirePermit call.
- Adjusted formatting for logger error message.
- Updated comments regarding token usage reconciliation.
- Adjusted formatting for rateLimiter.recordUsage call.
- Adjusted formatting for the ai.defineFlow call.
specification/v0_9/eval/src/generator.ts
- Adjusted formatting for constructor parameters.
- Adjusted formatting for run method parameters.
- Adjusted formatting for process.stderr.write call.
- Adjusted formatting for map function call.
- Adjusted formatting for retryCount parameter.
- Adjusted formatting for saveSuccess and saveFailure method parameters.
- Adjusted formatting for model.name.replace calls.
- Adjusted formatting for fs.writeFileSync calls.
specification/v0_9/eval/src/index.ts
- Adjusted formatting for generateSummary function parameters.
- Adjusted formatting for string padding in summary table headers.
- Adjusted formatting for reduce function call.
- Adjusted formatting for filter conditions.
- Added new summary lines for 'Schema successful runs' and 'Total successful eval runs'.
- Adjusted formatting for toFixed call.
- Adjusted formatting for filter and map calls.
- Adjusted formatting for promptPrefixes.some call.
- Adjusted formatting for logger error message.
- Removed extensive comments regarding cleaning results and logger setup.
- Clarified comments for logger configuration and summary saving.
- Adjusted formatting for path.join and logger.warn calls.
- Adjusted formatting for generator.run call.
- Adjusted formatting for filter conditions.
- Adjusted formatting for issues.map call.
specification/v0_9/eval/src/logger.ts
- Adjusted formatting for winston.format.printf calls.
- Adjusted formatting for winston.format.json call.
specification/v0_9/eval/src/models.ts
- Updated gemini-3-flash requestsPerMinute from 50 to 1000.
- Updated gemini-3-pro model to gemini-3.1-pro-preview and name to gemini-3.1-pro.
- Updated gemini-3.1-pro requestsPerMinute from 50 to 25.
- Updated gemini-2.5-flash-lite tokensPerMinute from 1200000 to 4000000.
specification/v0_9/eval/src/rateLimiter.ts
- Adjusted formatting for state.usageRecords.filter call.
- Adjusted formatting for logger verbose message.
- Adjusted formatting for acquirePermit parameters.
- Adjusted formatting for logger debug message.
- Adjusted formatting for oldestRequest.timestamp calculation.
- Adjusted formatting for record.timestamp calculation.
specification/v0_9/eval/src/validator.ts
- Added import for ajv-formats.
- Added addFormats to the AJV instance.
- Removed comment about schemas being keyed by filename.
- Removed comment about phase 2 being fast.
- Removed redundant 'AJV Validation' comment.
- Implemented detailed error handling for 'oneOf' cases in schema validation, including targeted validation for components and filtering of original errors.
- Removed comment about validating unknown functions.
specification/v0_9/json/basic_catalog.json
- Corrected JSON schema format from dateTime to date-time for date-time properties.

gspencergoog added 3 commits February 24, 2026 12:39

Update eval script AI models and limits

cfb38d8

Adds Gemini 3.1 Pro and updates the rate and token limits for all Gemini models to match the latest API specifications.

Cleaning up evals.

ddf55cd

Update v0_10 too

8930807

gspencergoog requested a review from jacobsimionato as a code owner February 25, 2026 00:34

github-project-automation bot added this to A2UI Feb 25, 2026

github-project-automation bot moved this to Todo in A2UI Feb 25, 2026

google deleted a comment from gemini-code-assist bot Feb 25, 2026

gspencergoog and others added 3 commits February 24, 2026 17:26

Remove any

f0463f4

Update specification/v0_10/eval/src/validator.ts

0467f15

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

prevent infinite recursion

9087550

google deleted a comment from gemini-code-assist bot Feb 25, 2026

gspencergoog wants to merge 6 commits into google:main from gspencergoog:update_evals
