stage: v1 wip #623

abhishekg999 · 2025-11-07T02:33:20Z

No description provided.

gemini-code-assist · 2025-11-07T02:33:52Z

Summary of Changes

Hello @abhishekg999, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request marks a significant step towards a more structured and extensible API for the judgeval library by introducing a new v1 client. It refactors the API generation process, integrates a base_url into client initialization for greater flexibility, and lays the groundwork for advanced features like comprehensive evaluation, diverse scoring mechanisms (built-in, custom, and prompt-based), and OpenTelemetry-driven tracing. These changes aim to enhance the library's modularity and prepare it for future feature development.

Highlights

New V1 API Client Structure: Introduced a new versioned API client (v1) with a dedicated directory structure (src/judgeval/v1), encompassing data models, evaluation, scorers, and tracing functionalities.
Base URL Integration: Modified the API client generation logic and existing client classes (JudgmentSyncClient, JudgmentAsyncClient) to accept and utilize a base_url parameter, making API requests more flexible.
Enhanced API Generation Script: Added a new script (scripts/api_generator_v1.py) specifically for generating the v1 internal API client and its corresponding TypedDict types from an OpenAPI specification, including schema filtering capabilities.
Comprehensive Scorer Framework: Implemented a robust framework for various scorer types, including built-in scorers (e.g., AnswerCorrectnessScorer, FaithfulnessScorer), custom scorers, and prompt-based scorers, each with dedicated factories.
OpenTelemetry-based Tracing: Integrated OpenTelemetry for tracing, providing BaseTracer and Tracer classes to enable span management, attribute setting, and asynchronous evaluation within traced operations.
Flexible Error Handling: Updated the JudgmentAPIError class to allow the response attribute to be optional, improving error handling robustness.
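As a rough illustration of the last highlight, the sketch below shows what an error type with an optional response can look like. It is not the library's code: `SketchAPIError`, its `status_code` and `detail` fields, and the `describe` helper are hypothetical names chosen for this example; the only thing taken from the summary is that the `response` attribute may be absent.

```python
from typing import Any, Optional


class SketchAPIError(Exception):
    """Hypothetical stand-in for an API error whose response is optional."""

    def __init__(
        self, status_code: int, detail: str, response: Optional[Any] = None
    ) -> None:
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail
        # May be None, for example when the request failed before any
        # HTTP response was received.
        self.response = response


def describe(error: SketchAPIError) -> str:
    """Format an error without assuming a response object exists."""
    if error.response is None:
        return f"{error.detail} (no response attached)"
    return f"{error.detail} (response: {error.response!r})"
```

Callers that previously read `error.response` unconditionally would need a `None` check like the one in `describe`.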

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant amount of work towards a new v1 SDK, including a new API client generator, data models, and an OpenTelemetry-based tracing system. The overall structure is well-designed, using factories and clear separation of concerns. My review focuses on the new v1 implementation, where I've identified a couple of high-severity bugs related to caching and argument inspection in the tracer, along with some medium-severity style suggestions for the new API generator script to improve maintainability.

src/judgeval/v1/scorers/prompt_scorer/prompt_scorer_factory.py

gemini-code-assist · 2025-11-07T02:36:45Z

src/judgeval/v1/tracer/base_tracer.py

+    def _format_inputs(
+        self, f: Callable[..., Any], args: Tuple[Any, ...], kwargs: Dict[str, Any]
+    ) -> Dict[str, Any]:
+        try:
+            params = list(inspect.signature(f).parameters.values())
+            inputs: Dict[str, Any] = {}
+            arg_i = 0
+            for param in params:
+                if param.kind == inspect.Parameter.POSITIONAL_OR_KEYWORD:
+                    if arg_i < len(args):
+                        inputs[param.name] = args[arg_i]
+                        arg_i += 1
+                    elif param.name in kwargs:
+                        inputs[param.name] = kwargs[param.name]
+                elif param.kind == inspect.Parameter.VAR_POSITIONAL:
+                    inputs[param.name] = args[arg_i:]
+                    arg_i = len(args)
+                elif param.kind == inspect.Parameter.VAR_KEYWORD:
+                    inputs[param.name] = kwargs
+            return inputs
+        except Exception:
+            return {}


The current implementation of _format_inputs is complex and has a bug where it includes self in the captured arguments for instance methods. You can simplify this significantly and make it more robust by using inspect.signature().bind(). This will correctly handle all argument passing styles and allow you to easily remove self or cls from the result.

    def _format_inputs(
        self, f: Callable[..., Any], args: Tuple[Any, ...], kwargs: Dict[str, Any]
    ) -> Dict[str, Any]:
        try:
            bound_args = inspect.signature(f).bind(*args, **kwargs)
            bound_args.apply_defaults()
            arguments = bound_args.arguments
            if "self" in arguments:
                del arguments["self"]
            if "cls" in arguments:
                del arguments["cls"]
            return arguments
        except Exception:
            return {}
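To make the suggestion concrete, here is a minimal standalone sketch rather than the PR's code. The `traced` decorator, the `Greeter` class, the `captured` dictionary, and the module-level `format_inputs` helper are hypothetical names; only `inspect.signature`, `Signature.bind`, and `BoundArguments.apply_defaults` are standard-library calls. It assumes the decorator wraps the plain function at class-definition time, so `self` arrives as the first positional argument.

```python
import functools
import inspect
from typing import Any, Callable, Dict, Tuple


def format_inputs(
    f: Callable[..., Any], args: Tuple[Any, ...], kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Map call arguments to parameter names, dropping self/cls."""
    try:
        bound = inspect.signature(f).bind(*args, **kwargs)
        bound.apply_defaults()  # fill in parameters the caller omitted
        inputs = dict(bound.arguments)
        inputs.pop("self", None)
        inputs.pop("cls", None)
        return inputs
    except Exception:
        # bind() raises TypeError when the call does not match the signature.
        return {}


# Records the inputs of the most recent traced call (illustration only).
captured: Dict[str, Any] = {}


def traced(f: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(f)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        captured.clear()
        captured.update(format_inputs(f, args, kwargs))
        return f(*args, **kwargs)

    return wrapper


class Greeter:
    @traced
    def greet(
        self, name: str, *extra: str, punctuation: str = "!", **opts: Any
    ) -> str:
        return f"Hello, {name}{punctuation}"


result = Greeter().greet("Ada", "x", "y", loud=True)
```

After the call, `captured` holds `name`, `extra`, `punctuation`, and `opts` but not `self`. Note that the loop in the diff above has no branch for keyword-only or positional-only parameters and does not apply defaults, so a parameter like `punctuation` would be missing from its result.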

gemini-code-assist · 2025-11-07T02:36:45Z

scripts/api_generator_v1.py

+
+
+def filter_schemas() -> Dict[str, Any]:
+    from typing import Generator


This local import of Generator can be moved to the top of the file (line 7) with the other imports from the typing module. This improves code readability and follows standard Python style guidelines.

gemini-code-assist · 2025-11-07T02:36:45Z

scripts/api_generator_v1.py

+            check=True,
+        )
+    finally:
+        import os


The import os statement should be at the top of the file with other imports, following Python's standard conventions. Placing it inside a function or a finally block is unconventional and can be less efficient.
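A minimal sketch of the suggested layout follows. The diff shows only a fragment of the function, so `run_with_temp_spec` and its body are hypothetical; the sketch simply assumes a pattern of writing a temporary file, running a subprocess with `check=True`, and cleaning up in `finally`.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_spec(spec_text: str) -> str:
    """Write spec_text to a temp file, run a command on it, always clean up."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(spec_text)
        completed = subprocess.run(
            [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read())", path],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return completed.stdout.strip()
    finally:
        # os is imported at module level, so no import is needed here.
        if os.path.exists(path):
            os.remove(path)
```

With the import at the top, the `finally` block contains only cleanup logic, and a missing module is reported when the script is loaded rather than during cleanup.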

scripts/api_generator_v1.py

propel-code-bot · 2025-11-10T00:43:49Z

✔️ Propel has finished reviewing this change.

stage: v1 wip

2a933e0

abhishekg999 changed the base branch from main to staging November 7, 2025 02:33

gemini-code-assist bot reviewed Nov 7, 2025

propel-code-bot bot reviewed Nov 7, 2025

scripts/api_generator_v1.py (resolved)

abhishekg999 added 14 commits November 6, 2025 18:44

wip

4b2c689

wip

aa556ee

wip

a231775

wip

99e483f

wip

6366e1f

wip

c1cbea8

fix: remove override

66e73c8

fix: cache trace id in key

cb1351c

feat: datasets + run eval

66a67fa

updaet: old types

860ac4f

fix: fireworks linter

b3c9a7c

chore: tests

aec3f2a

chore: tests

fc5fddf

fix: custom scorer

8377d63

abhishekg999 requested review from Mandolaro, adivate2021, alanzhang25 and justinsheu November 9, 2025 08:50

abhishekg999 added 4 commits November 9, 2025 00:51

Merge remote-tracking branch 'origin/staging' into ahh/v1

eff9b3d

merge

3a9f30d

bump

20cf624

static example

e48342d
