feat - e2e tests + evals #10

Open · mclenhard wants to merge 1 commit into main

Conversation

@mclenhard commented on May 2, 2025

Adds a new e2e test that loads an MCP client, which in turn runs the server and processes the actual tool call. Afterwards, it grades the response for correctness.

Note: I'm the package author, and I translated this from English.

Summary by Sourcery

Add end-to-end (e2e) evaluation tests for documentation retrieval tools using the mcp-evals framework

New Features:

  • Implemented e2e evaluation tests for documentation retrieval tools
  • Added support for running evaluations using MCP evals package

Documentation:

  • Updated README with instructions for running evals

Tests:

  • Created evaluation tests for get_latest_docs, search_docs, and get_api_reference tools

@sourcery-ai bot commented on May 2, 2025

Reviewer's Guide

This pull request introduces end-to-end testing by adding the mcp-evals dependency and implementing evaluation functions in src/evals/evals.ts. These functions use the grade utility from mcp-evals and an OpenAI model to test specific tool functionalities based on natural language prompts, with results parsed from JSON. The README is updated with instructions on running these evaluations.
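
For orientation, here is a minimal sketch of one of these evaluations, reconstructed from the evals.ts snippets quoted in the review comments below. The imports, field names, prompt text, and gpt-4 model choice all appear in the PR; the abridged single-eval file layout is an assumption, not the full file:

// src/evals/evals.ts (abridged sketch, not the full file)
import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";

const get_latest_docsEval: EvalFunction = {
    name: 'get_latest_docs Evaluation',
    description: 'Evaluates get_latest_docs tool functionality',
    run: async () => {
        // grade() drives the MCP client/server round-trip for this prompt and returns its verdict as a JSON string.
        const result = await grade(openai("gpt-4"), "Please provide the latest Solana documentation for the 'developing' section.");
        return JSON.parse(result);
    }
};

const config: EvalConfig = {
    model: openai("gpt-4"),
    evals: [get_latest_docsEval]
};

export default config;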

File-Level Changes

Change: Integrate mcp-evals dependency and define evaluation structure.
  • Add mcp-evals as a project dependency.
  • Create src/evals/evals.ts to define evaluation configuration (EvalConfig) and individual test functions (EvalFunction).
  • Utilize the grade function from mcp-evals with an OpenAI model (gpt-4) to run evaluations.
Files: package.json, package-lock.json, src/evals/evals.ts

Change: Implement specific evaluation test cases for tool functions.
  • Define multiple EvalFunction instances targeting the get_latest_docs, search_docs, and get_api_reference tools.
  • Craft specific natural language prompts for each evaluation case.
  • Parse the JSON string returned by the grade function.
Files: src/evals/evals.ts

Change: Update documentation with evaluation execution instructions.
  • Add a new "Running evals" section.
  • Provide the npx mcp-eval command for running the tests.
  • Mention the requirement and usage of environment variables (e.g., OPENAI_API_KEY).
Files: README.md
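
Note: the authoritative command is in the PR's README changes; based on typical mcp-evals usage it is likely along the lines of OPENAI_API_KEY=your-key npx mcp-eval src/evals/evals.ts src/index.ts (the evals file first, then the server entry point), but the exact paths here are an assumption.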


@sourcery-ai bot left a comment

Hey @mclenhard - I've reviewed your changes - here's some feedback:

  • The evals list in src/evals/evals.ts appears to contain duplicate entries for get_latest_docsEval and mixes search_docsEval with search_docs.
  • Consider standardizing the naming convention for evaluation functions in src/evals/evals.ts (e.g., consistently using the Eval suffix).
  • The separate export of the evals array in src/evals/evals.ts might be redundant as it's already part of the exported config.
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";

const get_latest_docsEval: EvalFunction = {

issue (bug_risk): Duplicate definition of get_latest_docsEval may cause unintended overrides.

The two get_latest_docsEval definitions share the same name, and TypeScript rejects redeclaring a const in the same scope. Rename one to avoid the collision and preserve both evaluations.


const config: EvalConfig = {
model: openai("gpt-4"),
evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, get_latest_docsEval, search_docs]

suggestion (bug_risk): get_latest_docsEval appears twice in the evals array.

get_latest_docsEval is listed twice, so the same eval runs twice and, combined with the duplicate definition above, only one of the two intended prompts is actually exercised. Please remove or rename the duplicate so each eval is unique.

Suggested change
-    evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, get_latest_docsEval, search_docs]
+    evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, search_docs]


import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";

issue (complexity): Consider creating a helper function to encapsulate the common logic in the eval definitions.

You can reduce duplication by abstracting the common logic into a helper function. For example, try:

// Shared factory: each eval differs only in its name, description, and prompt.
function createEval(name: string, description: string, query: string): EvalFunction {
    return {
        name,
        description,
        run: async () => {
            // Grade the model's handling of this prompt and parse the JSON verdict it returns.
            const result = await grade(openai("gpt-4"), query);
            return JSON.parse(result);
        }
    };
}

Then update your eval definitions as follows:

const get_latest_docsEval1 = createEval(
  'get_latest_docs Evaluation',
  'Evaluates get_latest_docs tool functionality',
  "Please provide the latest Solana documentation for the 'developing' section."
);

const search_docsEval = createEval(
  'search_docs Evaluation',
  'Evaluates how well the search_docs tool handles queries about Solana documentation',
  "Where can I find documentation on creating tokens in Solana and is there a code snippet for it?"
);

const get_api_referenceEval = createEval(
  'Get API Reference Evaluation',
  'Evaluates the retrieval of Solana SDK API reference details',
  "Please provide detailed information about the 'transaction' item in the Solana SDK."
);

const get_latest_docsEval2 = createEval(
  'get_latest_docs Evaluation',
  'Evaluates the retrieval of the requested Solana docs section',
  "Please retrieve the latest details from the 'economics' section of the Solana documentation."
);

const search_docs = createEval(
  'search_docs',
  'Evaluates the search_docs tool functionality',
  "Where in the Solana documentation can I find instructions on issuing new tokens?"
);

Finally, update your config array accordingly (only include a duplicate if it is intentionally needed):

const config: EvalConfig = {
    model: openai("gpt-4"),
    evals: [get_latest_docsEval1, search_docsEval, get_api_referenceEval, get_latest_docsEval2, search_docs]
};

export default config;
export const evals = [get_latest_docsEval1, search_docsEval, get_api_referenceEval, get_latest_docsEval2, search_docs];

This refactoring consolidates the grade call and JSON-parsing logic, reduces duplicate code, and simplifies further modifications.

@devloai bot left a comment

PR Summary:

This PR adds end-to-end (e2e) testing and evaluation capabilities to the Solana documentation MCP server. It introduces a new evaluation framework using the mcp-evals package that loads an MCP client, runs the server, and tests tool functionality with actual calls. The implementation includes evaluation tests for all three main tools: get_latest_docs, search_docs, and get_api_reference, along with documentation on how to run the evaluations.

Review Summary:

I've identified several critical issues in the implementation that need addressing before merging. These include duplicate function declarations, missing dependencies, and potential runtime errors due to configuration issues. Once these issues are fixed, the e2e testing framework will provide significant value in ensuring the reliability of the Solana documentation tools.

Follow-up suggestions:

@devloai fix the identified issues in the evals.ts file

};

const get_latest_docsEval: EvalFunction = {
name: 'get_latest_docs Evaluation',

There's a duplicate declaration of get_latest_docsEval. This function is defined twice with different implementations (first on line 7, then again on line 35), which will trigger a duplicate-identifier error from TypeScript and leaves it unclear which prompt is meant to run. Remove one of the implementations or rename one of them to have a unique identifier.

};

const search_docs: EvalFunction = {
name: 'search_docs',

The function search_docs lacks the Eval suffix in its name, unlike the other evaluation functions. This naming inconsistency could lead to confusion. Rename it to search_docsEval to maintain a consistent naming convention across all evaluation functions.


const config: EvalConfig = {
model: openai("gpt-4"),
evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, get_latest_docsEval, search_docs]

The config includes duplicate evaluations. The evals array lists get_latest_docsEval twice and contains both search_docsEval and search_docs. Ensure each evaluation is included only once in the configuration to prevent duplicate test runs.

//evals.ts

import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";

Missing dependency in package.json for @ai-sdk/openai. The code imports and uses the openai function, but the package isn't listed as a dependency. Add this package to package.json to ensure the evals can run properly.
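
(Assuming npm is the package manager in use for this project, running npm install @ai-sdk/openai would add it to the dependencies in package.json alongside mcp-evals.)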
