feat - e2e tests + evals #10
Conversation
Reviewer's Guide: This pull request introduces end-to-end testing.
Hey @mclenhard - I've reviewed your changes - here's some feedback:
- The `evals` list in `src/evals/evals.ts` appears to contain duplicate entries for `get_latest_docsEval` and mixes `search_docsEval` with `search_docs`.
- Consider standardizing the naming convention for evaluation functions in `src/evals/evals.ts` (e.g., consistently using the `Eval` suffix).
- The separate export of the `evals` array in `src/evals/evals.ts` might be redundant as it's already part of the exported `config`.
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
```typescript
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";

const get_latest_docsEval: EvalFunction = {
```
issue (bug_risk): Duplicate definition of get_latest_docsEval may cause unintended overrides.
The two get_latest_docsEval definitions share the same name, so the second overwrites the first. Rename one to avoid collisions and preserve both.
```typescript
const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, get_latest_docsEval, search_docs]
```
suggestion (bug_risk): Duplicate get_latest_docsEval appears twice in the evals array.
get_latest_docsEval is listed twice, so the second entry overwrites the first and can cause unexpected behavior. Please remove or rename the duplicate so each eval is unique.
```diff
- evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, get_latest_docsEval, search_docs]
+ evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, search_docs]
```
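Beyond removing this particular duplicate, this class of mistake could be caught automatically before the suite runs. A minimal sketch of such a guard — note that `assertUniqueEvalNames` is a hypothetical helper, not part of the mcp-evals API:

```typescript
// Hypothetical guard (not part of mcp-evals): fail fast when two evals
// share the same name, instead of silently running one of them twice.
function assertUniqueEvalNames(evals: { name: string }[]): void {
  const seen = new Set<string>();
  for (const e of evals) {
    if (seen.has(e.name)) {
      throw new Error(`Duplicate eval name: ${e.name}`);
    }
    seen.add(e.name);
  }
}
```

Calling this on the `evals` array at config time would have surfaced the duplicate `get_latest_docsEval` entry immediately rather than at review time.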
```typescript
import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";
```
issue (complexity): Consider creating a helper function to encapsulate the common logic in the eval definitions.
You can reduce duplication by abstracting the common logic into a helper function. For example, try:
```typescript
function createEval(name: string, description: string, query: string): EvalFunction {
  return {
    name,
    description,
    run: async () => {
      const result = await grade(openai("gpt-4"), query);
      return JSON.parse(result);
    }
  };
}
```
Then update your eval definitions as follows:
```typescript
const get_latest_docsEval1 = createEval(
  'get_latest_docs Evaluation',
  'Evaluates get_latest_docs tool functionality',
  "Please provide the latest Solana documentation for the 'developing' section."
);

const search_docsEval = createEval(
  'search_docs Evaluation',
  'Evaluates how well the search_docs tool handles queries about Solana documentation',
  "Where can I find documentation on creating tokens in Solana and is there a code snippet for it?"
);

const get_api_referenceEval = createEval(
  'Get API Reference Evaluation',
  'Evaluates the retrieval of Solana SDK API reference details',
  "Please provide detailed information about the 'transaction' item in the Solana SDK."
);

const get_latest_docsEval2 = createEval(
  'get_latest_docs Evaluation',
  'Evaluates the retrieval of the requested Solana docs section',
  "Please retrieve the latest details from the 'economics' section of the Solana documentation."
);

const search_docs = createEval(
  'search_docs',
  'Evaluates the search_docs tool functionality',
  "Where in the Solana documentation can I find instructions on issuing new tokens?"
);
```
Finally, update your config array accordingly (only include duplicate if intentionally needed):
```typescript
const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [get_latest_docsEval1, search_docsEval, get_api_referenceEval, get_latest_docsEval2, search_docs]
};

export default config;
export const evals = [get_latest_docsEval1, search_docsEval, get_api_referenceEval, get_latest_docsEval2, search_docs];
```
This refactoring consolidates the HTTP call and JSON parsing logic, reduces duplicate code, and simplifies further modifications.
PR Summary:
This PR adds end-to-end (e2e) testing and evaluation capabilities to the Solana documentation MCP server. It introduces a new evaluation framework using the mcp-evals package that loads an MCP client, runs the server, and tests tool functionality with actual calls. The implementation includes evaluation tests for all three main tools: get_latest_docs, search_docs, and get_api_reference, along with documentation on how to run the evaluations.
Review Summary:
I've identified several critical issues in the implementation that need addressing before merging. These include duplicate function declarations, missing dependencies, and potential runtime errors due to configuration issues. Once these issues are fixed, the e2e testing framework will provide significant value in ensuring the reliability of the Solana documentation tools.
Follow-up suggestions:
@devloai fix the identified issues in the evals.ts file
```typescript
};

const get_latest_docsEval: EvalFunction = {
  name: 'get_latest_docs Evaluation',
```
There's a duplicate declaration of `get_latest_docsEval`. This function is defined twice with different implementations (first on line 7, then again on line 35), which will cause confusion and potential runtime errors. Remove one of the implementations or rename one of them to have a unique identifier.
```typescript
};

const search_docs: EvalFunction = {
  name: 'search_docs',
```
The function `search_docs` lacks the `Eval` suffix in its name, unlike the other evaluation functions. This inconsistency in naming could lead to confusion. Rename it to `search_docsEval` to maintain a consistent naming convention across all evaluation functions.
```typescript
const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [get_latest_docsEval, search_docsEval, get_api_referenceEval, get_latest_docsEval, search_docs]
```
The `config` includes duplicate evaluations. The `evals` array contains two instances of `get_latest_docsEval`, as well as both `search_docsEval` and `search_docs`. Ensure each evaluation is only included once in the configuration to prevent duplicate test runs.
```typescript
//evals.ts

import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";
```
Missing dependency in package.json for @ai-sdk/openai. The code imports and uses the openai function, but the package isn't listed as a dependency. Add this package to package.json to ensure the evals can run properly.
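As a sketch, the relevant `package.json` entries might look like the fragment below — the version ranges are assumptions, so pin whichever versions the project actually resolves (e.g., via `npm install @ai-sdk/openai`):

```json
{
  "dependencies": {
    "@ai-sdk/openai": "^1.0.0",
    "mcp-evals": "^1.0.0"
  }
}
```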
Adds a new e2e test that loads an MCP client, which in turn runs the server and processes the actual tool call. It then grades the response for correctness.
Note: I'm the package author and I translated this from English.
Summary by Sourcery
Add end-to-end (e2e) evaluation tests for documentation retrieval tools using MCP evals framework