feat - e2e tests + evals #22

Open · wants to merge 1 commit into main

README.md (9 additions, 0 deletions)
@@ -40,6 +40,15 @@ npx @smithery/cli install linear-mcp-server --client claude
}
```



## Running evals

The evals package loads an MCP client that runs the index.ts file directly, so there is no need to rebuild between tests. You can pass environment variables by prefixing the npx command. Full documentation can be found [here](https://www.mcpevals.io/docs).

```bash
OPENAI_API_KEY=your-key npx mcp-eval evals.ts index.ts
```
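
Since this server also needs a Linear token at runtime, the same prefix pattern should cover it as well; a minimal sketch, assuming the server reads `LINEAR_API_KEY` (the variable name is an assumption, adjust to your setup):

```bash
# LINEAR_API_KEY is an assumed variable name; replace with whatever your server expects.
OPENAI_API_KEY=your-key LINEAR_API_KEY=your-linear-key npx mcp-eval evals.ts index.ts
```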

## Components

### Tools
evals.ts (59 additions, 0 deletions)
@@ -0,0 +1,59 @@
// evals.ts

import { EvalConfig, EvalFunction, grade } from "mcp-evals";
import { openai } from "@ai-sdk/openai";

const linear_create_issueEval: EvalFunction = {
name: "Linear Create Issue Tool Evaluation",
> **Contributor:** The evaluation functions use inconsistent naming formats for their `name` properties. For example, 'Linear Create Issue Tool Evaluation' (line 8) uses title casing and includes 'Tool Evaluation', whereas 'linear_update_issue Evaluation' (line 17), 'linear_search_issues' (line 26), 'linear_get_user_issues Evaluation' (line 35), and 'linear_add_comment Evaluation' (line 44) vary in capitalization and inclusion of the word 'Evaluation'. For better readability and consistency, consider standardizing these names.

description: "Evaluates the correctness and completeness of the linear_create_issue tool usage",
run: async () => {
const result = await grade(openai("gpt-4"), "Please create a new Linear issue using the linear_create_issue tool. The issue should have the title 'Fix login bug for Safari users', teamId 'team123', description 'Safari users are unable to log in properly', priority 2, and status 'Open'. Return the issue identifier and URL.");
return JSON.parse(result);
> **Contributor:** Consider adding error handling (e.g. try/catch) around `JSON.parse` to gracefully handle unexpected non-JSON responses.

(A sketch of this suggestion appears after the file listing below.)

}
};

const linear_update_issueEval: EvalFunction = {
name: "linear_update_issue Evaluation",
description: "Evaluates the tool's ability to update existing issue details in a Linear project",
run: async () => {
const result = await grade(openai("gpt-4"), "Please update the existing Linear issue with ID 'ISS-786' to have the title 'New Title', description 'Revised description for clarity', priority 3, and status 'In Review'.");
return JSON.parse(result);
}
};

const linear_search_issuesEval: EvalFunction = {
name: "linear_search_issues",
description: "Evaluates the linear_search_issues tool for searching issues based on flexible criteria",
run: async () => {
const result = await grade(openai("gpt-4"), "Find any open issues assigned to user 'usr_123' that mention 'bug' in their title or description, have a priority of 2, and return no more than 5 results while ignoring archived issues.");
return JSON.parse(result);
}
};

const linear_get_user_issuesEval: EvalFunction = {
name: "linear_get_user_issues Evaluation",
description: "Evaluates the correctness of retrieving user issues including optional parameters",
run: async () => {
const result = await grade(openai("gpt-4"), "Retrieve the assigned issues for user 123, including archived issues, limited to 10.");
return JSON.parse(result);
}
};

const linear_add_commentEval: EvalFunction = {
name: "linear_add_comment Evaluation",
description: "Evaluates the correctness of the linear_add_comment functionality",
run: async () => {
const result = await grade(openai("gpt-4"), "Please add a comment to the Linear issue with ID ABC123. The comment should be in markdown format saying: 'Testing comment functionality!'. Use a custom user name 'TestUser' and avatar 'http://test.avatar.com'. Return the created comment's details including its URL.");
return JSON.parse(result);
}
};

const config: EvalConfig = {
model: openai("gpt-4"),
evals: [linear_create_issueEval, linear_update_issueEval, linear_search_issuesEval, linear_get_user_issuesEval, linear_add_commentEval]
};

export default config;

export const evals = [linear_create_issueEval, linear_update_issueEval, linear_search_issuesEval, linear_get_user_issuesEval, linear_add_commentEval];
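
As the contributor comment on `JSON.parse` suggests, the parse calls can be wrapped in a try/catch so a malformed model response fails with context rather than an opaque `SyntaxError`. A minimal sketch, assuming `grade` resolves to a string that is usually, but not always, valid JSON; `safeGrade` is a hypothetical helper, not part of mcp-evals:

```typescript
import { openai } from "@ai-sdk/openai";
import { grade } from "mcp-evals";

// Hypothetical helper (not part of mcp-evals): run a grading prompt and
// parse the result, surfacing the raw output if it is not valid JSON.
async function safeGrade(prompt: string): Promise<unknown> {
  const result = await grade(openai("gpt-4"), prompt);
  try {
    return JSON.parse(result);
  } catch {
    // Fail with context instead of crashing mid-run on a bare SyntaxError.
    throw new Error(`grade() returned non-JSON output: ${result}`);
  }
}
```

Each `run` callback above could then reduce to `run: async () => safeGrade("...")`.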