DO NOT MERGE: api evals #522

Draft: wants to merge 10 commits into main
9 changes: 5 additions & 4 deletions evals/env.ts
@@ -4,10 +4,11 @@
  *
  * The environment is read from the EVAL_ENV environment variable.
  */
-export const env: "BROWSERBASE" | "LOCAL" =
-  process.env.EVAL_ENV?.toLowerCase() === "browserbase"
-    ? "BROWSERBASE"
-    : "LOCAL";
+// export const env: "BROWSERBASE" | "LOCAL" =
+// process.env.EVAL_ENV?.toLowerCase() === "browserbase"
+// ? "BROWSERBASE"
+// : "LOCAL";
+export const env = "BROWSERBASE" as const;

 /**
  * Enable or disable caching based on the EVAL_ENABLE_CACHING environment variable.
4 changes: 2 additions & 2 deletions evals/index.eval.ts
@@ -39,11 +39,11 @@ dotenv.config();
  */
 const MAX_CONCURRENCY = process.env.EVAL_MAX_CONCURRENCY
   ? parseInt(process.env.EVAL_MAX_CONCURRENCY, 10)
-  : 20;
+  : 5;

 const TRIAL_COUNT = process.env.EVAL_TRIAL_COUNT
   ? parseInt(process.env.EVAL_TRIAL_COUNT, 10)
-  : 5;
+  : 1;

 /**
  * generateSummary:
1 change: 1 addition & 0 deletions evals/initStagehand.ts
@@ -80,6 +80,7 @@ export const initStagehand = async ({
       logger.log(logLine);
     },
     ...configOverrides,
+    useAPI: true,
   };

   const stagehand = new Stagehand(config);
2 changes: 1 addition & 1 deletion evals/taskConfig.ts
@@ -49,7 +49,7 @@ if (filterByEvalName && !tasksByName[filterByEvalName]) {
  */
 const DEFAULT_EVAL_MODELS = process.env.EVAL_MODELS
   ? process.env.EVAL_MODELS.split(",")
-  : ["gpt-4o", "claude-3-5-sonnet-latest"];
+  : ["gpt-4o"];

 /**
  * getModelList:
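Review note: the diffs above pin the environment to "BROWSERBASE" and lower the fallback values used when EVAL_MAX_CONCURRENCY and EVAL_TRIAL_COUNT are unset. The sketch below shows, under stated assumptions, how the same settings could stay driven by environment variables. The helper names `resolveEvalEnv` and `readIntEnv` are hypothetical and do not appear in this PR. `readIntEnv` also deliberately differs from the code in the diff: it falls back to the default when the value does not parse to a positive integer, whereas the `parseInt` calls in the diff would pass `NaN` through.

```typescript
// Hypothetical helpers for illustration only; these names are not part of the PR.
type EvalEnv = "BROWSERBASE" | "LOCAL";
type EnvVars = Record<string, string | undefined>;

// Mirrors the ternary that the PR comments out:
// only "browserbase" (in any casing) selects BROWSERBASE, everything else is LOCAL.
function resolveEvalEnv(vars: EnvVars): EvalEnv {
  return vars["EVAL_ENV"]?.toLowerCase() === "browserbase"
    ? "BROWSERBASE"
    : "LOCAL";
}

// Reads a base-10 integer setting. Assumption: unset, empty, non-numeric,
// or non-positive values should all fall back to the supplied default.
function readIntEnv(vars: EnvVars, name: string, fallback: number): number {
  const raw = vars[name];
  if (raw === undefined) return fallback;
  const parsed = parseInt(raw, 10);
  return isNaN(parsed) || parsed <= 0 ? fallback : parsed;
}
```

With helpers like these, a call site could pass `process.env` directly, for example `readIntEnv(process.env, "EVAL_MAX_CONCURRENCY", 5)`, so an API eval run could be selected by setting EVAL_ENV=browserbase rather than by editing evals/env.ts.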