Tools from Orq AI for building robust AI evaluation pipelines, online or offline. This monorepo contains utilities for running evaluations and building with LLMs, with optional integration with the orq.ai platform.
The Problem: Testing LLM applications is hard. You need to:
- Run evaluations across multiple prompts and models
- Track performance over time
- Ensure model updates don't break existing functionality
- Integrate evaluation into CI/CD pipelines
The Solution: OrqKit provides tools to:
- Evaluate at Scale - Run parallel evaluations across datasets with built-in retry logic
- Test Like You Deploy - Use the same evaluation framework locally and in CI/CD
- Measure What Matters - Pre-built evaluators for common LLM metrics (coming soon)
- Track Results - Automatic result tracking when connected to the Orq platform, or raw results you can feed into your own dashboard
Orq AI is a platform for building, deploying, and monitoring AI applications. We believe in providing developers with powerful, open-source tools that integrate seamlessly with our platform while remaining useful as standalone utilities.
This monorepo contains the following open-source packages:
| Package | Description | Docs |
| --- | --- | --- |
| @orq-ai/evaluatorq | Core evaluation framework with Effect-based architecture for running parallel AI evaluations | README |
| @orq-ai/evaluators | Reusable evaluators for AI evaluation frameworks | README |
| @orq-ai/cli | Command-line interface for discovering and running evaluation files | README |
| @orq-ai/vercel-provider | Vercel AI SDK provider for seamless integration with the Orq AI platform | README |
| @orq-ai/n8n-nodes-orq | n8n community nodes for integrating Orq AI deployments and knowledge bases | README |
| @orq-ai/tiny-di | Minimal dependency injection container with TypeScript support | README |
```bash
# Install the core evaluation framework
npm install @orq-ai/evaluatorq

# Install the CLI globally (optional)
npm install -g @orq-ai/cli

# Install the Vercel AI SDK provider
npm install @orq-ai/vercel-provider
```
```typescript
// example-llm.eval.ts
import Anthropic from "@anthropic-ai/sdk";
import { type DataPoint, evaluatorq, job } from "@orq-ai/evaluatorq";

import { containsNameValidator, isItPoliteLLMEval } from "../evals.js";

const claude = new Anthropic();

const greet = job("greet", async (data: DataPoint) => {
  const output = await claude.messages.create({
    stream: false,
    max_tokens: 100,
    model: "claude-3-5-haiku-latest",
    system: `For testing purposes please be really lazy and sarcastic in your response, not polite at all.`,
    messages: [
      {
        role: "user",
        content: `Hello! My name is ${data.inputs.name}`,
      },
    ],
  });

  // LLM response: *sighs dramatically* Oh great, another Bob. Let me guess,
  // you want me to care about something? Fine. Hi, Bob. What do you want?
  return output.content[0].type === "text" ? output.content[0].text : "";
});

await evaluatorq("dataset-evaluation", {
  data: [
    { inputs: { name: "Alice" } },
    { inputs: { name: "Bob" } },
    Promise.resolve({ inputs: { name: "Márk" } }),
  ],
  jobs: [greet],
  evaluators: [containsNameValidator, isItPoliteLLMEval],
  parallelism: 2,
  print: true,
});
```
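The custom evaluators imported above live in a separate evals file the example doesn't show. As a rough illustration only, a deterministic check like `contains-name` could look like the sketch below; the evaluator shape used here (a name paired with an async scoring function over the data point and job output) is an assumption, so check the @orq-ai/evaluatorq README for the actual API:

```typescript
// evals.ts - hypothetical sketch, NOT the verified @orq-ai/evaluatorq API.
// Assumed shape: an evaluator pairs a name with an async scoring function
// that receives the data point and the job's output and returns a score.
import type { DataPoint } from "@orq-ai/evaluatorq";

export const containsNameValidator = {
  name: "contains-name",
  evaluate: async ({ data, output }: { data: DataPoint; output: string }) => ({
    // Score 1 when the model's reply echoes the input name, 0 otherwise.
    score: output.includes(String(data.inputs.name)) ? 1 : 0,
  }),
};
```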
```bash
# Using the CLI
orq evaluate example-llm.eval.ts

# Or run the file directly with a runtime
bun run example-llm.eval.ts
```
```
orq evaluate ./examples/src/lib/cli/example-llm.eval.ts

Running evaluations:

⚡ Running example-llm.eval.ts...
✓ Evaluating results 3/3 (100%) - Running evaluator: is-it-polite

EVALUATION RESULTS

Summary:
┌───────────────────────┬─────────────────┐
│ Metric                │ Value           │
├───────────────────────┼─────────────────┤
│ Total Data Points     │ 3               │
├───────────────────────┼─────────────────┤
│ Failed Data Points    │ 0               │
├───────────────────────┼─────────────────┤
│ Total Jobs            │ 3               │
├───────────────────────┼─────────────────┤
│ Failed Jobs           │ 0               │
├───────────────────────┼─────────────────┤
│ Success Rate          │ 100%            │
└───────────────────────┴─────────────────┘

Detailed Results:
┌───────────────────────────┬────────────────────────┐
│ Evaluators                │ greet                  │
├───────────────────────────┼────────────────────────┤
│ contains-name             │ 100.0%                 │
├───────────────────────────┼────────────────────────┤
│ is-it-polite              │ 0.08                   │
└───────────────────────────┴────────────────────────┘

💡 Tip: Use print: false to get raw JSON results.

✓ Evaluation completed successfully

example-llm.eval.ts completed
```
While our tools work great standalone, they shine when integrated with the Orq AI platform:
- Dataset Management: Store and version your evaluation datasets
- Result Tracking: Track evaluation results over time
- Team Collaboration: Share evaluations and results with your team
- API Integration: Use your Orq API key to access platform features
```typescript
// Using Orq platform datasets
await evaluatorq("platform-eval", {
  data: {
    datasetId: "your-dataset-id", // From the Orq platform
  },
  jobs: [...],
  evaluators: [...],
});
```
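Platform-backed datasets require authenticating against Orq; presumably the framework picks up your API key from the environment (the same `ORQ_API_KEY` variable the provider example below uses), but check the evaluatorq README for the exact configuration.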
```typescript
// ai-integration.ts
import { createOrqAiProvider } from "@orq-ai/vercel-provider";
import { generateText } from "ai";

const orq = createOrqAiProvider({
  apiKey: process.env.ORQ_API_KEY,
});

const { text } = await generateText({
  model: orq("gpt-4"),
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(text);
```
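Because requests are routed through the Orq platform rather than straight to the model vendor, they can be tracked and monitored alongside your other Orq deployments; see the provider README for how model identifiers map to platform deployments.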
This is an Nx-based monorepo using Bun as the package manager.
```bash
# Clone the repository
git clone https://github.com/orq-ai/orqkit.git
cd orqkit

# Install dependencies
bun install

# Build the packages
bunx nx build evaluatorq
bunx nx build cli
bunx nx build vercel-provider

# Run examples
cd examples
bun run src/lib/dataset-example.ts
```
- Evaluatorq Documentation - Core evaluation framework
- CLI Documentation - Command-line interface
- Vercel Provider Documentation - Vercel AI SDK provider
- Examples - Sample evaluation implementations
- Orq AI Platform Docs - Platform documentation
We welcome contributions! Whether it's bug fixes, new features, or documentation improvements, feel free to open a pull request.
We release all packages to npm with Nx under a single version number.

```bash
# Publish the packages using Nx. This runs the release workflow, increments
# the version, builds the libraries, and publishes the packages to npm.
# See the docs for details: https://nx.dev/recipes/nx-release/release-npm-packages
nx release
```
- Create an issue: If you have ideas for improvements or new features, please create an issue to discuss it
- Check the roadmap: Take a look at our public roadmap to see what we're working on and what's planned
Built with ❤️ by Orq AI

Website • Documentation • GitHub