
πŸš€ OrqKit

Tools from Orq AI for building robust AI evaluation pipelines, online or offline. This monorepo contains utilities for running evaluations and building with LLMs, with optional integration with the orq.ai platform.

🎯 Why OrqKit?

The Problem: Testing LLM applications is hard. You need to:

  • Run evaluations across multiple prompts and models
  • Track performance over time
  • Ensure model updates don't break existing functionality
  • Integrate evaluation into CI/CD pipelines

The Solution: OrqKit provides tools to:

  • Evaluate at Scale - Run parallel evaluations across datasets with built-in retry logic
  • Test Like You Deploy - Use the same evaluation framework locally and in CI/CD
  • Measure What Matters - Pre-built evaluators for common LLM metrics (coming soon)
  • Track Results - Automatic result tracking when connected to the Orq platform, or feed the raw results into your own dashboard

🌟 About Orq AI

Orq AI is a platform for building, deploying, and monitoring AI applications. We believe in providing developers with powerful, open-source tools that integrate seamlessly with our platform while remaining useful as standalone utilities.

πŸ“¦ Packages

This monorepo contains the following open-source packages:

  • @orq-ai/evaluatorq - Core evaluation framework with Effect-based architecture for running parallel AI evaluations (README · npm)
  • @orq-ai/evaluators - Reusable evaluators for AI evaluation frameworks (README · npm)
  • @orq-ai/cli - Command-line interface for discovering and running evaluation files (README · npm)
  • @orq-ai/vercel-provider - Vercel AI SDK provider for seamless integration with Orq AI platform (README · npm)
  • @orq-ai/n8n-nodes-orq - n8n community nodes for integrating Orq AI deployments and knowledge bases (README · npm)
  • @orq-ai/tiny-di - Minimal dependency injection container with TypeScript support (README · npm)

πŸš€ Quick Start

Install Packages

# Install the core evaluation framework
npm install @orq-ai/evaluatorq

# Install the CLI globally (optional)
npm install -g @orq-ai/cli

# Install the Vercel AI SDK provider
npm install @orq-ai/vercel-provider

Create Your First Evaluation

// example-llm.eval.ts
import Anthropic from "@anthropic-ai/sdk";
import { type DataPoint, evaluatorq, job } from "@orq-ai/evaluatorq";

// Custom evaluators defined in your own project (a rough sketch follows after this example).
import { containsNameValidator, isItPoliteLLMEval } from "../evals.js";

const claude = new Anthropic();

const greet = job("greet", async (data: DataPoint) => {
  const output = await claude.messages.create({
    stream: false,
    max_tokens: 100,
    model: "claude-3-5-haiku-latest",
    system: `For testing purposes please be really lazy and sarcastic in your response, not polite at all.`,
    messages: [
      {
        role: "user",
        content: `Hello My name is ${data.inputs.name}`,
      },
    ],
  });

  // LLM response: *sighs dramatically* Oh great, another Bob. Let me guess, you want me to care about something? Fine. Hi, Bob. What do you want?

  return output.content[0].type === "text" ? output.content[0].text : "";
});

await evaluatorq("dataset-evaluation", {
  data: [
    { inputs: { name: "Alice" } },
    { inputs: { name: "Bob" } },
    Promise.resolve({ inputs: { name: "MΓ‘rk" } }),
  ],
  jobs: [greet],
  evaluators: [containsNameValidator, isItPoliteLLMEval],
  parallelism: 2,
  print: true,
});
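
The containsNameValidator and isItPoliteLLMEval evaluators imported from ../evals.js are user-defined and not shown above. Below is a rough, hypothetical sketch of what the deterministic one could look like; the exact evaluator shape is defined by @orq-ai/evaluatorq, so treat the name/evaluate structure as an assumption and check that package's README for the real signature.

// evals.ts - hypothetical sketch only; the { name, evaluate } shape is an
// assumption made for illustration, not the confirmed @orq-ai/evaluatorq API.
import type { DataPoint } from "@orq-ai/evaluatorq";

export const containsNameValidator = {
  name: "contains-name",
  // Scores 1 when the job output mentions the name from the data point, else 0.
  evaluate: async (data: DataPoint, output: string) => ({
    score: output.includes(String(data.inputs.name)) ? 1 : 0,
  }),
};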

Run It

# Using the CLI
orq evaluate example-llm.eval.ts

# Or directly with a runtime
bun run example-llm.eval.ts

Output

orq evaluate ./examples/src/lib/cli/example-llm.eval.ts
Running evaluations:

⚑ Running example-llm.eval.ts...
⠏ Evaluating results 3/3 (100%) - Running evaluator: is-it-polite

EVALUATION RESULTS

Summary:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Metric               β”‚ Value           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total Data Points    β”‚ 3               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Failed Data Points   β”‚ 0               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total Jobs           β”‚ 3               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Failed Jobs          β”‚ 0               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Success Rate         β”‚ 100%            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Detailed Results:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Evaluators               β”‚ greet                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ contains-name            β”‚ 100.0%                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ is-it-polite             β”‚ 0.08                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ’‘ Tip: Use print:false to get raw JSON results.

βœ” βœ“ Evaluation completed successfully

βœ… example-llm.eval.ts completed

πŸ”— Integration with Orq Platform

While our tools work great standalone, they shine when integrated with the Orq AI platform:

  • Dataset Management: Store and version your evaluation datasets
  • Result Tracking: Track evaluation results over time
  • Team Collaboration: Share evaluations and results with your team
  • API Integration: Use your Orq API key to access platform features

For example, you can point an evaluation at a dataset stored on the platform:

// Using Orq platform datasets
await evaluatorq("platform-eval", {
  data: {
    datasetId: "your-dataset-id", // From Orq platform
  },
  jobs: [...],
  evaluators: [...],
});

Use Vercel AI SDK Provider

// ai-integration.ts
import { createOrqAiProvider } from "@orq-ai/vercel-provider";
import { generateText } from "ai";

const orq = createOrqAiProvider({
  apiKey: process.env.ORQ_API_KEY,
});

const { text } = await generateText({
  model: orq("gpt-4"),
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(text);
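
The provider plugs into the rest of the Vercel AI SDK as well. As a small sketch, the snippet below streams tokens with the SDK's streamText helper; it reuses the provider setup from above and assumes the gpt-4 model is available in your Orq workspace.

// ai-streaming.ts
import { createOrqAiProvider } from "@orq-ai/vercel-provider";
import { streamText } from "ai";

const orq = createOrqAiProvider({
  apiKey: process.env.ORQ_API_KEY,
});

const result = await streamText({
  model: orq("gpt-4"),
  messages: [{ role: "user", content: "Write a haiku about evaluations." }],
});

// Print chunks as they arrive instead of waiting for the full completion.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}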

πŸ› οΈ Development

This is an Nx-based monorepo using Bun as the package manager.

# Clone the repository
git clone https://github.com/orq-ai/orqkit.git
cd orqkit

# Install dependencies
bun install

# Build all packages
bunx nx build evaluatorq
bunx nx build cli
bunx nx build vercel-provider

# Run examples
cd examples
bun run src/lib/dataset-example.ts

πŸ“š Documentation

🀝 Contributing

We welcome contributions! Whether it's a bug fix, a new feature, or a documentation improvement, feel free to open a pull request.

πŸ“¦ Releases

We release all packages to npm with Nx under a single shared version number.

# Publish the packages using Nx. This runs the release workflow: it increments the version, builds the libraries, and publishes the packages to npm.
# Check the docs for more details: https://nx.dev/recipes/nx-release/release-npm-packages
nx release

Have an idea?

  • Create an issue: If you have ideas for improvements or new features, please create an issue to discuss it
  • Check the roadmap: Take a look at our public roadmap to see what we're working on and what's planned

Built with ❀️ by Orq AI
Website β€’ Documentation β€’ GitHub
