
πŸš€ OrqKit

Tools from Orq AI for building robust AI evaluation pipelines, online or offline. This monorepo contains utilities for running evaluations and building with LLMs, with optional integration with the orq.ai platform.

🎯 Why OrqKit?

The Problem: Testing LLM applications is hard. You need to:

  • Run evaluations across multiple prompts and models
  • Track performance over time
  • Ensure model updates don't break existing functionality
  • Integrate evaluation into CI/CD pipelines

The Solution: OrqKit provides tools to:

  • Evaluate at Scale - Run parallel evaluations across datasets with built-in retry logic
  • Test Like You Deploy - Use the same evaluation framework locally and in CI/CD
  • Measure What Matters - Pre-built evaluators for common LLM metrics (coming soon)
  • Track Results - Automatic result tracking when connected to the Orq platform, or feed the raw results into your own dashboard

🌟 About Orq AI

Orq AI is a platform for building, deploying, and monitoring AI applications. We believe in providing developers with powerful, open-source tools that integrate seamlessly with our platform while remaining useful as standalone utilities.

πŸ“¦ Packages

This monorepo contains the following open-source packages:

  • @orq-ai/evaluatorq - Core evaluation framework with Effect-based architecture for running parallel AI evaluations (README · npm)
  • @orq-ai/evaluators - Reusable evaluators for AI evaluation frameworks (README · npm)
  • @orq-ai/cli - Command-line interface for discovering and running evaluation files (README · npm)
  • @orq-ai/vercel-provider - Vercel AI SDK provider for seamless integration with Orq AI platform (README · npm)
  • @orq-ai/n8n-nodes-orq - n8n community nodes for integrating Orq AI deployments and knowledge bases (README · npm)
  • @orq-ai/tiny-di - Minimal dependency injection container with TypeScript support (README · npm)

πŸš€ Quick Start

Install Packages

# Install the core evaluation framework
npm install @orq-ai/evaluatorq

# Install the CLI globally (optional)
npm install -g @orq-ai/cli

# Install the Vercel AI SDK provider
npm install @orq-ai/vercel-provider

Create Your First Evaluation

// example-llm.eval.ts
import Anthropic from "@anthropic-ai/sdk";
import { type DataPoint, evaluatorq, job } from "@orq-ai/evaluatorq";

// Custom evaluators defined in your own project (a rough sketch follows after this example).
import { containsNameValidator, isItPoliteLLMEval } from "../evals.js";

const claude = new Anthropic();

const greet = job("greet", async (data: DataPoint) => {
  const output = await claude.messages.create({
    stream: false,
    max_tokens: 100,
    model: "claude-3-5-haiku-latest",
    system: `For testing purposes please be really lazy and sarcastic in your response, not polite at all.`,
    messages: [
      {
        role: "user",
        content: `Hello My name is ${data.inputs.name}`,
      },
    ],
  });

  // LLM response: *sighs dramatically* Oh great, another Bob. Let me guess, you want me to care about something? Fine. Hi, Bob. What do you want?

  return output.content[0].type === "text" ? output.content[0].text : "";
});

await evaluatorq("dataset-evaluation", {
  data: [
    { inputs: { name: "Alice" } },
    { inputs: { name: "Bob" } },
    Promise.resolve({ inputs: { name: "MΓ‘rk" } }),
  ],
  jobs: [greet],
  evaluators: [containsNameValidator, isItPoliteLLMEval],
  parallelism: 2,
  print: true,
});
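
The containsNameValidator and isItPoliteLLMEval evaluators imported from ../evals.js are user-defined and not shown above. Below is a rough, hypothetical sketch of what the deterministic one could look like; the exact evaluator shape is defined by @orq-ai/evaluatorq, so treat the name/evaluate structure as an assumption and check that package's README for the real signature.

// evals.ts - hypothetical sketch only; the { name, evaluate } shape is an
// assumption made for illustration, not the confirmed @orq-ai/evaluatorq API.
import type { DataPoint } from "@orq-ai/evaluatorq";

export const containsNameValidator = {
  name: "contains-name",
  // Scores 1 when the job output mentions the name from the data point, else 0.
  evaluate: async (data: DataPoint, output: string) => ({
    score: output.includes(String(data.inputs.name)) ? 1 : 0,
  }),
};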

Run It

# Using the CLI
orq evaluate example-llm.eval.ts

# Or directly with a runtime
bun run example-llm.eval.ts

Output

orq evaluate ./examples/src/lib/cli/example-llm.eval.ts
Running evaluations:

⚑ Running example-llm.eval.ts...
⠏ Evaluating results 3/3 (100%) - Running evaluator: is-it-polite

EVALUATION RESULTS

Summary:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Metric               β”‚ Value           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total Data Points    β”‚ 3               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Failed Data Points   β”‚ 0               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total Jobs           β”‚ 3               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Failed Jobs          β”‚ 0               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Success Rate         β”‚ 100%            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Detailed Results:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Evaluators               β”‚ greet                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ contains-name            β”‚ 100.0%                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ is-it-polite             β”‚ 0.08                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ’‘ Tip: Use print:false to get raw JSON results.

βœ” βœ“ Evaluation completed successfully

βœ… example-llm.eval.ts completed

πŸ”— Integration with Orq Platform

While our tools work great standalone, they shine when integrated with the Orq AI platform:

  • Dataset Management: Store and version your evaluation datasets
  • Result Tracking: Track evaluation results over time
  • Team Collaboration: Share evaluations and results with your team
  • API Integration: Use your Orq API key to access platform features

For example, you can point an evaluation at a dataset stored on the platform:

// Using Orq platform datasets
await evaluatorq("platform-eval", {
  data: {
    datasetId: "your-dataset-id", // From Orq platform
  },
  jobs: [...],
  evaluators: [...],
});

Use Vercel AI SDK Provider

// ai-integration.ts
import { createOrqAiProvider } from "@orq-ai/vercel-provider";
import { generateText } from "ai";

const orq = createOrqAiProvider({
  apiKey: process.env.ORQ_API_KEY,
});

const { text } = await generateText({
  model: orq("gpt-4"),
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(text);
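
The provider plugs into the rest of the Vercel AI SDK as well. As a small sketch, the snippet below streams tokens with the SDK's streamText helper; it reuses the provider setup from above and assumes the gpt-4 model is available in your Orq workspace.

// ai-streaming.ts
import { createOrqAiProvider } from "@orq-ai/vercel-provider";
import { streamText } from "ai";

const orq = createOrqAiProvider({
  apiKey: process.env.ORQ_API_KEY,
});

const result = await streamText({
  model: orq("gpt-4"),
  messages: [{ role: "user", content: "Write a haiku about evaluations." }],
});

// Print chunks as they arrive instead of waiting for the full completion.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}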

πŸ› οΈ Development

This is an Nx-based monorepo using Bun as the package manager.

# Clone the repository
git clone https://github.com/orq-ai/orqkit.git
cd orqkit

# Install dependencies
bun install

# Build all packages
bunx nx build evaluatorq
bunx nx build cli
bunx nx build vercel-provider

# Run examples
cd examples
bun run src/lib/dataset-example.ts

πŸ“š Documentation

🀝 Contributing

We welcome contributions! Whether it's a bug fix, a new feature, or a documentation improvement, feel free to open a pull request.

πŸ“¦ Releases

We release all packages to npm with Nx under a single shared version number.

# Publish the packages using Nx. This runs the release workflow: it increments the version, builds the libraries, and publishes the packages to npm.
# Check the docs for more details: https://nx.dev/recipes/nx-release/release-npm-packages
nx release

Have an idea?

  • Create an issue: If you have ideas for improvements or new features, please create an issue to discuss it
  • Check the roadmap: Take a look at our public roadmap to see what we're working on and what's planned

Built with ❀️ by Orq AI
Website β€’ Documentation β€’ GitHub
