Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 23 additions & 57 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@
**skill-up** is a CLI evaluation framework for Agent Skill developers. Declare your eval environment, dependencies, test cases, and grading strategy in `evals/eval.yaml` and `evals/cases/*.yaml`, then run evaluations locally or in CI to generate structured reports.

> [!WARNING]
> The core business logic of this repository is implemented, but the project is still in an **early evolution** stage: the code is not yet fully stable, and some CLI commands, configuration fields, and public APIs may still change in future releases. Please review the [CHANGELOG](CHANGELOG.md) and verify compatibility before using it in production.
> This project is still in an **early evolution** stage: the code is not yet fully stable, and some CLI commands, configuration fields, and public APIs may still change in future releases. Please review the [CHANGELOG](CHANGELOG.md) and verify compatibility before using it in production.

## Features

Expand All @@ -53,22 +53,26 @@
- **Anthropic Compatible**: Import `evals.json` via `skill-up import`, or auto-detect with `--auto`.
- **CI-Ready**: Designed for local development and continuous integration pipelines.

## Requirements
## Why skill-up

- [Go](https://go.dev/dl/) 1.25 or later — required for building and running the CLI.
The official [Agent Skills evaluation guide](https://agentskills.io/skill-creation/evaluating-skills) describes the right evaluation loop: write realistic cases, run with and without the Skill, grade outputs, aggregate results, and iterate. `skill-up` turns that workflow into a reusable CLI:

- Replaces ad hoc run folders with a declarative `eval.yaml` + `cases/*.yaml` format.
- Automates workspace setup, Skill installation, Agent Engine invocation, judging, and report generation.
- Supports multiple engines (`claude_code`, `codex`, `qodercli`) instead of tying the workflow to one client.
- Keeps compatibility with Anthropic-style `evals.json` while adding richer judges, CI-friendly commands, and structured reports.

## Installation

**From source:**
Install with the script:

```bash
go install github.com/alibaba/skill-up/cmd/skill-up@latest
curl -fsSL https://raw.githubusercontent.com/alibaba/skill-up/main/install.sh | bash
```

**Prebuilt binaries:**
Download from [GitHub Releases](https://github.com/alibaba/skill-up/releases).
The installer downloads the matching binary from [GitHub Releases](https://github.com/alibaba/skill-up/releases).

**Build locally:**
To build locally from a checkout, install [Go](https://go.dev/dl/) 1.25 or later:

```bash
make build
Expand All @@ -88,32 +92,23 @@ schema_version: v1alpha1
environment:
type: none

skills:
- source: local_path
path: .

engine:
name: claude_code

cases:
files:
- evals/cases/hello-world.yaml
defaults:
timeout_seconds: 120
max_turns: 5

report:
formats: [json]
```

### 2. Write a Test Case
When `evals/eval.yaml` lives under a directory that contains `SKILL.md`, skill-up installs the current Skill automatically. The omitted fields use defaults: JSON report output, `timeout_seconds: 300`, `max_turns: 10`, and `parallelism: 1`.

For the full `eval.yaml` schema, see [Writing Evals](docs/guide/writing-evals.md).

### 2. Write an Eval Case

Create `evals/cases/hello-world.yaml`:

```yaml
id: hello-world
title: Skill should respond to basic requests

input:
prompt: |
Please generate a Hello World program
Expand All @@ -122,24 +117,22 @@ expect:
must_contain:
- "Hello"
- "World"

judge:
type: rule_based
success:
- output_contains:
all: ["Hello", "World"]
```

The case `id` defaults to the filename (`hello-world`). Add a `judge` block only when you need script-based or agent-based grading.

### 3. Validate Config

```bash
skill-up validate ./evals/eval.yaml
skill-up validate
```

This step is optional, but useful before the first run: it checks `eval.yaml` and all referenced case files without starting an Agent Engine.

### 4. Run Evaluation

```bash
skill-up run ./evals/eval.yaml
skill-up run
```

Results are written to `<skill-name>-workspace/iteration-1/`.
Expand Down Expand Up @@ -229,33 +222,6 @@ skill-up import ./evals/evals.json --output ./evals
| `skill-up debug judge <input.json>` | Debug judge module with a JSON input |
| `skill-up debug report <input.json>` | Debug report module with a JSON input |

## Project Structure

```text
skill-up/
├── cmd/skill-up/ # CLI entrypoint
├── internal/ # Private implementation
│ ├── cli/ # Cobra commands
│ ├── config/ # YAML config loader & validator
│ ├── credential/ # API key & credential resolution
│ ├── runtime/ # Workspace runtime (none / opensandbox)
│ ├── agent/ # Agent Engine adapters
│ ├── judge/ # Evaluation judges
│ ├── report/ # Report generators (JSON / JUnit / HTML)
│ └── runner/ # End-to-end orchestration
├── pkg/transcript/ # Public transcript parsing API
├── docs/ # VitePress documentation site
│ ├── .vitepress/ # VitePress config
│ ├── guide/ # English user guide
│ ├── zh/ # Chinese user guide
│ └── public/ # Static assets (logo, etc.)
├── e2e/ # End-to-end tests
├── examples/ # Example fixtures and scripts
├── Makefile # Build & quality targets
├── go.mod / go.sum # Go module dependencies
└── README.md # This file
```

## License

Apache License 2.0 — see [LICENSE](LICENSE).
80 changes: 23 additions & 57 deletions README.zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@
**skill-up** 是面向 Agent Skill 开发者的 CLI 评测框架。在 Skill 包内通过 `evals/eval.yaml` 与 `evals/cases/*.yaml` 声明评测环境、依赖、用例与评估方式,在本地或 CI 中运行评测并生成结构化报告。

> [!WARNING]
> 本仓库核心业务逻辑已经实现,但整体仍处于 **早期演进阶段**:代码尚未完全稳定,部分 CLI 命令、配置字段以及公共 API 在后续版本中仍有可能调整。请在生产环境使用前关注 [CHANGELOG](CHANGELOG.md) 并做好兼容性验证。
> 本项目仍处于 **早期演进阶段**:代码尚未完全稳定,部分 CLI 命令、配置字段以及公共 API 在后续版本中仍有可能调整。请在生产环境使用前关注 [CHANGELOG](CHANGELOG.md) 并做好兼容性验证。

## 特性

Expand All @@ -53,22 +53,26 @@
- **Anthropic 兼容**:通过 `skill-up import` 导入 `evals.json`,或使用 `--auto` 自动识别。
- **CI 就绪**:专为本地开发和持续集成流水线设计。

## 环境要求
## 为什么需要 skill-up

- [Go](https://go.dev/dl/) 1.25 或更高版本 — 构建和运行 CLI 所需。
官方的 [Agent Skills 评测指南](https://agentskills.io/skill-creation/evaluating-skills) 说明了正确的评测循环:编写真实用例,分别运行 with/without Skill,评分输出,汇总结果,然后持续迭代。`skill-up` 的价值是把这套流程产品化成一个可复用的 CLI:

- 用声明式的 `eval.yaml` + `cases/*.yaml` 取代临时拼出来的运行目录。
- 自动完成 workspace 准备、Skill 安装、Agent Engine 调用、评分和报告生成。
- 支持多个引擎(`claude_code`、`codex`、`qodercli`),不绑定单一客户端。
- 兼容 Anthropic 风格的 `evals.json`,同时提供更丰富的 judge、适合 CI 的命令和结构化报告。

## 安装

**源码安装:**
使用安装脚本:

```bash
go install github.com/alibaba/skill-up/cmd/skill-up@latest
curl -fsSL https://raw.githubusercontent.com/alibaba/skill-up/main/install.sh | bash
```

**预编译二进制:**
从 [GitHub Releases](https://github.com/alibaba/skill-up/releases) 下载。
安装脚本会从 [GitHub Releases](https://github.com/alibaba/skill-up/releases) 下载当前平台对应的二进制文件。

**本地构建:**
如需从仓库 checkout 后本地构建,需要安装 [Go](https://go.dev/dl/) 1.25 或更高版本:

```bash
make build
Expand All @@ -88,32 +92,23 @@ schema_version: v1alpha1
environment:
type: none

skills:
- source: local_path
path: .

engine:
name: claude_code

cases:
files:
- evals/cases/hello-world.yaml
defaults:
timeout_seconds: 120
max_turns: 5

report:
formats: [json]
```

### 第二步:编写评测用例
当 `evals/eval.yaml` 位于包含 `SKILL.md` 的目录下时,skill-up 会自动安装当前 Skill。未写出的字段会使用默认值:JSON 报告、`timeout_seconds: 300`、`max_turns: 10`、`parallelism: 1`。

完整的 `eval.yaml` 配置说明见 [编写评测配置与用例](docs/zh/guide/writing-evals.md)。

### 第二步:编写 Eval Case

创建 `evals/cases/hello-world.yaml`:

```yaml
id: hello-world
title: Skill 应该正确响应基本请求

input:
prompt: |
请帮我生成一个 Hello World 程序
Expand All @@ -122,24 +117,22 @@ expect:
must_contain:
- "Hello"
- "World"

judge:
type: rule_based
success:
- output_contains:
all: ["Hello", "World"]
```

用例 `id` 默认取文件名(这里是 `hello-world`)。只有在需要脚本评测或 Agent 评测时,才需要额外添加 `judge` 配置。

### 第三步:校验配置

```bash
skill-up validate ./evals/eval.yaml
skill-up validate
```

这一步是可选的,但建议首次运行前执行:它只检查 `eval.yaml` 和引用的用例文件,不会启动 Agent Engine。

### 第四步:运行评测

```bash
skill-up run ./evals/eval.yaml
skill-up run
```

评测结果将写入 `<skill-name>-workspace/iteration-1/` 目录。
Expand All @@ -162,33 +155,6 @@ skill-up import ./evals/evals.json --output ./evals
| `skill-up debug judge <input.json>` | 使用 JSON 输入调试 judge 模块 |
| `skill-up debug report <input.json>` | 使用 JSON 输入调试 report 模块 |

## 项目结构

```text
skill-up/
├── cmd/skill-up/ # CLI 入口
├── internal/ # 私有实现
│ ├── cli/ # Cobra 命令
│ ├── config/ # YAML 配置加载与校验
│ ├── credential/ # API Key 与凭证解析
│ ├── runtime/ # 工作区运行时(none / opensandbox)
│ ├── agent/ # Agent 引擎适配层
│ ├── judge/ # 评估评分器
│ ├── report/ # 报告生成器(JSON / JUnit / HTML)
│ └── runner/ # 端到端编排
├── pkg/transcript/ # 公共 transcript 解析 API
├── docs/ # VitePress 文档站点
│ ├── .vitepress/ # VitePress 配置
│ ├── guide/ # 英文用户指南
│ ├── zh/ # 中文用户指南
│ └── public/ # 静态资源(logo 等)
├── e2e/ # 端到端测试
├── examples/ # 示例 fixture 与脚本
├── Makefile # 构建与质量目标
├── go.mod / go.sum # Go 模块依赖
└── README.md # 英文说明文档
```

## 许可证

Apache License 2.0 — 详见 [LICENSE](LICENSE)。
Loading
Loading