alibaba · jwx0925 · May 15, 2026
@@ -42,7 +42,7 @@
 **skill-up** is a CLI evaluation framework for Agent Skill developers. Declare your eval environment, dependencies, test cases, and grading strategy in `evals/eval.yaml` and `evals/cases/*.yaml`, then run evaluations locally or in CI to generate structured reports.
 
 > [!WARNING]
-> The core business logic of this repository is implemented, but the project is still in an **early evolution** stage: the code is not yet fully stable, and some CLI commands, configuration fields, and public APIs may still change in future releases. Please review the [CHANGELOG](CHANGELOG.md) and verify compatibility before using it in production.
+> This project is still in an **early evolution** stage: the code is not yet fully stable, and some CLI commands, configuration fields, and public APIs may still change in future releases. Please review the [CHANGELOG](CHANGELOG.md) and verify compatibility before using it in production.
 
 ## Features
 
@@ -53,22 +53,26 @@
 - **Anthropic Compatible**: Import `evals.json` via `skill-up import`, or auto-detect with `--auto`.
 - **CI-Ready**: Designed for local development and continuous integration pipelines.
 
-## Requirements
+## Why skill-up
 
-- [Go](https://go.dev/dl/) 1.25 or later — required for building and running the CLI.
+The official [Agent Skills evaluation guide](https://agentskills.io/skill-creation/evaluating-skills) describes the right evaluation loop: write realistic cases, run with and without the Skill, grade outputs, aggregate results, and iterate. `skill-up` turns that workflow into a reusable CLI:
+
+- Replaces ad hoc run folders with a declarative `eval.yaml` + `cases/*.yaml` format.
+- Automates workspace setup, Skill installation, Agent Engine invocation, judging, and report generation.
+- Supports multiple engines (`claude_code`, `codex`, `qodercli`) instead of tying the workflow to one client.
+- Keeps compatibility with Anthropic-style `evals.json` while adding richer judges, CI-friendly commands, and structured reports.
 
 ## Installation
 
-**From source:**
+Install with the script:
 
 ```bash
-go install github.com/alibaba/skill-up/cmd/skill-up@latest
+curl -fsSL https://raw.githubusercontent.com/alibaba/skill-up/main/install.sh | bash
 ```
 
-**Prebuilt binaries:**
-Download from [GitHub Releases](https://github.com/alibaba/skill-up/releases).
+The installer downloads the matching binary from [GitHub Releases](https://github.com/alibaba/skill-up/releases).
 
-**Build locally:**
+To build locally from a checkout, install [Go](https://go.dev/dl/) 1.25 or later, then run:
 
 ```bash
 make build
@@ -88,32 +92,23 @@ schema_version: v1alpha1
 environment:
   type: none
 
-skills:
-  - source: local_path
-    path: .
-
 engine:
   name: claude_code
 
 cases:
   files:
     - evals/cases/hello-world.yaml
-  defaults:
-    timeout_seconds: 120
-    max_turns: 5
-
-report:
-  formats: [json]
 ```
 
-### 2. Write a Test Case
+When `evals/eval.yaml` lives under a directory that contains `SKILL.md`, skill-up installs the current Skill automatically. Fields not listed above fall back to their defaults: JSON report output, `timeout_seconds: 300`, `max_turns: 10`, and `parallelism: 1`.
+
+For the full `eval.yaml` schema, see [Writing Evals](docs/guide/writing-evals.md).
+
+### 2. Write an Eval Case
 
 Create `evals/cases/hello-world.yaml`:
 
 ```yaml
-id: hello-world
-title: Skill should respond to basic requests
-
 input:
   prompt: |
     Please generate a Hello World program
@@ -122,24 +117,22 @@ expect:
   must_contain:
     - "Hello"
     - "World"
-
-judge:
-  type: rule_based
-  success:
-    - output_contains:
-        all: ["Hello", "World"]
 ```
 
+The case `id` defaults to the filename (`hello-world`). Add a `judge` block only when you need script-based or agent-based grading.
+
 ### 3. Validate Config
 
 ```bash
-skill-up validate ./evals/eval.yaml
+skill-up validate
 ```
 
+This step is optional, but useful before the first run: it checks `eval.yaml` and all referenced case files without starting an Agent Engine.
+
 ### 4. Run Evaluation
 
 ```bash
-skill-up run ./evals/eval.yaml
+skill-up run
 ```
 
 Results are written to `<skill-name>-workspace/iteration-1/`.
@@ -229,33 +222,6 @@ skill-up import ./evals/evals.json --output ./evals
 | `skill-up debug judge <input.json>` | Debug judge module with a JSON input |
 | `skill-up debug report <input.json>` | Debug report module with a JSON input |
 
-## Project Structure
-
-```text
-skill-up/
-├── cmd/skill-up/          # CLI entrypoint
-├── internal/              # Private implementation
-│   ├── cli/               # Cobra commands
-│   ├── config/            # YAML config loader & validator
-│   ├── credential/        # API key & credential resolution
-│   ├── runtime/           # Workspace runtime (none / opensandbox)
-│   ├── agent/             # Agent Engine adapters
-│   ├── judge/             # Evaluation judges
-│   ├── report/            # Report generators (JSON / JUnit / HTML)
-│   └── runner/            # End-to-end orchestration
-├── pkg/transcript/        # Public transcript parsing API
-├── docs/                  # VitePress documentation site
-│   ├── .vitepress/        # VitePress config
-│   ├── guide/             # English user guide
-│   ├── zh/                # Chinese user guide
-│   └── public/            # Static assets (logo, etc.)
-├── e2e/                   # End-to-end tests
-├── examples/              # Example fixtures and scripts
-├── Makefile               # Build & quality targets
-├── go.mod / go.sum        # Go module dependencies
-└── README.md              # This file
-```
-
 ## License
 
 Apache License 2.0 — see [LICENSE](LICENSE).
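
The two case-level conventions this change relies on (the case `id` defaulting to the filename, and `expect.must_contain` standing in for an explicit `judge` block) can be illustrated with a small sketch. This is not skill-up's implementation: the function names `default_case_id` and `passes_must_contain` are hypothetical, and the case-sensitive substring match is an assumption, since the README does not state the matching rules.

```python
from pathlib import Path


def default_case_id(case_path: str) -> str:
    """Derive a case id from the file name (hypothetical helper).

    Mirrors the README statement that the id defaults to the filename,
    here read as the name without its extension.
    """
    return Path(case_path).stem


def passes_must_contain(output: str, must_contain: list[str]) -> bool:
    """Return True only if every required string appears in the output.

    Assumption: plain, case-sensitive substring matching.
    """
    return all(needle in output for needle in must_contain)


case_id = default_case_id("evals/cases/hello-world.yaml")
ok = passes_must_contain('print("Hello, World!")', ["Hello", "World"])
missing = passes_must_contain("print('Hello')", ["Hello", "World"])
```

Under these assumptions, `case_id` is `hello-world`, the first output passes, and the second fails because `World` is absent.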
@@ -42,7 +42,7 @@
 **skill-up** 是面向 Agent Skill 开发者的 CLI 评测框架。在 Skill 包内通过 `evals/eval.yaml` 与 `evals/cases/*.yaml` 声明评测环境、依赖、用例与评估方式，在本地或 CI 中运行评测并生成结构化报告。
 
 > [!WARNING]
-> 本仓库核心业务逻辑已经实现，但整体仍处于 **早期演进阶段**：代码尚未完全稳定，部分 CLI 命令、配置字段以及公共 API 在后续版本中仍有可能调整。请在生产环境使用前关注 [CHANGELOG](CHANGELOG.md) 并做好兼容性验证。
+> 本项目仍处于 **早期演进阶段**：代码尚未完全稳定，部分 CLI 命令、配置字段以及公共 API 在后续版本中仍有可能调整。请在生产环境使用前关注 [CHANGELOG](CHANGELOG.md) 并做好兼容性验证。
 
 ## 特性
 
@@ -53,22 +53,26 @@
 - **Anthropic 兼容**：通过 `skill-up import` 导入 `evals.json`，或使用 `--auto` 自动识别。
 - **CI 就绪**：专为本地开发和持续集成流水线设计。
 
-## 环境要求
+## 为什么需要 skill-up
 
-- [Go](https://go.dev/dl/) 1.25 或更高版本 — 构建和运行 CLI 所需。
+官方的 [Agent Skills 评测指南](https://agentskills.io/skill-creation/evaluating-skills) 说明了正确的评测循环：编写真实用例，分别运行 with/without Skill，评分输出，汇总结果，然后持续迭代。`skill-up` 的价值是把这套流程产品化成一个可复用的 CLI：
+
+- 用声明式的 `eval.yaml` + `cases/*.yaml` 取代临时拼出来的运行目录。
+- 自动完成 workspace 准备、Skill 安装、Agent Engine 调用、评分和报告生成。
+- 支持多个引擎（`claude_code`、`codex`、`qodercli`），不绑定单一客户端。
+- 兼容 Anthropic 风格的 `evals.json`，同时提供更丰富的 judge、适合 CI 的命令和结构化报告。
 
 ## 安装
 
-**源码安装：**
+使用安装脚本：
 
 ```bash
-go install github.com/alibaba/skill-up/cmd/skill-up@latest
+curl -fsSL https://raw.githubusercontent.com/alibaba/skill-up/main/install.sh | bash
 ```
 
-**预编译二进制：**
-从 [GitHub Releases](https://github.com/alibaba/skill-up/releases) 下载。
+安装脚本会从 [GitHub Releases](https://github.com/alibaba/skill-up/releases) 下载当前平台对应的二进制文件。
 
-**本地构建：**
+如需从仓库 checkout 后本地构建，需要安装 [Go](https://go.dev/dl/) 1.25 或更高版本：
 
 ```bash
 make build
@@ -88,32 +92,23 @@ schema_version: v1alpha1
 environment:
   type: none
 
-skills:
-  - source: local_path
-    path: .
-
 engine:
   name: claude_code
 
 cases:
   files:
     - evals/cases/hello-world.yaml
-  defaults:
-    timeout_seconds: 120
-    max_turns: 5
-
-report:
-  formats: [json]
 ```
 
-### 第二步：编写评测用例
+当 `evals/eval.yaml` 位于包含 `SKILL.md` 的目录下时，skill-up 会自动安装当前 Skill。未写出的字段会使用默认值：JSON 报告、`timeout_seconds: 300`、`max_turns: 10`、`parallelism: 1`。
+
+完整的 `eval.yaml` 配置说明见 [编写评测配置与用例](docs/zh/guide/writing-evals.md)。
+
+### 第二步：编写 Eval Case
 
 创建 `evals/cases/hello-world.yaml`：
 
 ```yaml
-id: hello-world
-title: Skill 应该正确响应基本请求
-
 input:
   prompt: |
     请帮我生成一个 Hello World 程序
@@ -122,24 +117,22 @@ expect:
   must_contain:
     - "Hello"
     - "World"
-
-judge:
-  type: rule_based
-  success:
-    - output_contains:
-        all: ["Hello", "World"]
 ```
 
+用例 `id` 默认取文件名（这里是 `hello-world`）。只有在需要脚本评测或 Agent 评测时，才需要额外添加 `judge` 配置。
+
 ### 第三步：校验配置
 
 ```bash
-skill-up validate ./evals/eval.yaml
+skill-up validate
 ```
 
+这一步是可选的，但建议首次运行前执行：它只检查 `eval.yaml` 和引用的用例文件，不会启动 Agent Engine。
+
 ### 第四步：运行评测
 
 ```bash
-skill-up run ./evals/eval.yaml
+skill-up run
 ```
 
 评测结果将写入 `<skill-name>-workspace/iteration-1/` 目录。
@@ -162,33 +155,6 @@ skill-up import ./evals/evals.json --output ./evals
 | `skill-up debug judge <input.json>` | 使用 JSON 输入调试 judge 模块 |
 | `skill-up debug report <input.json>` | 使用 JSON 输入调试 report 模块 |
 
-## 项目结构
-
-```text
-skill-up/
-├── cmd/skill-up/          # CLI 入口
-├── internal/              # 私有实现
-│   ├── cli/               # Cobra 命令
-│   ├── config/            # YAML 配置加载与校验
-│   ├── credential/        # API Key 与凭证解析
-│   ├── runtime/           # 工作区运行时（none / opensandbox）
-│   ├── agent/             # Agent 引擎适配层
-│   ├── judge/             # 评估评分器
-│   ├── report/            # 报告生成器（JSON / JUnit / HTML）
-│   └── runner/            # 端到端编排
-├── pkg/transcript/        # 公共 transcript 解析 API
-├── docs/                  # VitePress 文档站点
-│   ├── .vitepress/        # VitePress 配置
-│   ├── guide/             # 英文用户指南
-│   ├── zh/                # 中文用户指南
-│   └── public/            # 静态资源（logo 等）
-├── e2e/                   # 端到端测试
-├── examples/              # 示例 fixture 与脚本
-├── Makefile               # 构建与质量目标
-├── go.mod / go.sum        # Go 模块依赖
-└── README.md              # 英文说明文档
-```
-
 ## 许可证
 
 Apache License 2.0 — 详见 [LICENSE](LICENSE)。
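
The shortened `eval.yaml` in both READMEs depends on defaults replacing the removed fields. The sketch below shows that fallback behaviour using the default values quoted in the new text. The flat key layout and the `with_defaults` helper are hypothetical, chosen for illustration only; the real schema nests these fields and is described in the Writing Evals guide.

```python
# Default values as quoted in the README text; the flat key names are
# an illustrative assumption, not the actual eval.yaml schema.
DEFAULTS = {
    "report_formats": ["json"],
    "timeout_seconds": 300,
    "max_turns": 10,
    "parallelism": 1,
}


def with_defaults(explicit: dict) -> dict:
    """Overlay explicitly configured values on top of the defaults."""
    merged = dict(DEFAULTS)   # copy so DEFAULTS itself is not modified
    merged.update(explicit)   # explicit settings take precedence
    return merged


minimal = with_defaults({})
tuned = with_defaults({"timeout_seconds": 120, "max_turns": 5})
```

Here `minimal` matches the documented defaults, while `tuned` reproduces the values the old quick-start example set explicitly (`120` and `5`) and still inherits `parallelism: 1`.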