Commit 9a24382

Add README.md guide with complete examples for doc test CI test scripts.
Include warnings about escaped code blocks to prevent confusion for users copy-pasting the sample code blocks.

Signed-off-by: Ganesh Kudleppanavar <[email protected]>
1 parent 261a7b1 commit 9a24382

1 file changed: 315 additions, 0 deletions
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# Adding New End-to-End Tests for Documentation Examples

## IMPORTANT: Code Examples in This File

**The bash code examples in this documentation use backslashes (`\`) before the triple backticks** to prevent them from being parsed as actual test commands by the test framework.

**When copying examples from this file, you MUST remove the backslashes (`\`) before using them.**

For example, where this file shows `` \```bash ``, you should write `` ```bash `` (without the backslash).

---
This guide explains how to add new end-to-end tests for server examples in the AIPerf documentation.

## Overview

The end-to-end test framework automatically discovers and tests server examples from markdown documentation files. It:

1. Parses markdown files for specially tagged bash commands
2. Builds an AIPerf Docker container
3. For each discovered server:
   - Runs the server setup command
   - Waits for the server to become healthy
   - Executes AIPerf benchmark commands
   - Validates results and cleans up
## How Tests Are Discovered

The test parser (`parser.py`) scans all markdown files (`*.md`) in the repository and looks for HTML comment tags with specific patterns:

- **Setup commands**: `<!-- setup-{server-name}-endpoint-server -->`
- **Health checks**: `<!-- health-check-{server-name}-endpoint-server -->`
- **AIPerf commands**: `<!-- aiperf-run-{server-name}-endpoint-server -->`

Each tag must be followed by a bash code block (` ```bash ... ``` `) containing the actual command.
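To make this concrete, the sketch below shows one way such tag-plus-code-block discovery could be implemented with a regular expression. It is illustrative only: the pattern, function name, and return shape are assumptions for this example, not the actual contents of `parser.py`.

```python
import re
from pathlib import Path

# Matches an opening tag such as <!-- setup-myserver-endpoint-server -->
# followed by a ```bash ... ``` block. The grouping here is an assumption
# for illustration, not necessarily what parser.py uses.
TAG_BLOCK_RE = re.compile(
    r"<!--\s*(setup|health-check|aiperf-run)-(?P<name>[\w-]+)-endpoint-server\s*-->"
    r"\s*`{3}bash\n(?P<command>.*?)`{3}",
    re.DOTALL,
)

def discover_commands(repo_root: str) -> list[dict]:
    """Scan all *.md files and return (kind, server, command) records."""
    records = []
    for md_file in Path(repo_root).rglob("*.md"):
        text = md_file.read_text(encoding="utf-8")
        for match in TAG_BLOCK_RE.finditer(text):
            records.append({
                "kind": match.group(1),           # setup / health-check / aiperf-run
                "server": match.group("name"),    # e.g. "myserver"
                "command": match.group("command").strip(),
                "source": str(md_file),
            })
    return records
```

Grouping the resulting records by `server` yields, for each server, its setup command, health check, and list of aiperf commands, which is the shape the sections below assume.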
## Adding a New Server Test

To add tests for a new server, you need to add three types of tagged commands to your documentation:

### 1. Server Setup Command

Tag the bash command that starts your server:

```markdown
<!-- setup-myserver-endpoint-server -->
\```bash
# Start your server
docker run --gpus all -p 8000:8000 myserver/image:latest \
    --model my-model \
    --host 0.0.0.0 --port 8000
\```
<!-- /setup-myserver-endpoint-server -->
```

**Important notes:**
- The server name (`myserver` in this example) must be consistent across all three tag types
- The setup command runs in the background
- The command should start a long-running server process
- Use port 8000 or ensure your health check targets the correct port
### 2. Health Check Command

Tag a bash command that waits for your server to be ready:

```markdown
<!-- health-check-myserver-endpoint-server -->
\```bash
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/health -H "Content-Type: application/json")" != "200" ]; do sleep 2; done' || { echo "Server not ready after 15min"; exit 1; }
\```
<!-- /health-check-myserver-endpoint-server -->
```

**Important notes:**
- The health check should poll the server until it responds successfully
- Use a reasonable timeout (e.g., 900 seconds = 15 minutes)
- The command must exit with code 0 when the server is healthy
- The command must exit with non-zero code if the server fails to start
### 3. AIPerf Run Commands

Tag one or more AIPerf benchmark commands:

```markdown
<!-- aiperf-run-myserver-endpoint-server -->
\```bash
aiperf profile \
    --model my-model \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 10 \
    --max-tokens 100
\```
<!-- /aiperf-run-myserver-endpoint-server -->
```

You can have multiple `aiperf-run` commands for the same server. Each will be executed sequentially:

```markdown
<!-- aiperf-run-myserver-endpoint-server -->
\```bash
# First test: streaming mode
aiperf profile \
    --model my-model \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 10
\```
<!-- /aiperf-run-myserver-endpoint-server -->

<!-- aiperf-run-myserver-endpoint-server -->
\```bash
# Second test: non-streaming mode
aiperf profile \
    --model my-model \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --num-prompts 10
\```
<!-- /aiperf-run-myserver-endpoint-server -->
```

**Important notes:**
- Do NOT include the `--ui-type` flag - the test framework adds `--ui-type simple` automatically
- Each command is executed inside the AIPerf Docker container
- Commands should complete in a reasonable time (default timeout: 300 seconds)
- Use small values for `--num-prompts` and `--max-tokens` to keep tests fast
## Complete Example

Here's a complete example for a new server called "fastapi":

```markdown
### Running FastAPI Server

Start the FastAPI server:

<!-- setup-fastapi-endpoint-server -->
\```bash
docker run --gpus all -p 8000:8000 mycompany/fastapi-llm:latest \
    --model-name meta-llama/Llama-3.2-1B \
    --host 0.0.0.0 \
    --port 8000
\```
<!-- /setup-fastapi-endpoint-server -->

Wait for the server to be ready:

<!-- health-check-fastapi-endpoint-server -->
\```bash
timeout 600 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/models)" != "200" ]; do sleep 2; done' || { echo "FastAPI server not ready after 10min"; exit 1; }
\```
<!-- /health-check-fastapi-endpoint-server -->

Profile the model:

<!-- aiperf-run-fastapi-endpoint-server -->
\```bash
aiperf profile \
    --model meta-llama/Llama-3.2-1B \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 20 \
    --max-tokens 50
\```
<!-- /aiperf-run-fastapi-endpoint-server -->
```
## Running the Tests

### Run all discovered tests:

```bash
cd tests/ci/test_docs_end_to_end
python main.py
```

### Dry run to see what would be tested:

```bash
python main.py --dry-run
```

### Test specific servers:

Currently, the framework tests the first discovered server by default. Use `--all-servers` to test all:

```bash
python main.py --all-servers
```
## Validation Rules

The test framework validates that each server has:
- Exactly ONE setup command (duplicates cause test failure)
- Exactly ONE health check command (duplicates cause test failure)
- At least ONE aiperf command

If any of these requirements are not met, the tests will fail with a clear error message.
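As an illustration of how these rules might be enforced, here is a short sketch that groups discovered commands by server and checks the counts. The data shapes and function names are assumptions carried over from the earlier discovery sketch, not the framework's actual internals.

```python
from collections import defaultdict

def validate_servers(records: list[dict]) -> dict[str, dict]:
    """Group discovered commands by server and enforce the rules above.

    `records` is assumed to look like the output of the discovery sketch:
    dicts with "kind", "server", and "command" keys.
    """
    servers = defaultdict(lambda: {"setup": [], "health-check": [], "aiperf-run": []})
    for rec in records:
        servers[rec["server"]][rec["kind"]].append(rec["command"])

    for name, cmds in servers.items():
        if len(cmds["setup"]) != 1:
            raise ValueError(f"{name}: expected exactly ONE setup command, found {len(cmds['setup'])}")
        if len(cmds["health-check"]) != 1:
            raise ValueError(f"{name}: expected exactly ONE health check command, found {len(cmds['health-check'])}")
        if not cmds["aiperf-run"]:
            raise ValueError(f"{name}: expected at least ONE aiperf command")
    return dict(servers)
```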
## Test Execution Flow

For each server, the test runner:

1. **Build Phase**: Builds the AIPerf Docker container (once for all tests)
2. **Setup Phase**: Starts the server in the background
3. **Health Check Phase**: Waits for the server to be ready (runs in parallel with setup)
4. **Test Phase**: Executes all AIPerf commands sequentially
5. **Cleanup Phase**: Gracefully shuts down the server and cleans up Docker resources
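To give a rough end-to-end picture of these phases, the sketch below chains them together with plain `subprocess` calls. It is a simplified stand-in, not the actual `test_runner.py`: the docker invocations, flags (e.g. `--network host`), and image tag are assumptions, and the real runner monitors setup output and overlaps the health check with setup rather than running strictly in sequence.

```python
import subprocess

def run_all_tests(servers: dict[str, dict], image: str = "aiperf:latest") -> None:
    """Simplified sketch of the per-server flow described above."""
    # 1. Build phase (once for all tests); the image tag is illustrative
    subprocess.run(["docker", "build", "-t", image, "."], check=True, timeout=600)

    for name, cmds in servers.items():
        print(f"Testing server: {name}")
        # 2. Setup phase: start the server command in the background
        setup_proc = subprocess.Popen(cmds["setup"][0], shell=True)
        try:
            # 3. Health check phase: the tagged command blocks until healthy or times out
            subprocess.run(cmds["health-check"][0], shell=True, check=True)
            # 4. Test phase: run each aiperf command sequentially inside the container
            for aiperf_cmd in cmds["aiperf-run"]:
                full_cmd = f"{aiperf_cmd} --ui-type simple"  # flag added by the framework
                subprocess.run(
                    ["docker", "run", "--rm", "--network", "host", image,
                     "bash", "-c", full_cmd],
                    check=True, timeout=300,
                )
        finally:
            # 5. Cleanup phase: stop the background server process
            setup_proc.terminate()
            setup_proc.wait()
```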
## Common Patterns

### Pattern: OpenAI-compatible API

```markdown
<!-- setup-myserver-endpoint-server -->
\```bash
docker run --gpus all -p 8000:8000 myserver:latest \
    --model model-name \
    --host 0.0.0.0 --port 8000
\```
<!-- /setup-myserver-endpoint-server -->

<!-- health-check-myserver-endpoint-server -->
\```bash
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"model-name\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "Server not ready"; exit 1; }
\```
<!-- /health-check-myserver-endpoint-server -->

<!-- aiperf-run-myserver-endpoint-server -->
\```bash
aiperf profile \
    --model model-name \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 10 \
    --max-tokens 100
\```
<!-- /aiperf-run-myserver-endpoint-server -->
```
## Troubleshooting

### Tests not discovered

- Verify tag format: `setup-{name}-endpoint-server`, `health-check-{name}-endpoint-server`, `aiperf-run-{name}-endpoint-server`
- Ensure the bash code block immediately follows the tag
- Check that the server name is consistent across all three tag types
- Run `python main.py --dry-run` to see what's discovered

### Health check timeout

- Increase the timeout value in your health check command
- Verify the health check endpoint is correct
- Check server logs: the test runner shows setup output for 30 seconds
- Ensure your server starts on the expected port

### AIPerf command fails

- Test your AIPerf command manually first
- Use small values for `--num-prompts` and `--max-tokens`
- Verify the model name matches what the server expects
- Check that the endpoint URL is correct

### Duplicate command errors

If you see errors like "DUPLICATE SETUP COMMAND", you have multiple commands with the same server name:
- Search your docs for all instances of that tag
- Ensure each server has a unique name
- Or remove duplicate tags if they're truly duplicates
## Best Practices

1. **Keep tests fast**: Use minimal `--num-prompts` (10-20) and small `--max-tokens` values
2. **Use standard ports**: Default to 8000 for consistency
3. **Add timeouts**: Always include timeouts in health checks
4. **Test locally first**: Run commands manually before adding tags
5. **One server per doc section**: Avoid mixing multiple servers in the same doc section
6. **Clear error messages**: Include helpful error messages in health checks
7. **Document requirements**: Note any GPU, memory, or dependency requirements in surrounding text
## Architecture Reference

Key files in the test framework:

- `main.py`: Entry point, orchestrates parsing and testing
- `parser.py`: Markdown parser that discovers tagged commands
- `test_runner.py`: Executes tests for each server
- `constants.py`: Configuration constants (timeouts, tag patterns)
- `data_types.py`: Data models for commands and servers
- `utils.py`: Utility functions for Docker operations
## Constants and Configuration

Key constants in `constants.py`:

- `SETUP_MONITOR_TIMEOUT`: 30 seconds (how long to monitor setup output)
- `CONTAINER_BUILD_TIMEOUT`: 600 seconds (Docker build timeout)
- `AIPERF_COMMAND_TIMEOUT`: 300 seconds (per-command timeout)
- `AIPERF_UI_TYPE`: "simple" (auto-added to all aiperf commands)

To modify these, edit `constants.py`.
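For quick reference, these settings might be declared roughly as follows. The names and values come from the list above, but the snippet is only a sketch, not a copy of the real `constants.py`.

```python
# Sketch of the configuration constants described above.
# Timeouts are in seconds; the real constants.py may define more than this.
SETUP_MONITOR_TIMEOUT = 30      # how long to monitor setup output
CONTAINER_BUILD_TIMEOUT = 600   # Docker build timeout
AIPERF_COMMAND_TIMEOUT = 300    # per-command timeout
AIPERF_UI_TYPE = "simple"       # auto-added to all aiperf commands
```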
