docs: Rename README to DOC_CI_TEST_README and update parser to skip it
- Rename README.md to DOC_CI_TEST_README.md to clarify it documents the CI test framework
- Update parser.py to skip DOC_CI_TEST_README.md when parsing markdown files
- Remove warning about escaped backticks (no longer needed since file is skipped)
- Add notes that server is NOT restarted between multiple aiperf commands
Addresses PR feedback to have test script ignore this documentation file
instead of using escaped backticks in examples.
Signed-off-by: Ganesh Kudleppanavar <[email protected]>
# Adding New End-to-End Tests for Documentation Examples
This guide explains how to add new end-to-end tests for server examples in the AIPerf documentation.
## Overview

To add tests for a new server, you need to add three types of tagged commands to the documentation:

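All three tag kinds follow one naming scheme. For a server named `myserver` they look like this (the names below are taken from the examples in this guide); each opening tag has a matching `<!-- /... -->` closing tag:

```
<!-- setup-myserver-endpoint-server -->         starts the server
<!-- health-check-myserver-endpoint-server -->  waits until the server is ready
<!-- aiperf-run-myserver-endpoint-server -->    runs one aiperf command
```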
Tag the bash command that starts your server:

<!-- setup-myserver-endpoint-server -->
```bash
# Start your server
docker run --gpus all -p 8000:8000 myserver/image:latest \
  --model my-model \
  --host 0.0.0.0 --port 8000
```
<!-- /setup-myserver-endpoint-server -->

**Important notes:**
- The server name (`myserver` in this example) must be consistent across all three tag types
Tag a bash command that waits for your server to be ready:
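As a sketch, such a health-check block can mirror the FastAPI example in the Complete Example section below (the `myserver` name and the `/v1/models` endpoint are placeholders; adjust them for your server):

<!-- health-check-myserver-endpoint-server -->
```bash
timeout 600 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/models)" != "200" ]; do sleep 2; done' || { echo "myserver not ready after 10min"; exit 1; }
```
<!-- /health-check-myserver-endpoint-server -->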
You can have multiple `aiperf-run` commands for the same server. Each will be executed sequentially against the same running server instance (the server is NOT restarted between commands):

<!-- aiperf-run-myserver-endpoint-server -->
```bash
# First test: streaming mode
aiperf profile \
  --model my-model \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --streaming \
  --num-prompts 10
```
<!-- /aiperf-run-myserver-endpoint-server -->

<!-- aiperf-run-myserver-endpoint-server -->
```bash
# Second test: non-streaming mode
aiperf profile \
  --model my-model \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --num-prompts 10
```
<!-- /aiperf-run-myserver-endpoint-server -->

**Important notes:**
- Do NOT include the `--ui-type` flag - the test framework adds `--ui-type simple` automatically
- Each command is executed inside the AIPerf Docker container
- Commands should complete in a reasonable time (default timeout: 300 seconds)
- Use small values for `--num-prompts` and `--max-tokens` to keep tests fast
- The server is NOT restarted between multiple aiperf commands - all commands run against the same server instance

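The first note above (the framework appending `--ui-type simple` itself) can be illustrated with a small sketch; this is hypothetical code, not the actual test framework:

```python
def with_simple_ui(command: str) -> str:
    # Hypothetical helper: append the UI flag the framework adds for you,
    # unless the command already sets one (which the docs say not to do).
    if "--ui-type" in command:
        return command
    return command + " --ui-type simple"

print(with_simple_ui("aiperf profile --model my-model --num-prompts 10"))
# aiperf profile --model my-model --num-prompts 10 --ui-type simple
```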
## Complete Example

Here's a complete example for a new server called "fastapi":

### Running FastAPI Server

Start the FastAPI server:

<!-- setup-fastapi-endpoint-server -->
```bash
docker run --gpus all -p 8000:8000 mycompany/fastapi-llm:latest \
  --model-name meta-llama/Llama-3.2-1B \
  --host 0.0.0.0 \
  --port 8000
```
<!-- /setup-fastapi-endpoint-server -->

Wait for the server to be ready:

<!-- health-check-fastapi-endpoint-server -->
```bash
timeout 600 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/models)" != "200" ]; do sleep 2; done' || { echo "FastAPI server not ready after 10min"; exit 1; }
```
<!-- /health-check-fastapi-endpoint-server -->

Profile the model:

<!-- aiperf-run-fastapi-endpoint-server -->
```bash
aiperf profile \
  --model meta-llama/Llama-3.2-1B \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --streaming \
  --num-prompts 20 \
  --max-tokens 50
```
<!-- /aiperf-run-fastapi-endpoint-server -->

## Running the Tests

For each server, the test runner:

1. **Build Phase**: Builds the AIPerf Docker container (once for all tests)
2. **Setup Phase**: Starts the server in the background
3. **Health Check Phase**: Waits for the server to be ready (runs in parallel with setup)
4. **Test Phase**: Executes all AIPerf commands sequentially against the same running server instance
5. **Cleanup Phase**: Gracefully shuts down the server and cleans up Docker resources

**Note**: The server remains running throughout all AIPerf commands. It is only shut down once during the cleanup phase after all tests complete.

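The tag scheme the runner consumes can be extracted with a few lines of regex. The following is a hypothetical sketch of that extraction step, not the repository's actual parser.py:

```python
import re

FENCE = "`" * 3  # built at runtime to avoid nesting literal fences in this doc

def extract_commands(markdown: str, kind: str, server: str) -> list[str]:
    """Return the bash body of every <!-- {kind}-{server}-endpoint-server --> block."""
    pattern = (
        rf"<!--\s*{re.escape(kind)}-{re.escape(server)}-endpoint-server\s*-->"
        rf"\s*{FENCE}bash\n(.*?){FENCE}"
    )
    return [body.strip() for body in re.findall(pattern, markdown, flags=re.DOTALL)]

doc = (
    "<!-- aiperf-run-myserver-endpoint-server -->\n"
    + FENCE + "bash\n"
    + "aiperf profile --model my-model --num-prompts 10\n"
    + FENCE + "\n"
    + "<!-- /aiperf-run-myserver-endpoint-server -->\n"
)
print(extract_commands(doc, "aiperf-run", "myserver"))
# ['aiperf profile --model my-model --num-prompts 10']
```

Since multiple `aiperf-run` blocks for the same server are allowed, the extractor returns a list and the runner can execute the commands in document order.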
## Common Patterns

### Pattern: OpenAI-compatible API

<!-- setup-myserver-endpoint-server -->
```bash
docker run --gpus all -p 8000:8000 myserver:latest \
```