Commit 8c91e8a

Author: 张仕洋
Merge branch 'main' of github.com:torchpipe/torchpipe.github.io
Parents: 88690c7 + 5cb0be4

3 files changed: +23 −23 lines

docs/quick_start_new_user.md

Lines changed: 20 additions & 20 deletions
@@ -77,7 +77,7 @@ img = self.precls_trans(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), (224,22
 ```py
 
 input_shape = torch.ones((1, 3, 224, 224)).cuda()
-self.classification_engine = torch2trt(resnet50, [input_shape],
+self.classification_engine = torch2trt(resnet50, [input_shape],
 fp16_mode=self.fp16,
 max_batch_size=self.cls_trt_max_batchsize,
 )
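
For context, the hunk above is the quick start's "online engine generation" path: a ResNet-50 is converted to a TensorRT engine with torch2trt at startup. A minimal, self-contained sketch of that step follows; the torchvision model loading and the concrete fp16/batch values stand in for the `self.fp16` / `self.cls_trt_max_batchsize` attributes and are assumptions, not part of this commit.

```py
import torch
import torchvision
from torch2trt import torch2trt  # NVIDIA-AI-IOT/torch2trt

# Assumed setup: a stock torchvision ResNet-50 in eval mode on the GPU.
resnet50 = torchvision.models.resnet50(pretrained=True).eval().cuda()

# Example input matching the shape used in the docs snippet above.
input_shape = torch.ones((1, 3, 224, 224)).cuda()

# Build the TensorRT engine online; fp16_mode / max_batch_size mirror
# the self.fp16 / self.cls_trt_max_batchsize parameters shown in the diff.
classification_engine = torch2trt(
    resnet50,
    [input_shape],
    fp16_mode=True,
    max_batch_size=4,
)

# The converted module is called like the original model.
with torch.no_grad():
    out = classification_engine(input_shape)
```
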
@@ -116,8 +116,8 @@ config = torchpipe.parse_toml("resnet50.toml")
 self.classification_engine = pipe(config)
 
 self.classification_engine(bin_data)
-
-
+
+
 if TASK_RESULT_KEY not in bin_data.keys():
 print("error decode")
 return results
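
The hunk above shows the torchpipe calling convention used throughout the quick start: the pipeline is built once from the toml, then called with a dict that it fills in place. A minimal sketch of the same flow is below; reading image bytes from a local file and the `TASK_DATA_KEY` input key are illustrative assumptions, not details dictated by this commit.

```py
import torchpipe
from torchpipe import pipe, TASK_DATA_KEY, TASK_RESULT_KEY

# Build the multi-node pipeline from the toml described later in this commit.
config = torchpipe.parse_toml("resnet50.toml")
classification_engine = pipe(config)

# The engine is called with a dict; results are written back into it in place.
with open("test.jpg", "rb") as f:  # assumed sample image
    bin_data = {TASK_DATA_KEY: f.read()}

classification_engine(bin_data)  # thread-safe forward call

if TASK_RESULT_KEY not in bin_data.keys():
    print("error decode")
else:
    result = bin_data[TASK_RESULT_KEY]
```
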
@@ -133,9 +133,9 @@ The contents of the toml file are as follows:
 
 ```bash
 # Schedule'parameter
-batching_timeout = 5
-instance_num = 8
-precision = "fp16"
+batching_timeout = 5
+instance_num = 8
+precision = "fp16"
 
 ## Data decoding
 #
@@ -145,11 +145,11 @@ precision = "fp16"
 # The original decoding output format was BGR
 # The DecodeMat backend also defaults to outputting in BGR format
 # Since decoding is done on the CPU, DecodeMat is used
-# After each node is completed, the name of the next node needs to be
+# After each node is completed, the name of the next node needs to be
 # appended, otherwise the last node is assumed by default
 #
 [cpu_decoder]
-backend = "DecodeMat"
+backend = "DecodeMat"
 next = "cpu_posdecoder"
 
 ## preprocessing: resize、cvtColorMat
@@ -160,11 +160,11 @@ next = "cpu_posdecoder"
 # Note:
 # The original preprocessing order was resize, cv2.COLOR_BGR2RGB,
 # then Normalize.
-# However, the normalization step is now integrated into the model
-# processing (the [resnet50] node), so the output result after the
-# preprocessing in this node is consistent with the preprocessing result
+# However, the normalization step is now integrated into the model
+# processing (the [resnet50] node), so the output result after the
+# preprocessing in this node is consistent with the preprocessing result
 # without normalization.
-# After each node is completed, the name of the next node needs to be
+# After each node is completed, the name of the next node needs to be
 # appended, otherwise the last node is assumed by default.
 #
 [cpu_posdecoder]
@@ -183,23 +183,23 @@ next = "resnet50"
 #
 # This corresponds to 3.1(3) TensorRT acceleration and 3.1(2)Normalize
 # Note:
-# There's a slight difference from the original method of generating
-# engines online. Here, the model needs to be first converted to ONNX
+# There's a slight difference from the original method of generating
+# engines online. Here, the model needs to be first converted to ONNX
 # format.
-#
+#
 # For the conversion method, see [Converting Torch to ONNX].
 #
 [resnet50]
-backend = "SyncTensor[TensorrtTensor]"
+backend = "SyncTensor[TensorrtTensor]"
 min = 1
 max = 4
 instance_num = 4
-model = "/you/model/path/resnet50.onnx"
+model = "/you/model/path/resnet50.onnx"
 
 mean="123.675, 116.28, 103.53" # 255*"0.485, 0.456, 0.406"
 std="58.395, 57.120, 57.375" # 255*"0.229, 0.224, 0.225"
 
-# TensorrtTensor
+# TensorrtTensor
 "model::cache"="/you/model/path/resnet50.trt" # or resnet50.trt.encrypted
 
 ```
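
The [resnet50] node above consumes an ONNX file rather than building the engine from the live PyTorch module, per the "Converting Torch to ONNX" pointer in the config comments. A minimal export sketch is below; the opset version and dynamic batch axis are defaults chosen here for illustration, not settings specified by this commit (a dynamic batch axis lets the TensorRT backend batch requests between min=1 and max=4).

```py
import torch
import torchvision

# Assumed: export a stock torchvision ResNet-50 to ONNX for the [resnet50] node.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy,
    "resnet50.onnx",     # referenced by the toml's model = ".../resnet50.onnx"
    opset_version=11,    # assumed opset
    input_names=["input"],
    output_names=["output"],
    # Dynamic batch dimension so the engine can serve variable batch sizes.
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```
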
@@ -221,7 +221,7 @@ std="58.395, 57.120, 57.375" # 255*"0.229, 0.224, 0.225"
 
 The specific test code can be found at [client_qps.py](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/client_qps.py)
 
-With the same Thrift service interface, testing on a machine with NIDIA-3080 GPU, 36-core CPU, and concurrency of 10, we have the following results:
+With the same Thrift service interface, testing on a machine with NVIDIA-3080 GPU, 36-core CPU, and concurrency of 10, we have the following results:
 
 - throughput:
 
@@ -233,7 +233,7 @@ With the same Thrift service interface, testing on a machine with NIDIA-3080 GPU
 - response time:
 
 | Methods | TP50 | TP99 |
-:-: | :-: | :-:|
+:-: | :-: | :-:|
 | Pure TensorRT | 26.74 |35.24|
 | Using TorchPipe |8.89|14.28|
 
i18n/zh/docusaurus-plugin-content-docs/current/introduction.md

Lines changed: 2 additions & 2 deletions
@@ -11,13 +11,13 @@ type: explainer
 
 There are some existing practices in the industry, such as [triton inference server](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#ensemble-models), [阿里妈妈high_service](https://mp.weixin.qq.com/s/Fd2GNXqO3wl3FrA7Wli3jA), and [美团视觉GPU推理服务部署架构优化实践](https://zhuanlan.zhihu.com/p/605094862).
 
-A common user complaint about Trinton Inference Server is that, in a system with multiple interwoven nodes, a large amount of business logic has to be implemented on the client side and invoked on the server via RPC, which is cumbersome; and for performance, unconventional means such as shared GPU memory, ensemble, and [Business Logic Scripting](https://github.com/triton-inference-server/python_backend#business-logic-scripting) have to be considered.
+A common user complaint about Triton Inference Server is that, in a system with multiple interwoven nodes, a large amount of business logic has to be implemented on the client side and invoked on the server via RPC, which is cumbersome; and for performance, unconventional means such as shared GPU memory, ensemble, and [Business Logic Scripting](https://github.com/triton-inference-server/python_backend#business-logic-scripting) have to be considered.
 
 To address this, TorchPipe digs into PyTorch's C++ computation backends and CUDA stream management, together with domain-specific-language modeling of multi-node graphs, to expose thread-safe function interfaces toward the PyTorch frontend externally and fine-grained, user-oriented backend extensions internally.
 
 
 ![jpg](.././static/images/EngineFlow-light.png)
-<center>torchpipe framework diagram</center>
+<center>torchpipe framework diagram</center>
 
 **Key features of the torchpipe framework:**
 - Performance (peak throughput / TP99) close to the business-level optimum, reducing the widespread negative optimizations and inter-node performance losses.

i18n/zh/docusaurus-plugin-content-docs/current/welcome.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ type: explainer
 # Welcome to the torchpipe documentation!
 torchpipe is a multi-instance pipeline-parallel library that works independently between low-level acceleration libraries (such as tensorrt, opencv, CVCUDA, torchscript) and RPC frameworks (such as thrift, gRPC), maximizing service throughput while meeting latency requirements.
 
-The whole solution combines concurrency safety with end-to-end pipeline scheduling, supports NIDIA hardware platforms, and balances development efficiency with performance gains.
+The whole solution combines concurrency safety with end-to-end pipeline scheduling, supports NVIDIA hardware platforms, and balances development efficiency with performance gains.
 
 Externally, torchpipe provides thread-safe function interfaces toward the PyTorch frontend; internally, it provides fine-grained, user-oriented backend extensions.
 