After you have trained a neural network (or selected an already pre-trained one from the Explore page), you can apply it within Supervisely to a Project (via the "Test" button) or deploy it as an API.
We support two methods of API deployment: through the Web UI or completely standalone.
The easiest method, fully managed by Supervisely.
- Go to the "Neural Networks" page. Click the "three dots" icon on the model you want to deploy and select "Deploy". The deployment settings dialog will open.
- Here you can set up the deployment settings. After clicking the "Submit" button, you will be redirected to the Cluster > Tasks page.
- Wait until the value in the "output" column changes to "Deployed", then click "three dots" and select "Deploy API Info".
- Here you can see a usage example for the deployed model via CURL or Python. You can also just drag and drop an image to test your deployed model right in the browser.
Choose this method if you want to deploy a model in a production environment without Supervisely.
Important notice: all steps described below apply to every Neural Network integrated into the Supervisely platform. In our examples we use YOLO v3 (COCO).
Command template:
docker pull <docker image name>
For example:
docker pull supervisely/nn-yolo-v3
docker run --rm -it \
--runtime=nvidia \
-p <port of the host machine>:5000 \
-v '<folder with extracted model weights>:/sly_task_data/model' \
--env GPU_DEVICE=<device_id> \
<model docker image> \
python /workdir/src/rest_inference.py
If your machine has several GPUs, the environment variable GPU_DEVICE is used to explicitly define the device id for model placement. It is an optional field (the default value is 0). This parameter is especially helpful when you deploy TensorFlow-based models: by default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process (see the TensorFlow docs).
For example:
docker run --rm -it \
--runtime=nvidia \
-p 5000:5000 \
-v '/home/ds/Downloads/YOLO v3 (COCO):/sly_task_data/model' \
--env GPU_DEVICE=0 \
supervisely/nn-yolo-v3 \
python /workdir/src/rest_inference.py
The server starts on port 5000 inside the docker container. If you want to change the port, just bind container port 5000 to some other port (e.g. 7777) on your host machine. In that case the command is:
docker run --rm -it \
--runtime=nvidia \
-p 7777:5000 \
-v '/home/ds/Downloads/YOLO v3 (COCO):/sly_task_data/model' \
--env GPU_DEVICE=0 \
supervisely/nn-yolo-v3 \
python /workdir/src/rest_inference.py
There are a few ways to send requests:
- Using the Supervisely Python SDK. A bunch of examples can be found in Explore -> Notebooks. For example: Guide #04: neural network inference.
- Using CURL and bash (see below).
- Implementing a client in your favorite language. The model is deployed as an HTTP web server, so you can implement a client in any language. A Python example (without the Supervisely SDK) is provided at the end of this tutorial.
If you want to obtain model classes and tags, there is a special method /model/get_out_meta. Here is the template:
curl -H "Content-Type: multipart/form-data" -X POST \
-F 'meta=<project meta string in json format (optional field)>' \
-F 'mode=<inference mode string in json format (optional field)>' \
<ip-address of host machine>:<port of host machine, default 5000>/model/get_out_meta
The intuition behind the optional fields meta and mode is explained here. Examples are presented in the next section.
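For instance, a minimal call that omits both optional fields might look like this in Python (a sketch, assuming the model is deployed locally on port 5000 as described below; since meta and mode are optional, the server falls back to its defaults):

import requests

# Request the output meta (classes and tags) of the deployed model.
# Both optional fields ('meta' and 'mode') are omitted here.
response = requests.post("http://0.0.0.0:5000/model/get_out_meta")
print(response.json())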
All neural networks in Supervisely support several inference modes: full image, sliding window, region of interest (ROI) and bounding boxes mode. Learn more here, here and here.
Let's feed the entire image to the deployed neural network. The image will be automatically resized to the input resolution of the NN. The result will be returned in the Supervisely JSON format.
CURL template:
curl -X POST -F "image=@</path/to/image.png>" <ip address of the host machine>:<port of the host machine>/model/inference
For example, let's apply the already deployed model (YOLO v3 COCO) to the image ties.jpg:
curl -X POST -F "image=@./images/ties.jpg" 0.0.0.0:5000/model/inference
The result is the following:
Input image | Visualized prediction
---|---
*(input image)* | *(model prediction)*
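The same full-image request can also be sent from Python with the requests library (a minimal sketch, assuming the model is deployed locally on port 5000 and that ./images/ties.jpg is the image used above):

import requests

# Send the image as the multipart/form-data field "image",
# exactly like the curl call above.
with open("./images/ties.jpg", "rb") as image_file:
    response = requests.post(
        "http://0.0.0.0:5000/model/inference",
        files={"image": ("ties.jpg", image_file, "image/*")},
    )
print(response.json())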
CURL template:
curl -H "Content-Type: multipart/form-data" -X POST \
-F 'mode=<inference mode string in json format (optional field)>' \
-F "image=@</path/to/image.png>" \
<ip address of the host machine>:<port of the host machine>/model/inference
For example, suppose we have a big image. The raw json file sliding_window_mode_example.json is the following:
{
    "name": "sliding_window_det",
    "window": {
        "width": 1000,
        "height": 1000
    },
    "min_overlap": {
        "x": 200,
        "y": 200
    },
    "save": false,
    "class_name": "sliding_window_bbox",
    "nms_after": {
        "enable": true,
        "iou_threshold": 0.2,
        "confidence_tag_name": "confidence"
    },
    "model_classes": {
        "add_suffix": "_det",
        "save_classes": [
            "tie"
        ]
    },
    "model_tags": {
        "add_suffix": "_det",
        "save_names": "__all__"
    }
}
Here is the explanation:
{
    "name": "sliding_window_det",

    # Sliding window parameters.
    # Width and height in pixels.
    # Cannot be larger than the original image.
    "window": {
        "width": 1000,
        "height": 1000
    },

    # Minimum overlap for each dimension. The last
    # window in every dimension may have higher overlap
    # with the previous one if necessary to fit the whole
    # window within the original image.
    "min_overlap": {
        "x": 200,
        "y": 200
    },

    # Whether to save each sliding window instance as a
    # bounding box rectangle.
    "save": false,

    # If saving the sliding window bounding boxes, which
    # class name to use.
    "class_name": "sliding_window_bbox",

    "nms_after": {
        # Whether to run non-maximum suppression after accumulating
        # all the detection results from the sliding windows.
        "enable": true,

        # Intersection over union threshold above which the same-class
        # detection labels are considered to be significantly intersected
        # for non-maximum suppression.
        "iou_threshold": 0.2,

        # Tag name from which to read the detection confidence by which we
        # rank the detections. This tag must be added by the model to
        # every detection label.
        "confidence_tag_name": "confidence"
    },

    # Class renaming and filtering settings.
    # See the "Full image inference" example for details.
    "model_classes": {
        "add_suffix": "_det",
        "save_classes": ["tie"]
    },
    "model_tags": {
        "add_suffix": "_det",
        "save_names": "__all__"
    }
}
Let's apply the model:
curl -H "Content-Type: multipart/form-data" -X POST -F "image=@./images/big_image.jpg" -F mode="$(cat sliding_window_mode_example.json)" 0.0.0.0:5000/model/inference
Input image | Visualized prediction
---|---
*(input image)* | *(model prediction)*
NOTICE: if you are going to reproduce this example in a notebook, just copy/paste the inference modes from there.
Let's repeat the procedure we have just done with CURL and the sliding window mode, this time from Python. Here is a script that does the job:
import requests
from requests_toolbelt import MultipartEncoder

if __name__ == '__main__':
    # Multipart request: the image itself plus the inference mode JSON.
    content_dict = {}
    content_dict['image'] = ("big_image.jpg", open("/workdir/src/big_image.jpg", 'rb'), 'image/*')
    content_dict['mode'] = ("mode", open('/workdir/src/sliding_window_mode_example.json', 'rb'))

    encoder = MultipartEncoder(fields=content_dict)
    response = requests.post("http://0.0.0.0:5000/model/inference", data=encoder, headers={'Content-Type': encoder.content_type})
    print(response.json())
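If requests_toolbelt is not available, the same multipart request can also be built with plain requests by passing both parts through its files argument (a sketch under the same assumptions about file paths and the server address):

import requests

# Both the image and the inference mode JSON are sent as multipart parts.
with open("/workdir/src/sliding_window_mode_example.json") as mode_file:
    mode_json = mode_file.read()
with open("/workdir/src/big_image.jpg", "rb") as image_file:
    response = requests.post(
        "http://0.0.0.0:5000/model/inference",
        files={
            "image": ("big_image.jpg", image_file, "image/*"),
            "mode": ("mode", mode_json),
        },
    )
print(response.json())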
Here is a Python script to get the classes and tags the model predicts:
import json
import requests

if __name__ == '__main__':
    content_dict = {}

    # A dummy config is used to slightly rename the model output classes and tags.
    dummy_detection_config = {
        "model_tags": {
            "add_suffix": "_det",
            "save_names": "__all__"
        },
        "model_classes": {
            "add_suffix": "_det",
            "save_classes": "__all__"
        }
    }

    content_dict['mode'] = json.dumps(dummy_detection_config)
    response = requests.post("http://0.0.0.0:5000/model/get_out_meta", json=content_dict)
    print(response.json())
Just create a Dockerfile in the same directory as the NN weights. The directory layout looks like this:
.
├── Dockerfile
└── model
    ├── config.json
    └── model.pt
Dockerfile content:
FROM supervisely/nn-yolo-v3
COPY model /sly_task_data/model
To build the image, just execute the following command in the Dockerfile's directory:
docker build -t my_super_image_with_model_inside .
To check that the model is inside our new image my_super_image_with_model_inside, run this command:
docker run --rm -it my_super_image_with_model_inside bash -c 'ls -l /sly_task_data/model'
The command output should look like this:
total 238432
-rw-r--r-- 1 root root 870 Feb 17 11:57 config.json
-rw-r--r-- 1 root root 244142107 Feb 17 11:57 model.pt
REMINDER: once you place a model inside the docker image, you do not need to mount the model weights, i.e. the docker run command will be:
docker run --rm -it \
--runtime=nvidia \
-p 5000:5000 \
--env GPU_DEVICE=0 \
my_super_image_with_model_inside \
python /workdir/src/rest_inference.py