Questions regarding changes to yolo plugin #416

Closed
philipp-schmidt opened this issue May 26, 2021 · 8 comments
Comments

@philipp-schmidt
Contributor

Hi, I'm trying to update the yolo plugin in https://github.com/isarsoft/yolov4-triton-tensorrt to a more recent version with your fixes to batch size and updates to compatibility etc.

Could you briefly explain a few changes that you made to the plugin in comparison to the original from tensorrtx?

  1. Input for the yolo layer plugin originally was "all output conv" layers, but now seems to be just one conv layer? So for example the full yolov4 network now needs three instances of the plugin instead of one? And also the output is three marked blobs instead of just one, correct?
  2. The original implementation had MAX_OUTPUT_BBOX_COUNT. How did you handle this change regarding output size etc.?
  3. What's input_multiplier for? I found input_multiplier = w // yolo_whs[i][0]. So this is just to not pass input_width and input_height? It could be calculated if the plugin knew those values?
  4. You reduced int mThreadCount = 256; to int mThreadCount = 64;, is there a performance reason for this?

Thanks already for all those great fixes to the plugin, really helpful.
Could you very briefly explain other changes that you made to the plugin that you think would be significant enough to mention?

@jkjung-avt
Owner

  1. Input for the yolo layer plugin originally was "all output conv" layers, but now seems to be just one conv layer? So for example the full yolov4 network now needs three instances of the plugin instead of one? And also the output is three marked blobs instead of just one, correct?

Yes. I would add 2 or 3 (or more) yolo plugins into the network depending on how many output conv layers there are. The code is here:

def add_yolo_plugins(network, model_name, logger):
    """Add yolo plugins into a TensorRT network."""
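
To make the "one plugin per output conv layer" idea concrete, here is a minimal sketch in plain Python. It does not use the TensorRT API. `YoloPluginSpec` and `plan_yolo_plugins` are hypothetical names, and the grid sizes are an assumed example for a 416x416 yolov4 input.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class YoloPluginSpec:
    """Hypothetical stand-in for what one yolo plugin instance is attached to."""
    head_index: int  # index of the output conv layer feeding this plugin
    grid_w: int      # width of that layer's output grid
    grid_h: int      # height of that layer's output grid


def plan_yolo_plugins(grid_whs: List[Tuple[int, int]]) -> List[YoloPluginSpec]:
    """Return one plugin spec per output conv layer (one per yolo head)."""
    return [YoloPluginSpec(i, w, h) for i, (w, h) in enumerate(grid_whs)]


# Assumed example: three output conv layers, so three plugins and three outputs.
specs = plan_yolo_plugins([(52, 52), (26, 26), (13, 13)])
for spec in specs:
    print(spec)
```

A model with two output conv layers, such as yolov4-tiny, would get two specs from the same function.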

  2. Original implementation had MAX_OUTPUT_BBOX_COUNT how did you handle this change regarding output size etc?

I don't set an upper limit on output bbox count. All detection boxes with "scores" higher than the threshold would be kept. They would go through NMS before the final detection results are generated. The relevant code is here:

def _postprocess_yolo(trt_outputs, img_w, img_h, conf_th, nms_threshold,
                      input_shape, letter_box=False):
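
The "keep everything above the threshold, then run NMS" flow can be sketched as follows. This is not the code from the repository. The box layout `(x, y, w, h, score)`, the helper names, and the class-agnostic greedy NMS are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # assumed layout: x, y, w, h, score


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h, ...) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes: List[Box], conf_th: float, nms_threshold: float) -> List[Box]:
    """Keep boxes scoring above conf_th (no upper limit on count), then greedy NMS."""
    candidates = sorted((b for b in boxes if b[4] > conf_th),
                        key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, k) <= nms_threshold for k in kept):
            kept.append(box)
    return kept


detections = [
    (0.0, 0.0, 10.0, 10.0, 0.9),
    (1.0, 1.0, 10.0, 10.0, 0.8),    # overlaps the first box heavily
    (50.0, 50.0, 10.0, 10.0, 0.7),
    (0.0, 0.0, 10.0, 10.0, 0.1),    # below the confidence threshold
]
result = filter_and_nms(detections, conf_th=0.3, nms_threshold=0.5)
print(result)
```

Because nothing caps the number of candidates, the output size depends only on how many boxes pass the threshold and survive NMS.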

  3. What's input_multiplier for? I found input_multiplier = w // yolo_whs[i][0]. So this is just to not pass input_width and input_height? It could be calculated if the plugin knew those values?

Yes, I pass this information to the plugin as a single value instead of two. This is to avoid a problem as described here: NVIDIA/TensorRT#238. TensorRT plugin code seems to handle pluginField values incorrectly if there are too many of them.
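
As a worked example of why one value is enough, here is a small sketch of the arithmetic. It assumes a 416x416 input, the grid sizes shown, and the same scale factor in both dimensions. `recover_input_size` is a hypothetical helper, not a function from the plugin.

```python
from typing import List, Tuple


def input_multiplier(input_w: int, grid_w: int) -> int:
    """Same formula as quoted in the question: w // yolo_whs[i][0]."""
    return input_w // grid_w


def recover_input_size(grid_w: int, grid_h: int, multiplier: int) -> Tuple[int, int]:
    """Hypothetical helper: rebuild input width/height from grid size and multiplier."""
    return grid_w * multiplier, grid_h * multiplier


input_w, input_h = 416, 416                       # assumed network input size
yolo_whs: List[Tuple[int, int]] = [(52, 52), (26, 26), (13, 13)]  # assumed grids

multipliers = [input_multiplier(input_w, w) for w, _ in yolo_whs]
print(multipliers)

for (gw, gh), m in zip(yolo_whs, multipliers):
    # Every head recovers the same input size from a single integer field.
    print(recover_input_size(gw, gh, m))
```

Since the plugin already knows its own grid width and height, the multiplier replaces two plugin fields (input width and input height) with one.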

  4. You reduced int mThreadCount = 256; to int mThreadCount = 64;, is there a performance reason for this?

Lower-end Jetson SoCs, such as TX2 and Nano, have only 256 GPU cores in total. I don't want the yolo plugin to occupy all GPU cores in such systems (i.e. trying to keep some GPU cores available for processing other TensorRT OPs/kernels in parallel). However, based on my own tests, it doesn't seem to make much difference (between 64 and 256).
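
A rough sketch of what the thread count changes, under two assumptions that are not confirmed in this thread: that the kernel handles one (grid cell, anchor) pair per thread, and that it uses the common ceil-divide launch pattern. `num_blocks` is a hypothetical helper.

```python
def num_blocks(num_elements: int, threads_per_block: int) -> int:
    """Ceil-divide: blocks needed so every element gets one thread (assumed pattern)."""
    return (num_elements + threads_per_block - 1) // threads_per_block


# Assumed workload: a 52x52 grid with 3 anchors per cell.
elements = 52 * 52 * 3

blocks_256 = num_blocks(elements, 256)
blocks_64 = num_blocks(elements, 64)
print(elements, blocks_256, blocks_64)
```

The total amount of work is the same either way. Smaller blocks only change the granularity at which the GPU schedules it, which is consistent with the observation that 64 and 256 perform about the same.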

@philipp-schmidt
Contributor Author

Thanks for the answers, they make perfect sense.

@philipp-schmidt
Contributor Author

Hi @jkjung-avt
Have to bother you again, sorry. I only now realised that your plugin was different from the one in tensorrtx from the very beginning in that it does not compute Logistic Activation. Can you confirm that I have to do Logistic Activation in the last Convolutional Layer before the Yolo Layer myself? Is there anything else that I have to apply to the inputs?

@jkjung-avt
Owner

For "yolov4-tiny" and "yolov4" models, the conv layers preceding the yolo layers use "linear" activation, e.g. in yolov4.cfg:

```
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
```

In this case, my yolo plugin would calculate "sigmoid" on the input values. Refer to code here:

float max_cls_prob = sigmoidGPU(max_cls_logit);
float box_prob = sigmoidGPU(*(cur_input + 4 * total_grids));

In contrast, "yolov4-csp" and "yolov4x-mish" models would have those conv layers with "logistic" activation, e.g. yolov4x-mish.cfg

```
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=logistic
```

So in this latter case, I don't need to calculate "sigmoid" in the plugin code:

float box_prob = *(cur_input + 4 * total_grids);
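
The two cases can be summarized in a short Python sketch. The function `box_prob` and its string argument are hypothetical and only mirror the logic described above. "Logistic" in the Darknet cfg and "sigmoid" in the plugin refer to the same function, so it must be applied exactly once.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def box_prob(conv_output: float, conv_activation: str) -> float:
    """Hypothetical helper: turn the conv output into a probability exactly once."""
    if conv_activation == "linear":
        # The conv layer emitted a raw logit, so the yolo stage applies sigmoid.
        return sigmoid(conv_output)
    if conv_activation == "logistic":
        # The conv layer already applied sigmoid, so use the value as-is.
        return conv_output
    raise ValueError(f"unsupported activation: {conv_activation}")


print(box_prob(0.0, "linear"))     # raw logit 0.0 becomes probability 0.5
print(box_prob(0.5, "logistic"))   # already a probability, passed through
print(sigmoid(sigmoid(0.0)))       # applying sigmoid twice skews the result
```

Applying sigmoid in both the conv layer and the yolo stage would push every probability toward the 0.5 to 0.73 range, which is one way to end up with poor detections.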

In case you are interested in reading the code which handles different types of conv activations, check here:

if layer_dict['activation'] == 'leaky':

@philipp-schmidt
Contributor Author

philipp-schmidt commented May 29, 2021

Yes I only now realised that sigmoid and logistic are practically the same thing. I'm using the TensorRT Layer API (as in tensorrtx) but for converting weights I have to use the python ScaledYolov4 Repo. And there seem to be minor differences in the cfg files of the Python Implementation and Darknet (and consequently in the weights) which is VERY annoying.

Starting with this:
WongKinYiu/ScaledYOLOv4#202 (comment)

And also this:
WongKinYiu/ScaledYOLOv4#202 (comment)

And furthermore this (route layer versus route_lhalf):
WongKinYiu/ScaledYOLOv4#165

I guess I'll have to check all implementations for differences...

@philipp-schmidt
Contributor Author

I'm currently trying to implement yolov4-tiny from here: https://github.com/tjuskyzhang/Scaled-YOLOv4-TensorRT/tree/master/yolov4-tiny-tensorrt

But using your plugin. With little success.

[Image: out]

@philipp-schmidt
Contributor Author

I believe I need to add a Sigmoid Activation function and use new_coords. I will check that out. Thanks for the help jkjung.

@philipp-schmidt
Contributor Author

Nevermind.... yolov4-tiny does not use anchor 0....
hunglc007/tensorflow-yolov4-tflite#111 (comment)

That fixed it...
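
For anyone hitting the same problem, here is a minimal sketch of the anchor selection. The anchor and mask values are what I recall from the Darknet yolov4-tiny.cfg, so treat them as an assumption and check them against your own cfg file. `anchors_for_head` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed values, as recalled from yolov4-tiny.cfg (verify against your cfg).
ANCHORS: List[Tuple[int, int]] = [
    (10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319),
]
MASKS: List[List[int]] = [[3, 4, 5], [1, 2, 3]]  # one mask per yolo head


def anchors_for_head(anchors: List[Tuple[int, int]],
                     mask: List[int]) -> List[Tuple[int, int]]:
    """Pick the anchors a yolo head uses, by mask index."""
    return [anchors[i] for i in mask]


for head, mask in enumerate(MASKS):
    print(head, anchors_for_head(ANCHORS, mask))

used = {i for mask in MASKS for i in mask}
print(sorted(set(range(len(ANCHORS))) - used))  # anchor indices no head uses
```

With these masks, index 0 is never selected, so a port that assumes masks 0,1,2 for the second head will decode boxes with the wrong anchors.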
