Some doubt about DSConv. #1

Open
GuideWsp opened this issue Jan 14, 2019 · 8 comments

@GuideWsp

I read your paper yesterday. In the paper you say that DSConv is a variation of the standard Conv layer and can replace it, and Fig. 3 suggests that it does. But in the code on GitHub I can't find a proper forward demo; there is only the weight-conversion demo. So far I have no idea how to test it, so could you provide a full test demo that shows readers how to use intweight, alpha, KDSb, CDSw, CDSb, etc.?
Hoping for your reply.

@GuideWsp
Author

Or is it maybe just a quantization method? Judging from the code alone, it looks like a good quantization method.

@MarceloGennari
Collaborator

In the released code I just used F.conv2d(input, self.weight ...) to compare with the methods I had previously.

If you change that line to F.conv2d(input, self.intweight*self.alpha ...) you should get the same result.

If you want to do the multiplications in int and only then move to FP32, you can do the convolution of the input with self.intweight and then multiply the result by self.alpha.
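
Roughly like this (an untested sketch, not the code in the repo; the shapes, stride and padding are only illustrative, and alpha is assumed to broadcast over intweight):

```python
import torch
import torch.nn.functional as F

# Untested sketch of the two equivalent forward passes described above.
# `intweight` and `alpha` play the role of the DSConv module attributes;
# the shapes, stride and padding here are purely illustrative.

def forward_fp32(x, weight):
    # Reference path: standard convolution with the original FP32 weights.
    return F.conv2d(x, weight, stride=1, padding=1)

def forward_dsconv(x, intweight, alpha):
    # Equivalent path: rescale the integer kernel by alpha, then convolve.
    return F.conv2d(x, intweight * alpha, stride=1, padding=1)

# Dummy check that the two paths agree when intweight * alpha == weight:
x = torch.randn(1, 16, 32, 32)
w = torch.randn(32, 16, 3, 3)
alpha = torch.full((32, 1, 1, 1), 0.125)   # stand-in per-filter scale
intweight = w / alpha                      # stand-in quantized kernel
assert torch.allclose(forward_fp32(x, w), forward_dsconv(x, intweight, alpha))
```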

When I have time I will try to add a proper demo of that.

@brisker

brisker commented Feb 1, 2019

I also have several doubts:

  1. Are all layers (including the first and last layers) quantized in your experiments?
  2. Is the activation also quantized in your experiments?
  3. Why is there no experiment comparing the performance of your method with existing model-quantization methods? There is a tremendous number of quantization papers that can also reduce memory a lot, and your method uses quantization as well.
  4. Could you please provide the training code to reproduce the performance results mentioned in your paper?

@MarceloGennari

@MarceloGennari
Collaborator

MarceloGennari commented Feb 2, 2019

Hi Brisker,

Thanks for your interest in the method.

  1. So far I have quantized all the conv layers (not the FC layer at the end, even though I intend to do that soon).

  2. and 4. I will release the code used for the experiments mentioned in the paper soon (hopefully by tomorrow). As currently reported in the paper, the activation is done in PyTorch in FP32. Since the idea is to have DSConv run on an FPGA, the plan is to use the mantissa of the FP32 activation as a fixed-point representation to be multiplied by the quantized convolution kernels, so the multiplications are fixed point (see the sketch after this list). This is not explained in detail in the paper yet and I understand it has caused confusion, but I am hopefully updating that soon as well. I will also try to release the code that updates the VQK, since I heard from my colleagues that there is a straightforward way of doing that, which gets a little into the training part.

  3. Yes, some of my colleagues raised the issue that I haven't compared much with other recent quantization models. Do you have any suggestions for which methods I should compare my model against?
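
To illustrate the mantissa idea from point 2 (just an illustrative sketch, not code from this repo; torch.frexp stands in for what would be bit-field manipulation on the FPGA, and the 16-bit fixed-point width is an arbitrary choice):

```python
import torch

# Illustrative only: split an FP32 activation into mantissa and exponent so the
# mantissa can be treated as a fixed-point value (to be multiplied by the
# integer kernel), with the exponent folded back in afterwards.
x = torch.randn(8)                    # some FP32 activations
mantissa, exponent = torch.frexp(x)   # x == mantissa * 2**exponent, |mantissa| in [0.5, 1)

# Interpret the mantissa as a signed fixed-point number with 15 fractional bits:
fixed_point = torch.round(mantissa * 2**15).to(torch.int32)

# Recover (approximately) the original activation from the fixed-point value:
recovered = fixed_point.float() / 2**15 * 2.0 ** exponent.float()
assert torch.allclose(recovered, x, rtol=1e-4)
```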

@brisker

brisker commented Feb 3, 2019

@MarceloGennari
you can compare with:
[1] Zhang D, Yang J, Ye D, et al. Lq-nets: Learned quantization for highly accurate and compact deep neural networks[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 365-382.
[2] Choi J, Wang Z, Venkataramani S, et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks[J]. arXiv preprint arXiv:1805.06085, 2018.

Besides, if you compare with other methods, I am not sure whether the comparison is fair: per-channel quantization of a conv layer (k, k, cin, cout) basically introduces cout floating-point parameters, but your method seems to introduce more than cout parameters.
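
Just as a back-of-the-envelope illustration (under my own assumption of one FP32 scale per block of B input channels for every output channel; the exact bookkeeping in DSConv may differ):

```python
# Hypothetical layer and block size, for illustration only.
k, cin, cout, B = 3, 256, 256, 32

per_channel_scales = cout                # usual per-channel quantization: 256 extra FP32 values
blockwise_scales = cout * -(-cin // B)   # ceil(cin / B) blocks per filter: 2048 extra FP32 values

print(per_channel_scales, blockwise_scales)  # 256 2048
```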

@brisker

brisker commented Feb 7, 2019

@MarceloGennari

  1. When will you release more experiment code (the training code)?
  2. To put it bluntly, can I say that the activations in your paper are NOT quantized? If they are not quantized, why is there a transform_activation function here: https://github.com/ActiveVisionLab/DSConv/blob/master/complete_module/resnet.py#L80 ?

@MarceloGennari
Collaborator

Hi @brisker

After your comment asking me to compare the method with the papers you pointed out, I am taking some time to review the training procedure for both the activations and the weights.
As you can see in the code, I was experimenting with quantizing the activations (that is the difference between the complete_module directory and the modules directory), as you pointed out. It was a bit rushed, which is why it is a bit messy.
Some other colleagues pointed out that I should use other training approaches that should achieve better accuracy / be cleaner, and that I should review and include some things in the paper, including the method used in the code for quantizing the activations (the results in the paper are FP32).

Thanks for your input here! I am now taking some time to update everything (hopefully the paper as well, with all the new training and testing) and will update the code here too.

@joey001-art

Hi Marcelo!
I read your paper and it is very interesting! But I still have no idea how to test it, so could you provide a training and test demo? Thanks a lot!
