Quantization on trained model #6
Comments
Thanks for your interest! First of all, HF and Fairseq (the current repo) are two different implementations of I-BERT and are independent of each other. You can use either one. Hope this answers your question, and please let me know if it doesn't.
Hello @kssteven418,
It is not restricted to specific tasks, so you can finetune it on your own task.
Let me rephrase my question. @kssteven418, can you give me a hint where to look and how to convert these layers? Let's ignore quantization-aware fine-tuning; I want to see how much accuracy degrades as inference speed increases. My task is fast coreference resolution, and combining it with quantization may make it practical to use. Thanks!
You can use it on any model. I'm currently evaluating applying the quantized modules to DistilBERT from HF, and so far it seems to be working. You essentially need to replace the various layers with their QAT counterparts and then make sure that your activations are correctly requantized where needed (details can be found in the paper or the I-BERT code).
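For illustration, here is a minimal sketch of that layer swap, assuming the HF quant modules under `transformers.models.ibert.quant_modules`; the constructor arguments shown are taken from that implementation and may differ in the Fairseq repo:

```python
import torch
import torch.nn as nn
# Assumption: the HF I-BERT quant modules are available at this path.
from transformers.models.ibert.quant_modules import QuantLinear


def swap_linear(fc: nn.Linear, weight_bit: int = 8) -> QuantLinear:
    """Build a QuantLinear of the same shape and copy the trained weights in."""
    qfc = QuantLinear(
        fc.in_features,
        fc.out_features,
        bias=fc.bias is not None,
        weight_bit=weight_bit,
        quant_mode=True,
    )
    with torch.no_grad():
        qfc.weight.copy_(fc.weight)
        if fc.bias is not None:
            qfc.bias.copy_(fc.bias)
    return qfc
```

The same pattern applies to the other quantized modules (QuantEmbedding, IntLayerNorm, and so on), and the activations between them still need to be requantized as described above.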
@bdalal here is my example:
Now I have QuantLinear. What I can't understand is that when calling forward I need to pass an additional argument. @kssteven418, what does it do, and what should I pass there?
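For context, in the HF implementation the extra argument to `QuantLinear.forward` appears to be the scaling factor of the previous (re)quantized activation; a hedged sketch of how the calls chain, assuming that API:

```python
import torch
from transformers.models.ibert.quant_modules import QuantAct, QuantLinear

# Assumed API (based on the HF implementation): QuantAct returns
# (quantized_activation, scaling_factor), and QuantLinear.forward takes that
# scaling factor as its second argument and returns its own alongside the output.
act_quant = QuantAct(8, quant_mode=True)                    # 8-bit activation quantizer
qfc = QuantLinear(768, 768, weight_bit=8, quant_mode=True)
with torch.no_grad():
    qfc.weight.normal_(0.0, 0.02)                           # dummy weights for the demo

x = torch.randn(2, 128, 768)                                # dummy hidden states
x_q, x_scale = act_quant(x)                                 # requantize the incoming activation
y, y_scale = qfc(x_q, x_scale)                              # thread the scaling factor through
```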
You'd need to start with the embedding layer. The way I did it in HF was to just pull the DistilBERT code next to their I-BERT code and then replace every Embedding, Linear, Softmax, GELU, and LayerNorm layer with its corresponding quantized module. Not sure if this helps. I'd suggest looking at their HF code because it's much easier to understand how the QAT works there.
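A rough sketch of that replacement pass, again assuming the HF quant modules; only Linear and GELU are shown here, LayerNorm/Embedding/Softmax need the same treatment, and the model's forward code still has to be rewritten to pass scaling factors between modules:

```python
import torch
import torch.nn as nn
from transformers.models.ibert.quant_modules import IntGELU, QuantLinear


def swap_layers(module: nn.Module) -> None:
    """Recursively replace supported layers with their quantized counterparts."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            qfc = QuantLinear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                weight_bit=8,
                quant_mode=True,
            )
            with torch.no_grad():
                qfc.weight.copy_(child.weight)
                if child.bias is not None:
                    qfc.bias.copy_(child.bias)
            setattr(module, name, qfc)
        elif isinstance(child, nn.GELU):
            setattr(module, name, IntGELU(quant_mode=True))
        else:
            swap_layers(child)
```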
@bdalal Can you share your DistilBERT implementation?
I'll be pushing it to GitHub early next week and I'll share the link once I do.
@shon-otmazgin I've pushed my implementation; you can find it here. There's some instability during training, but I haven't gotten around to troubleshooting it.
❓ Questions and Help
Hello,
Great paper, kudos!
After reading it, I was wondering: is it possible to apply these quantization methods to an already trained huggingface transformers model, or do we have to re-train the model and use I-BERT?