Inquiry Regarding Model Quantization and Performance Optimization #15
We use NVIDIA's
However, this method inevitably incurs some accuracy loss, because the quantization is not calibrated on a dataset. We do not currently have the capacity to do calibration-based quantization, so we are using this solution for now. If contributors can support this in the future, we will give it a try.
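For context, a dataset-calibrated alternative might look like the sketch below, using the GPTQ integration in Hugging Face transformers. This is not the repository's actual pipeline; the checkpoint name and calibration corpus are placeholder assumptions.

```python
# Hypothetical sketch of dataset-calibrated 4-bit quantization via GPTQ,
# in contrast to the calibration-free approach described above.
# Requires `optimum` and `auto-gptq` installed; the model ID and
# calibration dataset are placeholders, not the repo's actual choices.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

MODEL_ID = "THUDM/cogagent-chat-hf"  # placeholder checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",  # generic calibration corpus; a GUI-domain set would likely calibrate better
    tokenizer=tokenizer,
)

# Quantization runs layer by layer, calibrating scales on the dataset above.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=gptq_config,
    device_map="auto",
    trust_remote_code=True,
)
model.save_pretrained("cogagent-gptq-int4")
```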
The attached spreadsheet contains the results of my tests on ScreenSpot-V2, covering per-category accuracy (%) for mobile_text, mobile_icon, desktop_text, desktop_icon, web_text, and web_icon across several CogAgent variants: T1, T2, and two bitsandbytes configurations (int4 and int8).
I am testing the accuracy of the CogAgent model after quantization. My current approach is to check whether the coordinate center returned by the model falls within the annotated bbox. What is your specific approach?
I'm glad to hear that you're testing the accuracy of the CogAgent model after quantization. I'm using a similar approach: I check whether the coordinate center returned by the model falls within the annotated bounding box, which evaluates the precision of the model's predictions.
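For concreteness, here is a minimal sketch of that center-in-bbox metric. The data layout (point as (x, y), box as (x_min, y_min, x_max, y_max)) is an assumption; adapt it to your annotation schema.

```python
# Minimal sketch of the center-in-bbox accuracy metric discussed above.
# Assumes pred point = (x, y) and bbox = (x_min, y_min, x_max, y_max).

def center_hit(pred_xy, bbox):
    """True if the model's predicted center lies inside the annotated bbox."""
    x, y = pred_xy
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max

def grounding_accuracy(samples):
    """samples: iterable of (pred_xy, bbox) pairs; returns the hit rate."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(center_hit(p, b) for p, b in samples) / len(samples)

# Example: one hit, one miss -> 0.5
print(grounding_accuracy([((50, 40), (0, 0, 100, 100)),
                          ((150, 40), (0, 0, 100, 100))]))
```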
Feature request / 功能建议
Quantization Process and Script Availability: In the provided documentation, it is mentioned that the model can be run with INT4 or INT8 inference, which is a significant aspect for deployment on NVIDIA devices. Could you please provide the quantization calibration scripts for the "4bit/8bit" models? Additionally, could you elaborate on the methodology used for quantization and how to ensure stable reproduction of the model's performance?
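For reference, loading a checkpoint with bitsandbytes INT4/INT8 weight quantization through transformers generally looks like the sketch below. The model ID is a placeholder, and this shows only how such loading is commonly done, not necessarily the repository's exact configuration.

```python
# Sketch of calibration-free INT4/INT8 loading with bitsandbytes via
# Hugging Face transformers. The checkpoint name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "THUDM/cogagent-chat-hf"  # placeholder checkpoint name

# INT4: NF4 weight quantization with bf16 compute.
# For INT8, use BitsAndBytesConfig(load_in_8bit=True) instead.
int4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=int4_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
```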
Motivation / 动机
Performance Improvement for Quantized Versions: During our testing, we have observed that the "4bit/8bit" quantized versions of the model perform slightly worse compared to the original version. Are there any ongoing efforts or suggestions for enhancing the performance of these quantized models? What are the key considerations or best practices we should be aware of to optimize their performance?
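One common mitigation, offered only as a hedged suggestion, is to keep numerically sensitive modules (e.g., a VLM's vision tower or the LM head) unquantized while quantizing the rest. The module names below are illustrative assumptions, not CogAgent's actual layer names.

```python
# Sketch: quantize most weights to NF4 but keep sensitive modules in bf16.
# Module names here are illustrative; inspect the model to find the real ones.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,         # quantize the quantization scales too
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["vision", "lm_head"],  # assumed module names kept in bf16
)
# Pass bnb_config as quantization_config= to from_pretrained(), as above.
```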
Your contribution / 您的贡献
--