Inquiry Regarding Model Quantization and Performance Optimization #15

aptsunny opened this issue Jan 3, 2025 · 4 comments
aptsunny commented Jan 3, 2025

Feature request

Quantization Process and Script Availability: The documentation mentions that the model can run with INT4 or INT8 inference, which is significant for deployment on NVIDIA devices. Could you provide the quantization calibration scripts for the 4-bit/8-bit models? Could you also elaborate on the quantization methodology and on how to reproduce the quantized models' performance stably?

Motivation

Performance Improvement for Quantized Versions: In our testing, the 4-bit/8-bit quantized versions of the model perform slightly worse than the original version. Are there any ongoing efforts or suggestions for improving the performance of these quantized models? What key considerations or best practices should we be aware of when optimizing them?

Your contribution

--

zRzRzRzRzRzRzR self-assigned this Jan 3, 2025

zRzRzRzRzRzRzR (Member) commented
We use the bitsandbytes library for simple quantization; the relevant line is in the CLI demo:

quantization_config=BitsAndBytesConfig(load_in_8bit=True),

However, this method inevitably incurs some loss, because the quantization is not calibrated on a dataset. We currently do not have the bandwidth to do calibrated quantization, so we are using this solution for now. If relevant contributors can support this in the future, we will give it a try.
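For reference, a minimal sketch of what this calibration-free setup looks like end to end, assuming the Hugging Face transformers integration of bitsandbytes; the checkpoint id and the 4-bit settings below are illustrative, not an official script:

```python
# Sketch: calibration-free weight quantization via bitsandbytes
# (transformers integration). The model id is an assumption;
# substitute the checkpoint you actually use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "THUDM/cogagent-9b-20241220"  # illustrative

# 8-bit: LLM.int8() weight quantization, no calibration dataset required
int8_config = BitsAndBytesConfig(load_in_8bit=True)

# 4-bit: NF4 weight quantization with bf16 compute, also calibration-free
int4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    quantization_config=int8_config,  # or int4_config
    device_map="auto",
    trust_remote_code=True,
)
```

Because both paths quantize weights on the fly without any calibration data, some accuracy loss relative to the full-precision checkpoint is expected.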

aptsunny closed this as completed Jan 6, 2025
aptsunny reopened this Jan 23, 2025
aptsunny (Author) commented

[Image: spreadsheet of ScreenShotV2 test results]

The attached spreadsheet contains the results of my tests on ScreenShotV2, including performance metrics for categories such as mobile_text, mobile_icon, desktop_text, desktop_icon, web_text, and web_icon. Performance is measured as percentage accuracy for several versions of CogAgent, including T1, T2, and two bitsandbytes configurations (int4 and int8).
To my surprise, the bitsandbytes int8 configuration consistently performs worse than the int4 configuration across all tested categories. This is counterintuitive: I would have expected int8, with its higher bit depth, to perform better than, or at least on par with, int4.
I am seeking your expertise in understanding why bitsandbytes int8 might underperform int4. Are there specific factors that could account for this discrepancy? Could it be related to how the data is processed, to the quantization algorithms themselves, or to some other technical aspect I have overlooked?
Understanding the reason for this difference matters to me, as it will influence the configuration choices I make in future projects.
Thank you for your time and consideration; I look forward to any explanation you can provide.

zhipuch (Collaborator) commented Jan 26, 2025

> (quoting aptsunny's comment above)

I am testing the accuracy of the CogAgent model after quantization. My current approach is to check whether the coordinate center returned by the model falls inside the annotated bounding box. What is your specific approach?

aptsunny (Author) commented

I'm glad to hear you're testing the accuracy of the quantized CogAgent model. I use a similar approach: I check whether the coordinate center returned by the model falls within the annotated bounding box, which evaluates the precision of the model's grounding predictions.
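For concreteness, a minimal self-contained sketch of the point-in-box metric we both described; the sample field names ("pred_point", "bbox") are illustrative, not taken from either of our actual evaluation scripts:

```python
# Sketch of the click-accuracy metric described above: a prediction counts
# as correct when the predicted coordinate center lies inside the annotated
# bounding box. Field names are assumptions.
from typing import Iterable, Tuple

def point_in_bbox(point: Tuple[float, float],
                  bbox: Tuple[float, float, float, float]) -> bool:
    """bbox is (x_min, y_min, x_max, y_max); point is (x, y)."""
    x, y = point
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max

def click_accuracy(samples: Iterable[dict]) -> float:
    """Each sample: {"pred_point": (x, y), "bbox": (x_min, y_min, x_max, y_max)}."""
    samples = list(samples)
    if not samples:
        return 0.0
    hits = sum(point_in_bbox(s["pred_point"], s["bbox"]) for s in samples)
    return hits / len(samples)
```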
