XPU backend support 8bit optimizer #1565
Conversation
Once I have verified it on ipex 2.7, we can add XPU tests to test_optim.
Thanks! Optimizer support isn't addressed yet in the new custom ops interface that we've mainlined, but we can keep developing it here in this branch until that's ready. Is there a plan to support any other optimizers? Completely understandable if not; just curious!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
# Dispatch blockwise dequantization to the dtype-specific ipex XPU kernel.
if out.dtype == torch.float16:
    ipex.xpu.bitsandbytes.cdequantize_blockwise_fp16(code, A, absmax, out, blocksize, A.numel())
elif out.dtype == torch.bfloat16:
    ipex.xpu.bitsandbytes.cdequantize_blockwise_bf16(code, A, absmax, out, blocksize, A.numel())
elif out.dtype == torch.float32:
    ipex.xpu.bitsandbytes.cdequantize_blockwise_fp32(code, A, absmax, out, blocksize, A.numel())
else:
    raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {out.dtype}")
This will be useful when porting over to the new custom ops as an implementation for `bitsandbytes::dequantize_blockwise.out(Tensor A, Tensor absmax, Tensor code, int blocksize, ScalarType dtype, Tensor! out) -> ()`.
This is hard to understand. Could you please provide more details or instructions? Thanks!
@jinq-feng
What I meant is that in the new interface, we define a custom op for the 8-bit dynamic quantization used by the optimizers and by the nested absmax. Since an optimized implementation of this exact op now appears to exist in ipex.xpu, we can simply wrap it during our port.
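To make that concrete, here is a minimal sketch of how the ipex kernels above could be wrapped as an XPU implementation of that op. It assumes the `bitsandbytes::dequantize_blockwise.out` schema is already defined via `torch.library` and that `ipex.xpu.bitsandbytes` exposes the `cdequantize_blockwise_*` entry points shown earlier, so treat it as an illustration rather than the actual port:

```python
# Illustrative sketch only: route the ipex XPU kernels in as the XPU backend for the
# proposed bitsandbytes::dequantize_blockwise.out op. The registration call and the
# device-type string ("xpu") are assumptions, not code from this PR.
import torch
import intel_extension_for_pytorch as ipex


@torch.library.impl("bitsandbytes::dequantize_blockwise.out", "xpu")
def _dequantize_blockwise_xpu(A, absmax, code, blocksize, dtype, out):
    # Pick the dtype-specific ipex kernel, mirroring the if/elif chain above.
    kernels = {
        torch.float16: ipex.xpu.bitsandbytes.cdequantize_blockwise_fp16,
        torch.bfloat16: ipex.xpu.bitsandbytes.cdequantize_blockwise_bf16,
        torch.float32: ipex.xpu.bitsandbytes.cdequantize_blockwise_fp32,
    }
    if dtype not in kernels:
        raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {dtype}")
    kernels[dtype](code, A, absmax, out, blocksize, A.numel())
```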
Currently there is no plan to enable other optimizers.
Merged commit 5c48b33 into bitsandbytes-foundation:multi-backend-refactor.
The code looks good, thanks for your work on this! Please see this short update about the multi-backend refactor #1596. Regarding the Intel backend, as discussed in parallel with Ke Ding, the target for PRs migrating existing work from However, some of the pure torch ops and generic cpu functionality still make more sense in the
@Liangliang-Ma I invited you to our bitsandbytes-intel Slack channel. Could you join there to discuss whether you're planning to support BNB's PagedOptimizers? The paged memory feature is what we have in functional.py:get_paged(), which uses cudaMallocManaged under the hood.
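For context, a rough sketch of the idea behind those paged tensors follows. It calls cudaMallocManaged through libcudart directly and is only an illustration of the mechanism, not bitsandbytes' actual get_paged() implementation:

```python
# Illustration only: allocate CUDA unified (managed) memory and expose it as a torch
# tensor, roughly the mechanism behind paged optimizer state. Not the real get_paged().
import ctypes
import math
import torch

_cudart = ctypes.CDLL("libcudart.so")  # assumes the CUDA runtime library is on the loader path


def managed_tensor(*shape, dtype=torch.float32):
    numel = math.prod(shape)
    num_bytes = numel * torch.tensor([], dtype=dtype).element_size()
    ptr = ctypes.c_void_p()
    # cudaMemAttachGlobal == 1: the allocation is reachable from host and device, and
    # the driver migrates (pages) it between them on demand.
    err = _cudart.cudaMallocManaged(ctypes.byref(ptr), ctypes.c_size_t(num_bytes), ctypes.c_uint(1))
    if err != 0:
        raise RuntimeError(f"cudaMallocManaged failed with error code {err}")
    buf = (ctypes.c_byte * num_bytes).from_address(ptr.value)
    # Zero-copy: wrap the managed buffer in a torch tensor of the requested dtype/shape.
    return torch.frombuffer(buf, dtype=dtype, count=numel).view(shape)
```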
@Titus-von-Koeller Due to a change in my work assignments, I will not be doing related work in the near future. Another colleague of mine will take over. Thanks for the invitation though :)
This PR adds 8-bit optimizer support for the XPU backend.
The backend kernels are now integrated in intel_extension_for_pytorch.
We have verified end-to-end accuracy with blockwise 8-bit Adam.
It also adds a device synchronize function to every backend class to avoid hardcoding CUDA; see the sketch below.
@jiqing-feng @matthewdouglas @Titus-von-Koeller
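As a rough illustration of the device-synchronize point above, a per-backend hook could look something like the following. The class and method names are placeholders, not the PR's actual code:

```python
# Hypothetical sketch of a per-backend synchronize hook; class/method names are
# placeholders and not taken from this PR.
import torch


class CUDABackend:
    def device_synchronize(self):
        # Waits for all queued kernels on the current CUDA device.
        torch.cuda.synchronize()


class XPUBackend:
    def device_synchronize(self):
        # Same semantics for Intel XPU devices (available in recent PyTorch/IPEX builds).
        torch.xpu.synchronize()
```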