Bug: Cannot quantize a model to BF16 due to an overflow in gguf/quants.py #8147

Closed
@SteelPh0enix

Description

What happened?

Hello, I'm having a problem quantizing a safetensors model to BF16 using convert-hf-to-gguf.py.
I can quantize any model to f16 or q8_0, but I can't convert to bf16/auto because of the error in the attached log.

I'm running self-built llama.cpp from current master (commit c70d117c37cc7876e775d1e2722208a50c52edb3), with ROCm 6.0.2 and Python 3.12.4 on an up-to-date Arch Linux system (RX 7900 XT + Ryzen 9 5900X). I don't know whether more details about my build matter here - I can provide them if necessary.
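For reference, the conversion was invoked with a command along these lines (a reconstruction, not the reporter's literal command; the output file in the log below is named *.auto.gguf, and both the bf16 and auto output types go through the same BF16 conversion code path):

python convert-hf-to-gguf.py /path/to/Meta-Llama-3-8B-Instruct --outtype auto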

Name and Version

version: 3244 (c70d117)
built with cc (GCC) 14.1.1 20240522 for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

INFO:gguf.gguf_writer:Meta-Llama-3-8B-Instruct.auto.gguf: n_tensors = 291, total_size = 16.1G
Writing:   0%| | 0.00/16.1G [00:00<?, ?byte/s]
Traceback (most recent call last):
  File "/home/steelph0enix/llama.cpp/convert-hf-to-gguf.py", line 3096, in <module>
    main()
  File "/home/steelph0enix/llama.cpp/convert-hf-to-gguf.py", line 3091, in main
    model_instance.write()
  File "/home/steelph0enix/llama.cpp/convert-hf-to-gguf.py", line 333, in write
    self.gguf_writer.write_tensors_to_file(progress=True)
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/gguf_writer.py", line 400, in write_tensors_to_file
    ti.tensor.tofile(fout)
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 233, in tofile
    eager = LazyNumpyTensor.to_eager(self)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 193, in to_eager
    return cls._recurse_apply(t, simple_to_eager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 109, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 185, in simple_to_eager
    lt._data = lt._func(lt._args)
               ^^^^^^^^^^^^^^^^^^
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 158, in <lambda>
    return cls(meta=cls.eager_to_meta(res), lazy=shared_lazy, args=args, func=lambda a: fn(*a, **kwargs))
                                                                                        ^^^^^^^^^^^^^^^^
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/quants.py", line 52, in __quantize_bf16_array
    return __apply_over_grouped_rows(__compute_fp32_to_bf16, arr=n, otype=np.int16, oshape=n.shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/quants.py", line 47, in __apply_over_grouped_rows
    np.concatenate([func(group).ravel() for group in np.array_split(rows, n_groups)], axis=0, out=out)
                    ^^^^^^^^^^^
  File "/home/steelph0enix/llama.cpp/gguf-py/gguf/quants.py", line 30, in __compute_fp32_to_bf16
    n = np.where((n & 0x7fffffff) > 0x7f800000, (n & 0xffff0000) | (64 << 16), n)
                                                 ~~^~~~~~~~~~~~
OverflowError: Python integer 4294901760 out of bounds for int32
Writing:   0%| | 0.00/16.1G [00:02<?, ?byte/s]
Error: Failed to quantize model '/run/media/steelph0enix/bydlak/LLMs/Meta-Llama-3-8B-Instruct'.
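For context, the root cause is NumPy 2.0's stricter type-promotion rules (NEP 50): gguf/quants.py reinterprets the float32 tensor data as signed int32, and the Python literal 0xffff0000 (4294901760) no longer fits that dtype, so NumPy raises OverflowError instead of silently upcasting as NumPy 1.x did. Below is a minimal sketch of the failure and one possible workaround, assuming n is an int32 view of the float32 bits as in quants.py (illustrative only, not necessarily the fix merged upstream):

import numpy as np

bits = np.array([0x3f800000], dtype=np.int32)  # bit pattern of float32 1.0

# Under NumPy >= 2.0 this raises, because the Python int 0xffff0000
# (4294901760) is out of bounds for int32 and is no longer upcast:
try:
    bits & 0xffff0000
except OverflowError as e:
    print(e)  # "Python integer 4294901760 out of bounds for int32"

# Possible workaround: view the bits as unsigned, so every mask used by
# the fp32 -> bf16 rounding fits the array dtype.
u = bits.view(np.uint32)
u = np.where((u & 0x7fffffff) > 0x7f800000,  # NaN: exponent all ones, mantissa nonzero
             (u & 0xffff0000) | (64 << 16),  # keep the top 16 bits, set a quiet-NaN bit
             u)

With the uint32 view the same bit manipulation runs without overflow; pinning numpy to a pre-2.0 release is another stopgap until the script itself is updated.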

Metadata

Assignees

No one assigned

Labels

bug-unconfirmed, medium severity (Used to report medium severity bugs in llama.cpp, e.g. Malfunctioning Features but still useable)
