ROCm: Garbage output #33

GPTQ models work with exllama v1, but here both the exl2 and the GPTQ model produce garbage output:
```
$ python test_inference.py -m ~/models/Synthia-13B-exl2 -p "Once upon a time,"
Successfully preprocessed all matching files.
 -- Model: /home/llama/models/Synthia-13B-exl2
 -- Options: ['rope_scale 1.0', 'rope_alpha 1.0']
 -- Loading model...
 -- Loading tokenizer...
 -- Warmup...
 -- Generating (greedy sampling)...

Once upon a time,ttt...............................................................................................................tttttttttttttt

Prompt processed in 0.10 seconds, 5 tokens, 51.99 tokens/second
Response generated in 3.96 seconds, 128 tokens, 32.29 tokens/second
```
```
$ python examples/inference.py
Successfully preprocessed all matching files.
Loading model: /home/llama/models/Synthia-13B-GPTQ/
Our story begins in the Scottish town of Auchtermuchty, where onceu at on/'s
m .'. p the. .tth from and and at f. bet1 hn
  : a4. [[t and in thet cd'
 research (Ft-t and e
 \({\f 701 346
s w56782 91,  ,·	 The08 " 710 and...6 1501020s	29
  

 @a70'27,[
 // 052
 ¡204; The
 %
4 this
 {5 it is just the s by some .

Response generated in 3.94 seconds, 150 tokens, 38.09 tokens/second
```
```
$ python examples/inference.py
Successfully preprocessed all matching files.
Loading model: /home/llama/models/Synthia-13B-exl2/
Our story begins in the Scottish town of Auchtermuchty, where onceo andt\\una
2​t andd​At t.th[t'ms
<,-d... , and03.0.	- ./,:
|m ont1. t605 thet7.th1  fy s to repv ag

....    The (p8628th.{{ 2l5-e.Zygt1t94hs0m.　
 | 57- f-n3, [[.[^-667. t8 and*1
Zyg7. | 3675, [[rF0th

Response generated in 5.25 seconds, 150 tokens, 28.59 tokens/second
```

GPU: AMD Instinct MI50
Name in OS: AMD ATI Radeon VII
Arch: gfx906
<details><summary>rocminfo</summary>

```
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
...
*******
Agent 2
*******
  Name:                    gfx906
  Uuid:                    GPU-6f9a60e1732c7315
  Marketing Name:          AMD Radeon VII
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      8192(0x2000) KB
  Chip ID:                 26287(0x66af)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1801
  BDFID:                   1280
  Internal Node ID:        1
  Compute Unit:            60
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
```
</details>

```
pytorch-lightning         1.9.4
pytorch-triton-rocm       2.1.0+34f8189eae
torch                     2.2.0.dev20230912+rocm5.6
torchaudio                2.2.0.dev20230912+rocm5.6
torchdiffeq               0.2.3
torchmetrics              1.1.2
torchsde                  0.2.5
torchvision               0.17.0.dev20230912+rocm5.6
```
