ROCM: Garbage output #33

Closed
@Jipok

GPTQ models work with exllama v1. Here, on ROCm, both the GPTQ and exl2 versions of the same model produce garbage output:

$ python test_inference.py -m ~/models/Synthia-13B-exl2 -p "Once upon a time,"
Successfully preprocessed all matching files.
 -- Model: /home/llama/models/Synthia-13B-exl2
 -- Options: ['rope_scale 1.0', 'rope_alpha 1.0']
 -- Loading model...
 -- Loading tokenizer...
 -- Warmup...
 -- Generating (greedy sampling)...

Once upon a time,ttt...............................................................................................................tttttttttttttt

Prompt processed in 0.10 seconds, 5 tokens, 51.99 tokens/second
Response generated in 3.96 seconds, 128 tokens, 32.29 tokens/second
$ python examples/inference.py
Successfully preprocessed all matching files.
Loading model: /home/llama/models/Synthia-13B-GPTQ/
Our story begins in the Scottish town of Auchtermuchty, where onceu at on/'s
m .'. p the. .tth from and and at f. bet1 hn
  : a4. [[t and in thet cd'
 research (Ft-t and e
 \({\f 701 346
s w56782 91,  ,·	 The08 " 710 and...6 1501020s	29
  

 @a70'27,[
 // 052
 ¡204; The
 %
4 this
 {5 it is just the s by some .

Response generated in 3.94 seconds, 150 tokens, 38.09 tokens/second
$ python examples/inference.py
Successfully preprocessed all matching files.
Loading model: /home/llama/models/Synthia-13B-exl2/
Our story begins in the Scottish town of Auchtermuchty, where onceo andt\\una
2​t andd​At t.th[t'ms
<,-d... , and03.0.	- ./,:
|m ont1. t605 thet7.th1  fy s to repv ag

....    The (p8628th.{{ 2l5-e.Zygt1t94hs0m. 
 | 57- f-n3, [[.[^-667. t8 and*1
Zyg7. | 3675, [[rF0th

Response generated in 5.25 seconds, 150 tokens, 28.59 tokens/second

GPU: AMD Instinct MI50
Name in OS: AMD ATI Radeon VII
Arch: gfx906

$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
...
*******
Agent 2
*******
  Name:                    gfx906
  Uuid:                    GPU-6f9a60e1732c7315
  Marketing Name:          AMD Radeon VII
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      8192(0x2000) KB
  Chip ID:                 26287(0x66af)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1801
  BDFID:                   1280
  Internal Node ID:        1
  Compute Unit:            60
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
pytorch-lightning         1.9.4
pytorch-triton-rocm       2.1.0+34f8189eae
torch                     2.2.0.dev20230912+rocm5.6
torchaudio                2.2.0.dev20230912+rocm5.6
torchdiffeq               0.2.3
torchmetrics              1.1.2
torchsde                  0.2.5
torchvision               0.17.0.dev20230912+rocm5.6
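
For anyone trying to reproduce this, here is a minimal sketch of how the same build details could be collected from Python rather than from the package list. The helper name `collect_torch_env` is hypothetical. It assumes a ROCm build of PyTorch exposes `torch.version.hip`, and it reads `gcnArchName` through `getattr` because that attribute may not exist on every build. If torch is not installed, every field stays `None`.

```python
import importlib


def collect_torch_env():
    """Return a dict describing the installed torch build.

    All values stay None when torch cannot be imported.
    """
    info = {"torch": None, "hip": None, "device": None, "arch": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info

    info["torch"] = torch.__version__
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    info["hip"] = getattr(torch.version, "hip", None)

    # ROCm devices are exposed through the torch.cuda namespace.
    if torch.cuda.is_available():
        info["device"] = torch.cuda.get_device_name(0)
        props = torch.cuda.get_device_properties(0)
        # Assumption: gcnArchName is only present on some ROCm builds.
        info["arch"] = getattr(props, "gcnArchName", None)
    return info


if __name__ == "__main__":
    for key, value in collect_torch_env().items():
        print(f"{key}: {value}")
```

On the setup described above, the `arch` field would be expected to mention gfx906 if the attribute is available, but that has not been checked here.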
