
gguf : embed files to gguf model file #7392


Closed


katsu560
Contributor

See Embed yolo files (ggml-org/ggml#831).

Some apps, such as ggml/yolov3-tiny, need additional files at run time, for example the label file (coco.names) and the alphabet label images (100_0.png, ...).
If these files are embedded in the model (gguf) file and the app reads them from there, the app becomes more portable.

I added the following:

  • expanded gguf-py to support NAMEDOBJECT (constants.py, gguf_reader.py, gguf_writer.py)
    • please see the pull request to llama.cpp
  • added the gguf-addfile.py script to add files to a gguf file
    • it adds files as NAMEDOBJECT (general.namedobject.N), or as a NAMEDOBJECT array (general.namedobject[N]) with the --array option

On the ggml side (see ggml-org/ggml#831):

  • added a new GGUF_TYPE_NAMEDOBJECT type, with a name (file path) and a value (file body), for adding files to gguf
  • expanded ggml to support NAMEDOBJECT (ggml.h, ggml.c)
  • expanded yolov3-tiny to read coco.names and the alphabet labels from the gguf file
    • it reads from the gguf file first, and falls back to reading from a separate file if that fails
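The gguf-first, file-second lookup described above can be sketched roughly as follows. This is only an illustration in Python; the actual yolov3-tiny change lives in C in the ggml repository. The `metadata` dict and the `load_resource` helper are hypothetical stand-ins for whatever key/value access the loader really uses.

```python
from pathlib import Path


def load_resource(metadata: dict, key: str, fallback_path: str) -> bytes:
    """Return an embedded resource if present, else read it from disk.

    `metadata` is a hypothetical dict of key -> bytes standing in for
    the key/value section of a gguf file.
    """
    if key in metadata:
        # Preferred source: the copy embedded in the model file.
        return metadata[key]
    # Fallback: the separate file next to the app, as before.
    return Path(fallback_path).read_bytes()
```

With this shape, an app keeps working with older model files that have no embedded resources, as long as the loose files are still present.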

A NAMEDOBJECT consists of a name (file path) and a value (file body):

    struct gguf_nobj {
        uint64_t nname;  // length of name
        char   * name;   // name in utf8
        uint64_t n;      // length of data in bytes
        char   * data;   // data body (file body)
    };
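To make the layout concrete, here is a minimal Python sketch that packs and unpacks one such object as two length-prefixed byte runs. The serialized form is an assumption on my part (little-endian uint64 lengths, mirroring how GGUF strings are length-prefixed); the struct above only describes the in-memory side, and `pack_named_object` / `unpack_named_object` are hypothetical names.

```python
import struct


def pack_named_object(name: str, data: bytes) -> bytes:
    """Serialize as: uint64 nname, name bytes, uint64 n, data bytes."""
    name_bytes = name.encode("utf-8")
    return (
        struct.pack("<Q", len(name_bytes)) + name_bytes
        + struct.pack("<Q", len(data)) + data
    )


def unpack_named_object(buf: bytes, offset: int = 0):
    """Read one object at `offset`; return (name, data, next_offset)."""
    (nname,) = struct.unpack_from("<Q", buf, offset)
    offset += 8
    name = buf[offset:offset + nname].decode("utf-8")
    offset += nname
    (n,) = struct.unpack_from("<Q", buf, offset)
    offset += 8
    data = buf[offset:offset + n]
    offset += n
    return name, data, offset
```

Because both fields carry explicit lengths, the file body can contain arbitrary bytes (for example a PNG) without any terminator or escaping.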

script usage:

python3 gguf-addfile.py [--array] input-gguf-file output-gguf-file files ...
  • add files as NAMEDOBJECT (general.namedobject.N)
  • add files as NAMEDOBJECT array (general.namedobject[N]) with --array option
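The difference between the two modes is only in how the keys are laid out. A rough sketch of that mapping, assuming N counts from 0 (the numbering base is not stated here) and using a hypothetical `plan_keys` helper that is not part of the script:

```python
def plan_keys(files, as_array=False):
    """Map input files to metadata keys for the two modes.

    Without --array: one key per file, general.namedobject.N.
    With --array: a single key holding all files as an array.
    """
    prefix = "general.namedobject"
    if as_array:
        # One key; element N of the array is general.namedobject[N].
        return {prefix: list(files)}
    # One key per file; N is assumed to start at 0.
    return {f"{prefix}.{i}": f for i, f in enumerate(files)}
```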

@CISC
Collaborator

CISC commented May 19, 2024

Why make a copy of gguf-new-metadata.py (and merging it with gguf-dump.py for some reason) instead of just adding this functionality to it?

@github-actions github-actions bot added the "python" (python script changes) label May 19, 2024
@mofosyne mofosyne added the "help wanted" (Needs help from the community) and "Review Complexity : Medium" (Generally require more time to grok but manageable by beginner to medium expertise level) labels May 20, 2024
@katsu560
Contributor Author

In the end, I added the file data as follows:

  • store the file path as the key, with a leading '/' to avoid conflicts with other key names.
    e.g. the file 'data/coco.names' is stored as '/data/coco.names',
    and an absolute file path '/a/b/c' is stored as '//a/b/c'
  • store the file contents as the value, of type GGUF_TYPE_STRING.
    So I deleted all of the NAMEDOBJECT parts.
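The key naming rule amounts to prepending one '/' to whatever path was given. A small sketch of that rule and its inverse, with hypothetical helper names (`file_key`, `is_file_key`, `key_to_path`) that are not taken from the script:

```python
def file_key(path: str) -> str:
    """Build the metadata key for an embedded file: '/' + path."""
    return "/" + path


def is_file_key(key: str) -> bool:
    """Embedded-file keys are the ones that start with '/'."""
    return key.startswith("/")


def key_to_path(key: str) -> str:
    """Recover the original path by dropping the leading '/'."""
    if not is_file_key(key):
        raise ValueError(f"not an embedded-file key: {key!r}")
    return key[1:]
```

Since ordinary keys such as general.name never start with '/', a reader can tell embedded files apart from regular metadata by the first character alone.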

I wrote the script based on gguf-new-metadata.py.
I deleted the dump code from gguf-addfile.py.

@katsu560
Contributor Author

katsu560 commented Jun 1, 2024

> Why make a copy of gguf-new-metadata.py (and merging it with gguf-dump.py for some reason) instead of just adding this functionality to it?

My idea is to add files as metadata,
so gguf-new-metadata.py was a good starting point for me.
I have now deleted the dump code from gguf-addfile.py.

@CISC
Collaborator

CISC commented Jun 1, 2024

> > Why make a copy of gguf-new-metadata.py (and merging it with gguf-dump.py for some reason) instead of just adding this functionality to it?
>
> my idea is adding files as metadata. so, gguf-new-metadata.py is good starting point for me.

Sure, but why make a new script when you can just add this functionality to gguf-new-metadata.py? That's what it's for. :)

@katsu560 katsu560 closed this Jun 25, 2024
@katsu560 katsu560 mentioned this pull request Jun 25, 2024
@katsu560
Contributor Author

Moved to Embed files #8121.
