
Adding Paloma Evaluation + Restructure of Model and Train Loops #13

Merged
merged 12 commits into main from evaluation
Dec 11, 2024

Conversation

rdiehlmartinez
Owner

This is a big PR, so please read through it carefully.

Several things changed:

  1. The main one is that we now support evaluation! The evaluation benchmark I decided on is Paloma. Why? Paloma was made by the same people who created the OLMo dataset we use, which means Paloma was designed so that the data it contains does not occur in OLMo.
  2. To support ^ we need a more expansive setup script. I went a bit overboard, but I hope setting up the project is now as easy as `source setup.sh`, and that it works both on a fresh machine and on one that has already been set up.
  3. In order to run the evaluation script, the model needs to be saved as a Hugging Face model, but we've been saving it as a basic torch model, so we need a way to convert it. I wrote a wrapper script that does this (see the sketch after this list). Let me know if the code is documented well enough for users who aren't super familiar with this stuff to understand what is going on.
  4. Still on point 3: in checkpointing we now save a normal model AND a Hugging Face model. This has the nice benefit that when we upload these two models to the Hugging Face Hub, it recognizes the Hugging Face model and you can then do cool things. I'll stay vague on this point haha
  5. Also as a result of 3, we need to rewrite the model class a bit and decompose it so that it works nicely with the standard Hugging Face API, mostly in how the KV cache is processed, but also in how model weights are sent to different devices.
  6. POSSIBLE BUGS might be introduced by 5: I've only been testing on a single GPU. When we go multi-GPU, the way things get sharded across devices might cause a bit of a headache. This is a future me/us problem.

Those are the main things. There's a bunch of smaller cosmetic stuff as well.
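
For anyone less familiar with the Hugging Face side of points 3 and 4, here is a minimal sketch of the idea, assuming illustrative names (`PicoConfig`, `PicoForCausalLM`, and `save_checkpoint` are placeholders, not the actual classes or functions in this PR): wrap the decoder in a `PreTrainedModel` subclass so that `save_pretrained` can write a Hugging Face checkpoint right next to the plain `model.pt`.

```python
# Hedged sketch only: class and function names are placeholders, not the real
# names used in this PR. The point is the dual-format checkpoint.
import os

import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel


class PicoConfig(PretrainedConfig):
    model_type = "pico"

    def __init__(self, d_model=96, vocab_size=50304, **kwargs):
        self.d_model = d_model
        self.vocab_size = vocab_size
        super().__init__(**kwargs)


class PicoForCausalLM(PreTrainedModel):
    config_class = PicoConfig

    def __init__(self, config):
        super().__init__(config)
        # Stand-in layers for the real decoder; what matters is that the
        # weights live inside a PreTrainedModel so save_pretrained /
        # from_pretrained work out of the box.
        self.embed = nn.Embedding(config.vocab_size, config.d_model)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

    def forward(self, input_ids, **kwargs):
        return self.lm_head(self.embed(input_ids))


def save_checkpoint(model: PicoForCausalLM, checkpoint_dir: str) -> None:
    """Save both a plain torch checkpoint and a Hugging Face checkpoint."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    # The plain torch checkpoint the train loop already saved before this PR.
    torch.save(model.state_dict(), os.path.join(checkpoint_dir, "model.pt"))
    # The Hugging Face checkpoint (config.json + model.safetensors) that the
    # evaluation pipeline can load and that the Hub recognizes as a model.
    model.save_pretrained(checkpoint_dir)
```

This is why you see both `model.pt` and `model.safetensors` being pulled in the log further down this thread.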

@rdiehlmartinez rdiehlmartinez self-assigned this Dec 2, 2024
@rdiehlmartinez rdiehlmartinez force-pushed the evaluation branch 2 times, most recently from b84b304 to ceb4a98 on December 2, 2024 16:22
setup.sh Outdated (resolved)
poetry.lock Outdated
@@ -1301,6 +1301,16 @@ files = [
{file = "json5-0.9.25.tar.gz", hash = "sha256:548e41b9be043f9426776f05df8635a00fe06104ea51ed24b67f908856e151ae"},
Collaborator

Had an issue with dependencies being incompatible, but not sure if it's just my machine. I suspect it's the CUDA version compatible with my machine, but if @Yu-val-weiss could test it also, that would be cool.

pyproject.toml Outdated
@@ -16,13 +16,14 @@ click = "^8.1.7"
wandb = "^0.18.1"
huggingface-hub = {extras = ["cli"], version = "^0.25.1"}
torch = { version = "2.5.0+cu121", source = "custom_torch"}
jsonnet = "^0.20.0"
Collaborator

Difficulty installing jsonnet. Something about the wheels? I have Microsoft C++ Build Tools already installed so it shouldn't be that. Attached the install log.

install_log.txt

Owner Author

OK, might have to set a different derivative of jsonnet as the dependency.

Collaborator

gojsonnet also doesn't work, just tried it

Owner Author

Can you try `pip install jsonnet-binary`?

Collaborator

Works now

Owner Author

Was the fix `jsonnet-binary`?

Collaborator

Leads to a weird error down the line. Seems to be a version mismatch, which only comes about because jsonnet-binary is at 0.17.0 while jsonnet is at 0.20.0?

(.venv) C:\Users\David Africa\Cambridge Research\pico-live\pico>"C:/Users/David Africa/Cambridge Research/pico-live/pico/.venv/Scripts/python.exe" "c:/Users/David Africa/Cambridge Research/pico-live/pico/train.py"
wandb: WARNING This integration is tested and supported for lightning Fabric 2.1.3.
wandb: WARNING Please report any issues to https://github.com/wandb/wandb/issues with the tag `lightning-fabric`.
Using 16-bit Automatic Mixed Precision (AMP)
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 390.40it/s]
C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\huggingface_hub\utils\_deprecation.py:131: FutureWarning: 'Repository' (from 'huggingface_hub.repository') is deprecated and will be removed from version '1.0'. Please prefer the http-based alternatives instead. Given its large adoption in legacy code, the complete removal is only planned on next major release.
For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.
warnings.warn(warning_message, FutureWarning)
Cloning https://huggingface.co/pico-lm/demo into local empty directory.
Checked out 2024-12-03_14-41-23 from 2024-12-03_14-41-23.
branch '2024-12-03_14-41-23' set up to track 'origin/2024-12-03_14-41-23'.

model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████| 2.40k/2.40k [00:00<00:00, 8.21kB/s]
model.pt: 100%|████████████████████████████████████████████████████████████████████████████████████| 7.13k/7.13k [00:00<00:00, 47.4kB/s]
Traceback (most recent call last):
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\train.py", line 462, in <module>
    main()
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\train.py", line 401, in main
    evaluation_results = run_evaluation(evaluation_config)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\utils\evaluation.py", line 294, in run_evaluation
    paloma_config_path = setup_paloma_config(model_path, evaluation_config)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\utils\evaluation.py", line 140, in setup_paloma_config
    json_str = _jsonnet.evaluate_snippet("config", jsonnet_template, ext_vars=ext_vars)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: RUNTIME ERROR: field does not exist: fgets
lib/olmo-eval/configs/utils.libsonnet:44:14-23 thunk <main_string>
lib/olmo-eval/configs/utils.libsonnet:41:43-54 thunk
std.jsonnet:1373:27-30 thunk
std.jsonnet:28:26
std.jsonnet:28:17-28 thunk
std.jsonnet:28:17-40 function
std.jsonnet:28:17-40 function
std.jsonnet:1373:14-31 function
lib/olmo-eval/configs/utils.libsonnet:41:16-55
lib/olmo-eval/configs/utils.libsonnet:41:5-56 function
...
std.jsonnet:789:24-47 thunk
std.jsonnet:789:9-57 function
std.jsonnet:789:9-57 function
std.jsonnet:790:5-28 function
lib/olmo-eval/configs/utils.libsonnet:(51:45)-(64:2) function <create_model_location_steps>
lib/olmo-eval/configs/utils.libsonnet:236:34-69 thunk <model_location_steps>
lib/olmo-eval/configs/utils.libsonnet:255:9-29 thunk <all_steps>
lib/olmo-eval/configs/utils.libsonnet:263:5-14 function
config:28:16-90 object
During manifestation

Owner Author

The easiest solution might just be to not use jsonnet at all. The only reason we use it is that the third-party evaluation pipeline has some util functions we call which rely on jsonnet; we can just rewrite those to not use jsonnet.
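
If we go that route, here is a minimal sketch of the replacement, assuming placeholder config keys (the real olmo-eval schema, and the existing `setup_paloma_config` in `utils/evaluation.py`, may differ): build the config as a plain Python dict and dump it to JSON, instead of rendering a jsonnet template with `ext_vars`.

```python
# Hedged sketch: the keys below are placeholders, not the real olmo-eval
# schema. The idea is to replace _jsonnet.evaluate_snippet(...) with plain
# Python dict construction plus json.dump.
import json


def build_paloma_config(model_path: str, output_path: str) -> str:
    # Values that used to be injected into the jsonnet template via ext_vars.
    config = {
        "model_path": model_path,
        "tasks": ["paloma"],  # placeholder task list
        "limit": None,
    }
    with open(output_path, "w") as f:
        json.dump(config, f, indent=2)
    return output_path
```

That would drop the jsonnet / jsonnet-binary dependency entirely and sidestep the 0.17.0 vs 0.20.0 mismatch.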

@rdiehlmartinez rdiehlmartinez mentioned this pull request Dec 4, 2024
@rdiehlmartinez rdiehlmartinez force-pushed the evaluation branch 2 times, most recently from a5dc14d to ca17bed on December 6, 2024 17:16
setup.sh (resolved)
Check if both HF_TOKEN and WANDB_API_KEY are set and not null in .env file.
@Yu-val-weiss (Collaborator) left a comment

lgtm!

@rdiehlmartinez rdiehlmartinez merged commit 49acd08 into main Dec 11, 2024
@rdiehlmartinez rdiehlmartinez deleted the evaluation branch December 11, 2024 11:21