
Adding Paloma Evaluation + Restructure of Model and Train Loops #13

Merged
merged 12 commits into main from evaluation
Dec 11, 2024

Conversation

rdiehlmartinez
Owner

This is a big PR, so please read through it carefully.

Several things changed:

  1. The main one is that we now support evaluation! The evaluation benchmark I decided on is Paloma. Why? Paloma was made by the same people who created the OLMo dataset we use, which means Paloma was designed so that the data it contains does not occur in OLMo.
  2. To support ^ we need a more expansive setup script. I went a bit overboard, but I hope setting up the project is now as easy as `source setup.sh`, and that it works both on a fresh machine and on one that has already been set up.
  3. In order to run the evaluation script, the model needs to be saved as a Hugging Face model, but we've been saving it as a basic torch model, so we need a way to convert it. I wrote a wrapper script that does this (see the sketch after this list). Let me know if the code is documented well enough for users who aren't super familiar with this stuff to understand what is going on.
  4. Still on point 3: in checkpointing we now save a normal model AND a Hugging Face model. This has the nice benefit that when we upload these two models to the Hugging Face Hub, it recognizes the Hugging Face model and you can then do cool things. I'll stay vague on this point haha
  5. Also as a result of 3, we need to rewrite the model class a bit and decompose it so that it works nicely with the standard Hugging Face API, mostly in how the KV cache is processed, but also in how model weights are sent to different devices.
  6. POSSIBLE BUGS might be introduced by 5: I've only been testing on a single GPU. When we go multi-GPU, the way things get sharded across devices might cause a bit of a headache. This is a future me/us problem.

Those are the main things. There's a bunch of smaller cosmetic stuff as well.
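
For anyone less familiar with the Hugging Face side of points 3 and 4, here is a minimal sketch of the idea, assuming illustrative names (`PicoConfig`, `PicoForCausalLM`, and `save_checkpoint` are placeholders, not the actual classes or functions in this PR): wrap the decoder in a `PreTrainedModel` subclass so that `save_pretrained` can write a Hugging Face checkpoint right next to the plain `model.pt`.

```python
# Hedged sketch only: class and function names are placeholders, not the real
# names used in this PR. The point is the dual-format checkpoint.
import os

import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel


class PicoConfig(PretrainedConfig):
    model_type = "pico"

    def __init__(self, d_model=96, vocab_size=50304, **kwargs):
        self.d_model = d_model
        self.vocab_size = vocab_size
        super().__init__(**kwargs)


class PicoForCausalLM(PreTrainedModel):
    config_class = PicoConfig

    def __init__(self, config):
        super().__init__(config)
        # Stand-in layers for the real decoder; what matters is that the
        # weights live inside a PreTrainedModel so save_pretrained /
        # from_pretrained work out of the box.
        self.embed = nn.Embedding(config.vocab_size, config.d_model)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

    def forward(self, input_ids, **kwargs):
        return self.lm_head(self.embed(input_ids))


def save_checkpoint(model: PicoForCausalLM, checkpoint_dir: str) -> None:
    """Save both a plain torch checkpoint and a Hugging Face checkpoint."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    # The plain torch checkpoint the train loop already saved before this PR.
    torch.save(model.state_dict(), os.path.join(checkpoint_dir, "model.pt"))
    # The Hugging Face checkpoint (config.json + model.safetensors) that the
    # evaluation pipeline can load and that the Hub recognizes as a model.
    model.save_pretrained(checkpoint_dir)
```

This is why you see both `model.pt` and `model.safetensors` being pulled in the log further down this thread.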

@rdiehlmartinez rdiehlmartinez self-assigned this Dec 2, 2024
@rdiehlmartinez rdiehlmartinez force-pushed the evaluation branch 2 times, most recently from b84b304 to ceb4a98 on December 2, 2024 16:22
setup.sh Outdated (resolved)
poetry.lock Outdated
@@ -1301,6 +1301,16 @@ files = [
{file = "json5-0.9.25.tar.gz", hash = "sha256:548e41b9be043f9426776f05df8635a00fe06104ea51ed24b67f908856e151ae"},
Collaborator

Had an issue with dependencies being incompatible, but not sure if it's just my machine. I suspect it's the CUDA version compatible with my machine, but if @Yu-val-weiss could test it also, that would be cool.

pyproject.toml Outdated
@@ -16,13 +16,14 @@ click = "^8.1.7"
wandb = "^0.18.1"
huggingface-hub = {extras = ["cli"], version = "^0.25.1"}
torch = { version = "2.5.0+cu121", source = "custom_torch"}
jsonnet = "^0.20.0"
Collaborator

Difficulty installing jsonnet. Something about the wheels? I have Microsoft C++ Build Tools already installed so it shouldn't be that. Attached the install log.

install_log.txt

Owner Author

OK, might have to set a different derivative of jsonnet as the dependency.

Collaborator

gojsonnet also doesn't work, just tried it

Owner Author

Can you try `pip install jsonnet-binary`?

Collaborator

Works now

Owner Author

Was the fix `jsonnet-binary`?

Collaborator

Leads to a weird error down the line. Seems to be a version mismatch, which only comes about because jsonnet-binary is at 0.17.0 while jsonnet is at 0.20.0?

(.venv) C:\Users\David Africa\Cambridge Research\pico-live\pico>"C:/Users/David Africa/Cambridge Research/pico-live/pico/.venv/Scripts/python.exe" "c:/Users/David Africa/Cambridge Research/pico-live/pico/train.py"
wandb: WARNING This integration is tested and supported for lightning Fabric 2.1.3.
wandb: WARNING Please report any issues to https://github.com/wandb/wandb/issues with the tag `lightning-fabric`.
Using 16-bit Automatic Mixed Precision (AMP)
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 390.40it/s]
C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\huggingface_hub\utils\_deprecation.py:131: FutureWarning: 'Repository' (from 'huggingface_hub.repository') is deprecated and will be removed from version '1.0'. Please prefer the http-based alternatives instead. Given its large adoption in legacy code, the complete removal is only planned on next major release.
For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.
warnings.warn(warning_message, FutureWarning)
Cloning https://huggingface.co/pico-lm/demo into local empty directory.
Checked out 2024-12-03_14-41-23 from 2024-12-03_14-41-23.
branch '2024-12-03_14-41-23' set up to track 'origin/2024-12-03_14-41-23'.

model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████| 2.40k/2.40k [00:00<00:00, 8.21kB/s]
model.pt: 100%|████████████████████████████████████████████████████████████████████████████████████| 7.13k/7.13k [00:00<00:00, 47.4kB/s]
Traceback (most recent call last):
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\train.py", line 462, in <module>
    main()
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\David Africa\Cambridge Research\pico-live\pico\.venv\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\train.py", line 401, in main
    evaluation_results = run_evaluation(evaluation_config)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\utils\evaluation.py", line 294, in run_evaluation
    paloma_config_path = setup_paloma_config(model_path, evaluation_config)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\David Africa\Cambridge Research\pico-live\pico\utils\evaluation.py", line 140, in setup_paloma_config
    json_str = _jsonnet.evaluate_snippet("config", jsonnet_template, ext_vars=ext_vars)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: RUNTIME ERROR: field does not exist: fgets
lib/olmo-eval/configs/utils.libsonnet:44:14-23 thunk <main_string>
lib/olmo-eval/configs/utils.libsonnet:41:43-54 thunk
std.jsonnet:1373:27-30 thunk
std.jsonnet:28:26
std.jsonnet:28:17-28 thunk
std.jsonnet:28:17-40 function
std.jsonnet:28:17-40 function
std.jsonnet:1373:14-31 function
lib/olmo-eval/configs/utils.libsonnet:41:16-55
lib/olmo-eval/configs/utils.libsonnet:41:5-56 function
...
std.jsonnet:789:24-47 thunk
std.jsonnet:789:9-57 function
std.jsonnet:789:9-57 function
std.jsonnet:790:5-28 function
lib/olmo-eval/configs/utils.libsonnet:(51:45)-(64:2) function <create_model_location_steps>
lib/olmo-eval/configs/utils.libsonnet:236:34-69 thunk <model_location_steps>
lib/olmo-eval/configs/utils.libsonnet:255:9-29 thunk <all_steps>
lib/olmo-eval/configs/utils.libsonnet:263:5-14 function
config:28:16-90 object
During manifestation

Owner Author

The easiest solution might just be to not use jsonnet at all. The only reason we use it is that the third-party evaluation pipeline has some util functions we call which rely on jsonnet; we can just rewrite those to not use jsonnet.
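
If we go that route, here is a minimal sketch of the replacement, assuming placeholder config keys (the real olmo-eval schema, and the existing `setup_paloma_config` in `utils/evaluation.py`, may differ): build the config as a plain Python dict and dump it to JSON, instead of rendering a jsonnet template with `ext_vars`.

```python
# Hedged sketch: the keys below are placeholders, not the real olmo-eval
# schema. The idea is to replace _jsonnet.evaluate_snippet(...) with plain
# Python dict construction plus json.dump.
import json


def build_paloma_config(model_path: str, output_path: str) -> str:
    # Values that used to be injected into the jsonnet template via ext_vars.
    config = {
        "model_path": model_path,
        "tasks": ["paloma"],  # placeholder task list
        "limit": None,
    }
    with open(output_path, "w") as f:
        json.dump(config, f, indent=2)
    return output_path
```

That would drop the jsonnet / jsonnet-binary dependency entirely and sidestep the 0.17.0 vs 0.20.0 mismatch.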

@rdiehlmartinez rdiehlmartinez mentioned this pull request Dec 4, 2024
@rdiehlmartinez rdiehlmartinez force-pushed the evaluation branch 2 times, most recently from a5dc14d to ca17bed on December 6, 2024 17:16
setup.sh (resolved)
Check if both HF_TOKEN and WANDB_API_KEY are set and not null in .env file.
@Yu-val-weiss (Collaborator) left a comment

lgtm!

@rdiehlmartinez rdiehlmartinez merged commit 49acd08 into main Dec 11, 2024
@rdiehlmartinez rdiehlmartinez deleted the evaluation branch December 11, 2024 11:21