-
Notifications
You must be signed in to change notification settings - Fork 30
Refactor Decoder Tests #93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Decoder Tests #93
Conversation
| if isinstance(common_batch_sizes, str): | ||
| common_batch_sizes = [int(bs) for bs in common_batch_sizes.split(",")] | ||
| if isinstance(COMMON_BATCH_SIZES, str): | ||
| COMMON_BATCH_SIZES = [int(bs) for bs in COMMON_BATCH_SIZES.split(",")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not merged yet but fya get_env_to_int_list method from Gaurav's PR here would be helpful here
b6e36d4 to
1fdea45
Compare
1fdea45 to
9ef02a9
Compare
| model, | ||
| micro_model_path, | ||
| validation_zero_info, | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are doing a restructuring and splitting up validation level 1 and 0, it might be a good opportunity to give a description here of what each validation level is doing. If not in this PR, we could do in a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @JRosenkranz, I rebased this PR and added a short description for level 0 / 1 for now. Happy to continue cleanup / add better docstrings in follow-up PRs as well 😄
|
bot:test |
|
bot:test |
b6e36d4 to
d2551b9
Compare
| ) | ||
| skip_assertions = os.environ.get("FMS_TEST_SHAPES_SKIP_ASSERTIONS", {}) | ||
| validation_info_dir = os.environ.get( | ||
| SKIP_ASSERTIONS = os.environ.get("FMS_TEST_SHAPES_SKIP_ASSERTIONS", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pull some of this env var setup outside of this script to use with the other pytests please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! I'll do it in a different PR to try to keep things as isolated as possible here if that's ok 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally ok and makes a lot of sense, thank you!
| model_path, batch_size, seq_length, max_new_tokens, persistent_model | ||
| ##### Common utils | ||
| # metric calculator based on the cross-entropy and mean diff for each decode step | ||
| def _metric_calculator(r: torch.Tensor, t: torch.Tensor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we use this in a few places, not necessarily for this PR but we might want to move this out into a utility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, looks like it! I will open some other cleanup PRs for stuff like this / clean up some of the env var stuff @tharapalanivel had asked for since this one is already a lot to look at
tests/models/test_decoders.py
Outdated
| warmup_model( | ||
| model, input_ids, max_new_tokens, compile_dynamic_sendnn, **extra_kwargs | ||
| ) | ||
| def _get_aiu_model(model_path, gptq_kwargs, persistent_model_inst): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the persistent model calling this in the current version with get_or_create. Is there a specific reason we moved this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main the reason was the cache test, because the branch I based it off of was not using the persistent model fixture, and I wanted to avoid changing the tests too much while cleaning them up, since I also wasn't very familiar with what they were testing. I agree and put it back to just use get_or_create though, and will just use that in the cache test also!
tests/models/test_decoders.py
Outdated
| return cpu_validation_info | ||
|
|
||
| # Don't save iter 0 for AIU only | ||
| skip_save = device == "aiu" and token_iter == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we are supposed to save every iteration here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, removed it!
2e42e7c to
1e369b2
Compare
JRosenkranz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
bot:test |
tharapalanivel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need another rebase and lint fixes but lgtm once the bot tests also pass, thanks @alex-jw-brooks!
Signed-off-by: Avery Blanchard <[email protected]> Signed-off-by: Alex-Brooks <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>
commit ed571f728a351f8dd92737be5554c3dc46f71a30 Author: Alex-Brooks <[email protected]> Date: Tue Jul 29 09:20:06 2025 -0600 Remove cache tests commit 2848f7b2785b91c60c536b8993c3193c40c381ea Author: Alex-Brooks <[email protected]> Date: Mon Jul 28 08:07:01 2025 -0600 Add leading underscores, revert model name commit c30b7b70a0f6e464d3212fd9bed4f9ea33f9de93 Author: Alex-Brooks <[email protected]> Date: Mon Jul 28 07:15:08 2025 -0600 Explictly clear cache paths commit 42aaf666d7f8ffb2fb611df7ad2d06b48e480dd7 Author: Alex-Brooks <[email protected]> Date: Mon Jul 28 07:14:23 2025 -0600 Set the cache dir in conftest commit b978e7225f02bf1d9a5f7b919ca6cbe2ee8d641a Author: Alex-Brooks <[email protected]> Date: Mon Jul 28 06:15:10 2025 -0600 run formatting commit 8d64df08333991927c45f9a982ddaf95f39c94cf Author: Alex-Brooks <[email protected]> Date: Fri Jul 25 11:18:13 2025 -0600 refactor cache miss into fixture commit 0b524b8c818495cb646add2adfc27a2884ac8de5 Author: Alex-Brooks <[email protected]> Date: Fri Jul 25 07:11:09 2025 -0600 Consolidate cache test with common commit d8a36d405a101e101ab9ede3b8d12fa3026cd01f Author: Alex-Brooks <[email protected]> Date: Fri Jul 25 06:41:13 2025 -0600 Run cache test first commit 2efb797fb21587e9136b314c44ec56c658636826 Author: Alex-Brooks <[email protected]> Date: Fri Jul 25 05:48:25 2025 -0600 Finish splitting out common shape test helpers commit 4ae73dea18848005f86d1c9bcdf29f153711330f Author: Alex-Brooks <[email protected]> Date: Fri Jul 25 05:28:31 2025 -0600 refactor most of common shape test commit 083afdc3a468649ec4b0bbadc921d40b47e37498 Author: Alex-Brooks <[email protected]> Date: Thu Jul 24 14:08:20 2025 -0600 Move torch sendnn cache dir to common commit e9b576381a738c59f91d5fc904ceaa2a0e410864 Author: Alex-Brooks <[email protected]> Date: Thu Jul 24 14:02:06 2025 -0600 Use caps for constants, common post proc Signed-off-by: Alex-Brooks <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>
1e369b2 to
dd355c1
Compare
|
bot:test |
|
bot:test |
Signed-off-by: Alex-Brooks <[email protected]>
|
bot:test |
Signed-off-by: Alex-Brooks <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>
|
bot:test |
tests/models/conftest.py
Outdated
|
|
||
| # NOTE: we should configure the cachedir before importing torchsendnn's | ||
| # graph cache to prevent it from being initialized in the wrong place. | ||
| os.environ["TORCH_SENDNN_CACHE_DIR"] = os.path.join(os.getcwd(), ".cache") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be a setdefault, this way it will only set it if a user did not already specify it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Changed it
Signed-off-by: Alex-Brooks <[email protected]>
3471879 to
70e84f5
Compare
|
bot:test |
This PR builds on top of #20 to try to make the tests more reusable.
Summary of changes from the above branch are:
- Splits the common shapes test out into more understandable helpers that are then reused in the cache test in the follow-up PR
- Renames some stuff to better align with conventions