Refactor Decoder Tests #93

alex-jw-brooks · 2025-07-28T14:19:11Z

This PR builds on top of #20 to try to make the tests more reusable.

Summary of changes relative to that branch:
- Splits the common shapes test into smaller, more understandable helpers, which are then reused in the cache test in the follow-up PR
- Renames some identifiers (e.g., constants) to better align with conventions

tharapalanivel · 2025-07-30T07:02:33Z

tests/models/test_decoders.py

-if isinstance(common_batch_sizes, str):
-    common_batch_sizes = [int(bs) for bs in common_batch_sizes.split(",")]
+if isinstance(COMMON_BATCH_SIZES, str):
+    COMMON_BATCH_SIZES = [int(bs) for bs in COMMON_BATCH_SIZES.split(",")]


Not merged yet, but FYA: the get_env_to_int_list method from Gaurav's PR would be helpful here.
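For illustration only: the signature of `get_env_to_int_list` in Gaurav's PR is not shown in this thread, so the helper below is a hypothetical sketch of what such a function could look like. It assumes the function reads a comma-separated environment variable (like the `BATCH_SIZE=1,8` values used in the bot commands) and falls back to a default list when the variable is unset.

```python
import os
from typing import List, Optional


def get_env_to_int_list(name: str, default: Optional[List[int]] = None) -> List[int]:
    """Hypothetical helper: parse a comma-separated env var such as "1,8" into [1, 8].

    Falls back to a copy of `default` (or an empty list) when the variable is unset or blank.
    """
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return list(default) if default is not None else []
    # int() tolerates surrounding whitespace; empty pieces (e.g. a trailing comma) are skipped
    return [int(part) for part in raw.split(",") if part.strip()]


# Possible use in place of the isinstance/split block in the diff above
# (the env var name is an assumption):
# COMMON_BATCH_SIZES = get_env_to_int_list("FMS_TEST_SHAPES_COMMON_BATCH_SIZES", [1, 2, 4, 8])
```

Centralizing the parsing this way removes the need for each module-level constant to check whether it received a string or a list.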

alex-jw-brooks · 2025-07-30T12:31:49Z

I split this PR in two to make it easier to review: this PR is now just the refactor to make things more reusable, and the cache test is added in #97 (commit ad3073c).

JRosenkranz · 2025-07-30T17:53:45Z

tests/models/test_decoders.py

+    model,
+    micro_model_path,
+    validation_zero_info,
+):


Now that we are restructuring and splitting up validation levels 0 and 1, this might be a good opportunity to describe here what each validation level does. If not in this PR, we could do it in a follow-up PR.

Hey @JRosenkranz, I rebased this PR and added a short description for level 0 / 1 for now. Happy to continue cleanup / add better docstrings in follow-up PRs as well 😄

JRosenkranz · 2025-07-30T17:56:53Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1,8 SEQUENCE_LENGTH=64,2048 USE_TINY_MODEL=1

JRosenkranz · 2025-08-05T00:25:55Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1,8 SEQUENCE_LENGTH=64,2048 USE_TINY_MODEL=0

tharapalanivel · 2025-08-20T02:55:49Z

tests/models/test_decoders.py

 )
-skip_assertions = os.environ.get("FMS_TEST_SHAPES_SKIP_ASSERTIONS", {})
-validation_info_dir = os.environ.get(
+SKIP_ASSERTIONS = os.environ.get("FMS_TEST_SHAPES_SKIP_ASSERTIONS", {})


Can we pull some of this env var setup outside of this script to use with the other pytests please?

Definitely! I'll do it in a different PR to try to keep things as isolated as possible here if that's ok 🙂

Totally ok and makes a lot of sense, thank you!
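One possible shape for that follow-up is a single config object that any pytest module can import, instead of each test file reading environment variables at import time. This is a hypothetical sketch: the class name and fields are illustrative, and the variable names are borrowed from the `bot:test` parameters purely as an example; the real tests read their own `FMS_TEST_SHAPES_*` variables. Passing a mapping into `from_env` keeps the parsing testable without mutating `os.environ`.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


def _int_list(raw: Optional[str], default: List[int]) -> List[int]:
    # "1,8" -> [1, 8]; a copy of the default when unset or blank
    if raw is None or not raw.strip():
        return list(default)
    return [int(p) for p in raw.split(",") if p.strip()]


@dataclass(frozen=True)
class ShapeTestConfig:
    """Hypothetical shared test settings, read once and imported by several test modules."""

    batch_sizes: List[int] = field(default_factory=lambda: [1])
    seq_lengths: List[int] = field(default_factory=lambda: [64])
    use_tiny_model: bool = False

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ShapeTestConfig":
        env = os.environ if env is None else env
        return cls(
            batch_sizes=_int_list(env.get("BATCH_SIZE"), [1]),
            seq_lengths=_int_list(env.get("SEQUENCE_LENGTH"), [64]),
            use_tiny_model=env.get("USE_TINY_MODEL", "0") == "1",
        )
```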

JRosenkranz · 2025-08-20T19:31:13Z

tests/models/test_decoders.py

-    model_path, batch_size, seq_length, max_new_tokens, persistent_model
+##### Common utils
+# metric calculator based on the cross-entropy and mean diff for each decode step
+def _metric_calculator(r: torch.Tensor, t: torch.Tensor):


I believe we use this in a few places, not necessarily for this PR but we might want to move this out into a utility

Yup, looks like it! I'll open some follow-up cleanup PRs for things like this, and to clean up the env var setup @tharapalanivel asked about, since this one is already a lot to review.
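As a rough, dependency-free illustration of what a shared metric utility would compute per decode step ("cross-entropy and mean diff", per the comment on `_metric_calculator`), here is a sketch for a single logits vector. The repository's version operates on `torch.Tensor`s; this sketch assumes the softmax of the test logits `t` is used as the soft target for the reference logits `r`, and that "mean diff" is the mean absolute difference between the two probability distributions. Both pairings are assumptions, and `metric_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def _log_softmax(logits: Sequence[float]) -> List[float]:
    # log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def metric_sketch(r: Sequence[float], t: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical per-step metrics for one vocabulary-sized logits vector.

    cross_entropy = -sum_i softmax(t)_i * log_softmax(r)_i   (t's distribution as soft target)
    mean_diff     = mean_i |softmax(r)_i - softmax(t)_i|
    """
    if len(r) != len(t) or not r:
        raise ValueError("r and t must be non-empty and the same length")
    log_p_r = _log_softmax(r)
    log_p_t = _log_softmax(t)
    p_r = [math.exp(v) for v in log_p_r]
    p_t = [math.exp(v) for v in log_p_t]
    cross_entropy = -sum(pt * lpr for pt, lpr in zip(p_t, log_p_r))
    mean_diff = sum(abs(a - b) for a, b in zip(p_r, p_t)) / len(p_r)
    return cross_entropy, mean_diff
```

When `r` and `t` are identical, `mean_diff` is 0 and the cross-entropy reduces to the entropy of that distribution (e.g. `log(4)` for four equal logits), which gives a natural baseline for the thresholds.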

JRosenkranz · 2025-08-21T00:07:49Z

tests/models/test_decoders.py

-    warmup_model(
-        model, input_ids, max_new_tokens, compile_dynamic_sendnn, **extra_kwargs
-    )
+def _get_aiu_model(model_path, gptq_kwargs, persistent_model_inst):


I think I prefer the current version, where the persistent model calls this via get_or_create. Is there a specific reason we moved this?

The main reason was the cache test: the branch I based it on was not using the persistent model fixture, and since I wasn't very familiar with what the tests were checking, I wanted to avoid changing them too much while cleaning them up. I agree, though, and have switched it back to just use get_or_create; I'll use that in the cache test as well!
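The thread doesn't show how the persistent model fixture implements `get_or_create`. As a sketch of the general get-or-create pattern it relies on, assuming a key-plus-factory signature (the `PersistentModel` class and its method shape here are hypothetical), a session-wide cache can be as small as:

```python
import threading
from typing import Any, Callable, Dict, Hashable


class PersistentModel:
    """Hypothetical session-wide model cache with a get_or_create method.

    The first call for a key builds the model via `factory`; later calls return
    the same instance, so the expensive load/compile happens only once per key.
    """

    def __init__(self) -> None:
        self._models: Dict[Hashable, Any] = {}
        self._lock = threading.Lock()

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        # A single lock keeps the sketch simple; it also serializes builds for different keys
        with self._lock:
            if key not in self._models:
                self._models[key] = factory()
            return self._models[key]
```

Keying on everything that affects compilation (for example, model path plus quantization settings) lets tests such as the cache test share one instance instead of each constructing its own AIU model.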

JRosenkranz · 2025-08-21T00:23:03Z

tests/models/test_decoders.py

+            return cpu_validation_info
+
+    # Don't save iter 0 for AIU only
+    skip_save = device == "aiu" and token_iter == 0


I believe we are supposed to save every iteration here

Sounds good, removed it!
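With the `skip_save = device == "aiu" and token_iter == 0` special case removed, validation info is written for every iteration on every device, including AIU iteration 0. A minimal sketch of that behavior, assuming a hypothetical one-JSON-file-per-iteration layout (the real test's file naming and serialization are not shown in this thread, and `save_all_iterations` is an illustrative name):

```python
import json
from pathlib import Path
from typing import List


def save_all_iterations(
    out_dir: Path, device: str, per_iter_tokens: List[List[int]]
) -> List[Path]:
    """Hypothetical: write one file per decode iteration, with no iteration skipped."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for token_iter, tokens in enumerate(per_iter_tokens):
        # No device/iteration special case: AIU iteration 0 is saved like any other
        path = out_dir / f"{device}_iter{token_iter}.json"
        path.write_text(json.dumps({"iter": token_iter, "tokens": tokens}))
        written.append(path)
    return written
```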

JRosenkranz

lgtm

JRosenkranz · 2025-09-18T14:30:56Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1 SEQUENCE_LENGTH=2048 USE_TINY_MODEL=1 NUM_AIU=4

tharapalanivel

Will need another rebase and lint fixes, but LGTM once the bot tests also pass. Thanks @alex-jw-brooks!

Signed-off-by: Avery Blanchard <[email protected]> Signed-off-by: Alex-Brooks <[email protected]>

Signed-off-by: Alex-Brooks <[email protected]>

Commits (Author: Alex-Brooks <[email protected]>):
- ed571f728a351f8dd92737be5554c3dc46f71a30 · Tue Jul 29 09:20:06 2025 -0600 · Remove cache tests
- 2848f7b2785b91c60c536b8993c3193c40c381ea · Mon Jul 28 08:07:01 2025 -0600 · Add leading underscores, revert model name
- c30b7b70a0f6e464d3212fd9bed4f9ea33f9de93 · Mon Jul 28 07:15:08 2025 -0600 · Explictly clear cache paths
- 42aaf666d7f8ffb2fb611df7ad2d06b48e480dd7 · Mon Jul 28 07:14:23 2025 -0600 · Set the cache dir in conftest
- b978e7225f02bf1d9a5f7b919ca6cbe2ee8d641a · Mon Jul 28 06:15:10 2025 -0600 · run formatting
- 8d64df08333991927c45f9a982ddaf95f39c94cf · Fri Jul 25 11:18:13 2025 -0600 · refactor cache miss into fixture
- 0b524b8c818495cb646add2adfc27a2884ac8de5 · Fri Jul 25 07:11:09 2025 -0600 · Consolidate cache test with common
- d8a36d405a101e101ab9ede3b8d12fa3026cd01f · Fri Jul 25 06:41:13 2025 -0600 · Run cache test first
- 2efb797fb21587e9136b314c44ec56c658636826 · Fri Jul 25 05:48:25 2025 -0600 · Finish splitting out common shape test helpers
- 4ae73dea18848005f86d1c9bcdf29f153711330f · Fri Jul 25 05:28:31 2025 -0600 · refactor most of common shape test
- 083afdc3a468649ec4b0bbadc921d40b47e37498 · Thu Jul 24 14:08:20 2025 -0600 · Move torch sendnn cache dir to common
- e9b576381a738c59f91d5fc904ceaa2a0e410864 · Thu Jul 24 14:02:06 2025 -0600 · Use caps for constants, common post proc

Signed-off-by: Alex-Brooks <[email protected]>

Signed-off-by: Alex-Brooks <[email protected]>

alex-jw-brooks · 2025-10-03T13:28:21Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1 SEQUENCE_LENGTH=2048 USE_TINY_MODEL=1 NUM_AIU=4

Abhishek-TAMU · 2025-10-03T16:53:28Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1 SEQUENCE_LENGTH=2048 USE_TINY_MODEL=1 NUM_AIU=4 AIU_TESTS_GIT_COMMIT=fix_pr_bot

tests/models/test_decoders.py

Signed-off-by: Alex-Brooks <[email protected]>

Abhishek-TAMU · 2025-10-06T19:26:11Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1 SEQUENCE_LENGTH=2048 USE_TINY_MODEL=1 NUM_AIU=4

Signed-off-by: Alex-Brooks <[email protected]>

alex-jw-brooks · 2025-10-08T15:12:00Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1 SEQUENCE_LENGTH=2048 USE_TINY_MODEL=1 NUM_AIU=4

JRosenkranz · 2025-10-08T17:25:37Z

tests/models/conftest.py


+    # NOTE: we should configure the cachedir before importing torchsendnn's
+    # graph cache to prevent it from being initialized in the wrong place.
+    os.environ["TORCH_SENDNN_CACHE_DIR"] = os.path.join(os.getcwd(), ".cache")


Should this use setdefault, so that it is only set if the user has not already specified it?

Good point! Changed it
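The difference is that `os.environ[...] = ...` always overwrites, while `setdefault` only writes the key when it is missing and returns whichever value ends up in effect. A sketch of the conftest logic, wrapped in a hypothetical function that accepts an optional mapping so it can be exercised without touching the real environment (the function name is illustrative; per the NOTE in the diff, it would need to run before the torch sendnn graph cache is imported):

```python
import os
from typing import MutableMapping, Optional


def configure_sendnn_cache_dir(env: Optional[MutableMapping[str, str]] = None) -> str:
    """Default TORCH_SENDNN_CACHE_DIR to ./.cache unless the user already set it."""
    env = os.environ if env is None else env
    # setdefault leaves an existing user-provided value untouched and returns the effective value
    return env.setdefault("TORCH_SENDNN_CACHE_DIR", os.path.join(os.getcwd(), ".cache"))
```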

Signed-off-by: Alex-Brooks <[email protected]>

JRosenkranz · 2025-10-08T20:48:02Z

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1 SEQUENCE_LENGTH=2048 USE_TINY_MODEL=1 NUM_AIU=4

tharapalanivel reviewed Jul 30, 2025

alex-jw-brooks force-pushed the test_cache_refactor branch from b6e36d4 to 1fdea45 on July 30, 2025 12:22

alex-jw-brooks changed the title ~~Add Cache Test / Refactor Decoder Tests~~ Refactor Decoder Tests Jul 30, 2025

alex-jw-brooks marked this pull request as ready for review July 30, 2025 12:25

alex-jw-brooks mentioned this pull request Jul 30, 2025 in Add Cache Miss/Hit Test #97 (Open)

alex-jw-brooks force-pushed the test_cache_refactor branch from 1fdea45 to 9ef02a9 on July 30, 2025 12:29

JRosenkranz reviewed Jul 30, 2025

alex-jw-brooks mentioned this pull request Jul 30, 2025 in Add Cache Miss/Hit Test alex-jw-brooks/aiu-fms-testing-utils#1 (Open)

alex-jw-brooks force-pushed the test_cache_refactor branch 2 times, most recently from b6e36d4 to d2551b9 on August 12, 2025 13:56

alex-jw-brooks requested a review from JRosenkranz August 12, 2025 15:33

tharapalanivel reviewed Aug 20, 2025

JRosenkranz reviewed Aug 20, 2025

JRosenkranz reviewed Aug 21, 2025

alex-jw-brooks force-pushed the test_cache_refactor branch 2 times, most recently from 2e42e7c to 1e369b2 on September 11, 2025 12:10

JRosenkranz approved these changes Sep 18, 2025

tharapalanivel approved these changes Sep 19, 2025

avery-blanchard and others added 6 commits October 3, 2025 12:22

Add test case for caching · d2c4b8b
Signed-off-by: Avery Blanchard <[email protected]> Signed-off-by: Alex-Brooks <[email protected]>

Update cache test, add validation for cached run · ff90bf5
Signed-off-by: Alex-Brooks <[email protected]>

don't skip save on aiu iter0 · c2dba72
Signed-off-by: Alex-Brooks <[email protected]>

fix fp8 dtype, always use persistent model fixture · dcfa628
Signed-off-by: Alex-Brooks <[email protected]>

remove model path from get_cpu_model args · dd355c1
Signed-off-by: Alex-Brooks <[email protected]>

alex-jw-brooks force-pushed the test_cache_refactor branch from 1e369b2 to dd355c1 on October 3, 2025 13:24

Abhishek-TAMU reviewed Oct 3, 2025

tests/models/test_decoders.py

fix casing error · 373939d
Signed-off-by: Alex-Brooks <[email protected]>

alex-jw-brooks added 2 commits October 8, 2025 10:54

Rebase fixes, linting · fb4d8f3
Signed-off-by: Alex-Brooks <[email protected]>

fix input prep · d870a24
Signed-off-by: Alex-Brooks <[email protected]>

JRosenkranz reviewed Oct 8, 2025

use setdefault for torch sendnn cache dir · 70e84f5
Signed-off-by: Alex-Brooks <[email protected]>

alex-jw-brooks force-pushed the test_cache_refactor branch from 3471879 to 70e84f5 on October 8, 2025 19:30

alex-jw-brooks requested a review from JRosenkranz October 8, 2025 19:30

JRosenkranz merged commit 281ff22 into foundation-model-stack:main Oct 9, 2025
3 checks passed
