Conversation

wwwjn
Contributor

@wwwjn wwwjn commented Jul 21, 2025

Integration Tests Restructuring

  • Split current integration tests into two sets:

    1. Depth test - features.py: uses the llama3 model to test that all the main components of torchtitan are functioning as expected
    2. Breadth test - models.py: as we support more models in torchtitan core, sets up parallelism-related tests for each model to cover model-architecture / model-args changes, and keeps the integration test implementation easy to extend to new models
  • Moved integration test files from the root directory to a dedicated tests/integration_tests/ directory

  • Added a base configuration file base_config.toml for integration tests, since most of the train_configs shared 90% of the same settings

  • Separated control logic from test case definitions: run_tests.py holds the control logic, while the other files define the test cases.
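The control-logic/data split described above can be sketched roughly as follows. This is a hypothetical illustration, not torchtitan's actual code: the `OverrideDefinitions` field names, the builder function, and the CLI flags are assumptions for the sake of the example.

```python
# Hypothetical sketch of the data/control split; the real OverrideDefinitions
# fields and flag names in torchtitan may differ.
from dataclasses import dataclass, field


@dataclass
class OverrideDefinitions:
    """One test flavor: CLI overrides applied on top of a base .toml config."""

    override_args: list[list[str]] = field(default_factory=list)
    test_descr: str = "default"
    test_name: str = "default"
    ngpu: int = 4


def build_features_test_list() -> list[OverrideDefinitions]:
    """Test-case *data* lives in files like features.py / models.py."""
    return [
        OverrideDefinitions(
            [["--parallelism.tensor_parallel_degree", "2"]],
            "2D parallelism",
            "tp_2d",
        ),
    ]


def run_tests(test_list: list[OverrideDefinitions]) -> None:
    """Control *logic* lives in run_tests.py and is shared across suites."""
    for flavor in test_list:
        for overrides in flavor.override_args:
            cmd = ["./run_train.py", *overrides]
            print(f"[{flavor.test_name}] would run: {' '.join(cmd)}")


run_tests(build_features_test_list())
```

With this shape, adding a new model means appending data to a list rather than touching the runner.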

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 21, 2025
@wwwjn wwwjn changed the title Integration Tests Restructuring for extensible test cases. Modular Integration Test Framework with DeepSeek-v3 Support Jul 21, 2025
@wwwjn wwwjn marked this pull request as draft July 21, 2025 20:15
@wwwjn wwwjn changed the title Modular Integration Test Framework with DeepSeek-v3 Support [WIP] Modular Integration Test Framework with DeepSeek-v3 Support Jul 25, 2025
@wwwjn wwwjn marked this pull request as ready for review July 25, 2025 22:22
@wwwjn wwwjn changed the title [WIP] Modular Integration Test Framework with DeepSeek-v3 Support Modular Integration Test Framework with DeepSeek-v3 Support Jul 25, 2025
Contributor

@tianyu-l tianyu-l left a comment


Great initiative! Left some initial comments, let's discuss.

@wwwjn wwwjn changed the title Modular Integration Test Framework with DeepSeek-v3 Support [Refactor] Modular Integration Test Framework with DeepSeek-v3 Support Jul 29, 2025
@wwwjn wwwjn force-pushed the ci-refactor branch 3 times, most recently from f94d708 to 5cf7850 Compare July 31, 2025 22:20
Contributor

@tianyu-l tianyu-l left a comment


I suggest we make model tests flat, and reuse functions such as main, run_tests, etc. across all tests -- basically decouple control logic and data.

Contributor

@tianyu-l tianyu-l left a comment


Looks much cleaner. Had some more comments.

"""
key is the config file name and value is a list of OverrideDefinitions
that is used to generate variations of integration tests based on the
same root config file.
"""
integration_tests_flavors = defaultdict(list)
integration_tests_flavors["debug_model.toml"] = [
integration_tests_flavors = []
Contributor


this is not addressed

```python
parser.add_argument(
    "--test_suite",
    default="",
    choices=["features", "models", "h100"],
```
Contributor


Is it because ft.py is special and hard to reconcile? I think that's fine for now.

Contributor Author


Yes, ft.py uses a different command to run a single test, so it is hard to reconcile.



```diff
-def run_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
+def run_single_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
```
Contributor


why do we need to define such functions again? we can just put things like clip_encoder_version_arg into OverrideDefinitions

If the concern is repetition, you can define the common part as a list l (with better naming) and put it in every OverrideDefinitions using *l.
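The reviewer's `*l` suggestion can be sketched like this. The list name, flag names, and the `clip_encoder_version` value are illustrative placeholders, not torchtitan's actual configuration.

```python
# Hypothetical sketch of sharing common override args across flavors by
# unpacking a shared list with *; all names and flags are illustrative.
common_flux_args = [
    ["--model.name", "flux"],
    ["--training.clip_encoder_version", "large"],  # e.g. clip_encoder_version_arg
]

flux_test_list = [
    # Each flavor repeats the shared part via *common_flux_args instead of
    # re-defining a Flux-specific run function just to inject it.
    [*common_flux_args, ["--parallelism.data_parallel_shard_degree", "4"]],
    [*common_flux_args, ["--training.steps", "10"]],
]
```

The splat keeps each flavor declarative while avoiding copy-pasted common arguments.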

Contributor Author


For the Flux model, the main concern is not repeated configuration; rather, the launch command differs from the one in the main run_single_test() function. Flux runs via a separate ./run_train.py under the flux folder, not the main ./run_train.py in torchtitan. The same applies to ft.py, which also uses a different command to run its tests.

Most of the time the commands to run tests are the same: we always use the main ./run_train.py with slightly modified configurations, so I think there is no need to over-generalize run_single_test(). Where the command differs, I re-defined run_single_test(); and since run_tests() calls run_single_test() internally, run_tests() is re-defined as well.

Would love to hear your opinion on this design.
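The pattern the author describes, re-defining only the command-building step when the entry point differs, might look roughly like this. The Flux script path and flag names are assumptions for illustration, not the repository's actual layout.

```python
# Hypothetical sketch: a suite-specific module re-defines run_single_test
# when the launch command differs; the shared loop in run_tests() is reused.
# The flux path and --job.dump_folder flag are illustrative assumptions.


def run_single_test(overrides: list[str], output_dir: str) -> list[str]:
    """Default: build a command for torchtitan's main ./run_train.py."""
    return ["./run_train.py", f"--job.dump_folder={output_dir}", *overrides]


def run_single_test_flux(overrides: list[str], output_dir: str) -> list[str]:
    """Flux override: same interface, different entry point."""
    return [
        "./torchtitan/experiments/flux/run_train.py",  # illustrative path
        f"--job.dump_folder={output_dir}",
        *overrides,
    ]
```

Because both functions share one signature, the outer run_tests() loop can call either without knowing which suite it is driving.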

```python
        )
    else:
        run_test(test_flavor, full_path, args.output_dir)


def run_tests(args, test_list: List[OverrideDefinitions]):
```
Contributor


can we reuse what's already in run_tests.py?

Contributor Author


Same rationale as above

@wwwjn wwwjn requested a review from tianyu-l August 27, 2025 18:52