[Refactor] Modular Integration Test Framework with DeepSeek-v3 Support #1431
base: main
Conversation
Great initiative! Left some initial comments, let's discuss.
Force-pushed from f94d708 to 5cf7850
I suggest we make the model tests flat, and reuse functions such as `main`, `run_tests`, etc. across all tests -- basically decouple control logic from data.
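The suggested decoupling could look like the following sketch: test files declare only flat data, while a shared runner holds the control logic. The `OverrideDefinitions` name comes from this PR, but its fields and the sample override arguments here are assumptions, not the PR's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class OverrideDefinitions:
    """One integration-test variation: extra CLI overrides plus a label.

    Field names are illustrative; the real class in the PR may differ.
    """
    override_args: list = field(default_factory=list)
    test_descr: str = "default"

# Data: each test file only declares a flat list of variations.
FEATURE_TESTS = [
    OverrideDefinitions(["--training.steps", "10"], "short run"),
    OverrideDefinitions(["--activation_checkpoint.mode", "full"], "full AC"),
]

# Control: a shared runner that every test file reuses unchanged.
def run_tests(test_list):
    for flavor in test_list:
        print(f"running: {flavor.test_descr} -> {flavor.override_args}")

if __name__ == "__main__":
    run_tests(FEATURE_TESTS)
```

With this split, adding a new test means appending one `OverrideDefinitions` entry; no runner code changes.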
Looks much cleaner. Had some more comments.
`tests/integration_tests/features.py` (outdated)
""" | ||
key is the config file name and value is a list of OverrideDefinitions | ||
that is used to generate variations of integration tests based on the | ||
same root config file. | ||
""" | ||
integration_tests_flavors = defaultdict(list) | ||
integration_tests_flavors["debug_model.toml"] = [ | ||
integration_tests_flavors = [] |
this is not addressed
```python
parser.add_argument(
    "--test_suite",
    default="",
    choices=["features", "models", "h100"],
```
Is it because ft.py is special and hard to reconcile? I think that's fine for now.
Yes, `ft.py` uses a different command to run a single test, so it is hard to reconcile.
```diff
-def run_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
+def run_single_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
```
Why do we need to define such functions again? We can just put things like `clip_encoder_version_arg` into `OverrideDefinitions`. If the concern is repetition, you can define the common part as a list `l` (with better naming) and put it in every `OverrideDefinitions` using `*l`.
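The `*l` suggestion refers to Python's iterable unpacking inside list literals; a minimal sketch (the argument names below are made up, and as the comment notes, `common_args` stands in for the better-named list the review asks for):

```python
# Common arguments shared by every variation (name is illustrative; the
# review suggests picking something better than `l`).
common_args = ["--model.name", "flux", "--training.steps", "10"]

# Each variation repeats the common part via unpacking instead of
# re-defining a helper function per suite.
variation_a = [*common_args, "--parallelism.tensor_parallel_degree", "2"]
variation_b = [*common_args, "--training.compile"]

print(variation_a)
```

Unpacking keeps each variation a plain list literal, so the shared runner never needs to know which parts are common.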
For the Flux model, the main concern is not repeatedly adding configurations; it is that the command differs from the one in the main `run_single_test()` function. Flux runs using another `./run_train.py` under the flux folder, not the main `./run_train.py` in torchtitan. The same applies to `ft.py`, which also uses a different command to run the tests.
Most of the time the commands to run tests are the same: we always use the main `./run_train.py` with slightly modified configurations. So I think there's no need to over-generalize the `run_test()` function. If the command is different, I re-defined the `run_single_test()` function. And since `run_tests()` calls `run_single_test()` internally, `run_test()` is re-defined as well.
Would love to know your opinion on this design.
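One way to read this design: the shared loop calls `run_single_test`, and only suites with a different launch command (flux, `ft.py`) shadow that one function. A hedged sketch, not the PR's actual code; the flux path and the `SimpleNamespace` flavor stand-in are assumptions:

```python
from types import SimpleNamespace

def run_single_test(test_flavor, full_path, output_dir):
    # Default launcher: torchtitan's top-level ./run_train.py.
    # A real runner would subprocess.run(cmd); here we just build it.
    return ["./run_train.py", "--job.config_file", full_path,
            *test_flavor.override_args]

def flux_run_single_test(test_flavor, full_path, output_dir):
    # Flux launches via its own run_train.py (path is illustrative).
    # Only this function is re-defined; the run_tests() loop is reused.
    return ["./flux/run_train.py", "--job.config_file", full_path,
            *test_flavor.override_args]

flavor = SimpleNamespace(override_args=["--training.steps", "10"])
print(run_single_test(flavor, "base_config.toml", "/tmp/out")[0])
```

Under this reading, only the launcher varies per suite, which keeps the duplication to one small function.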
```diff
         )
     else:
         run_test(test_flavor, full_path, args.output_dir)

+def run_tests(args, test_list: List[OverrideDefinitions]):
```
Can we reuse what's already in `run_tests.py`?
Same rationale as above
Integration Tests Restructuring
Split the current integration tests into two sets:
- `features.py`: uses the llama3 model to test that all the main components of torchtitan are functioning as expected.
- `models.py`: as we support more models in torchtitan core, sets up parallelism-related tests for each model, to test model architecture / args related changes. Makes sure the integration test implementation is easy to extend to new models.
Other changes:
- Moved integration test files from the root directory to a dedicated `tests/integration_tests/` directory.
- Added a base configuration file `base_config.toml` for integration tests, as most of the train_configs share 90% of the same settings.
- Separated control logic and test case definition: `run_tests.py` holds the control logic; the other files define the test cases.
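The `base_config.toml` idea implies merging a small per-test config over the shared base. A minimal sketch of that merge, using plain dicts to stand in for parsed TOML (the keys shown are made up, not the PR's actual settings):

```python
# Parsed contents of base_config.toml (keys are illustrative).
base = {"training": {"steps": 10, "batch_size": 8},
        "model": {"name": "llama3"}}

# A per-test config only states what differs from the base.
override = {"training": {"steps": 100}}

def merge(base_cfg, override_cfg):
    """Recursively merge override_cfg over base_cfg (override wins)."""
    out = dict(base_cfg)
    for key, val in override_cfg.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)
        else:
            out[key] = val
    return out

merged = merge(base, override)
print(merged)
```

Each test config then stays a few lines long, since the 90% of shared settings live only in `base_config.toml`.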