Skip to content

Conversation

tushar00jain
Copy link
Contributor

@tushar00jain tushar00jain commented Oct 8, 2025

Summary:
allow users to specify the profiler schedule
Summary:
the script adds configuration options to run training locally with ft enabled
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 8, 2025
Summary:
Allows disabling the storage of checkpoints related to torchft.

Users don't really have to rely on any external storage. So it reduces set up time to get things up and running. Since we also don't really need model checkpoints when we have torchft. And if checkpoint storage has issues, this can work as a killswitch to completely disable the storage so it doesn't impact training.
Summary:
record the profile trace if the training process receives SIGABRT e.g. when Process Group watchdog aborts the process
tushar00jain added a commit that referenced this pull request Oct 10, 2025
Summary:
the script adds configuration options to run training locally with ft
enabled

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/pytorch/torchtitan/pull/1812).
* #1840
* #1811
* #1810
* __->__ #1812
* #1809

---------

Co-authored-by: Tushar Jain <[email protected]>
githubsgi pushed a commit to githubsgi/torchtitan that referenced this pull request Oct 13, 2025
Summary:
the script adds configuration options to run training locally with ft
enabled

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/pytorch/torchtitan/pull/1812).
* pytorch#1840
* pytorch#1811
* pytorch#1810
* __->__ pytorch#1812
* pytorch#1809

---------

Co-authored-by: Tushar Jain <[email protected]>
githubsgi pushed a commit to githubsgi/torchtitan that referenced this pull request Oct 15, 2025
Summary:
the script adds configuration options to run training locally with ft
enabled

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/pytorch/torchtitan/pull/1812).
* pytorch#1840
* pytorch#1811
* pytorch#1810
* __->__ pytorch#1812
* pytorch#1809

---------

Co-authored-by: Tushar Jain <[email protected]>
githubsgi pushed a commit to githubsgi/torchtitan that referenced this pull request Oct 16, 2025
Summary:
the script adds configuration options to run training locally with ft
enabled

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/pytorch/torchtitan/pull/1812).
* pytorch#1840
* pytorch#1811
* pytorch#1810
* __->__ pytorch#1812
* pytorch#1809

---------

Co-authored-by: Tushar Jain <[email protected]>
githubsgi pushed a commit to githubsgi/torchtitan that referenced this pull request Oct 16, 2025
Summary:
the script adds configuration options to run training locally with ft
enabled

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/pytorch/torchtitan/pull/1812).
* pytorch#1840
* pytorch#1811
* pytorch#1810
* __->__ pytorch#1812
* pytorch#1809

---------

Co-authored-by: Tushar Jain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant