Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] add the ability to recover shuffling datasets in checkpoint. #247

Merged
merged 4 commits into from
Sep 14, 2024

Conversation

LongzhuoWang
Copy link
Contributor

[feature] add the ability to recover shuffling datasets in checkpoint.
[fix] bug fixed, epoch for load checkpoint cannot exceed the config one.

[fix] bug fixed, epoch for load checkpoint cannot exceed the config one.
@LongzhuoWang LongzhuoWang marked this pull request as draft September 9, 2024 02:49
@yezhengmao1
Copy link
Collaborator

check failed.

@yezhengmao1
Copy link
Collaborator

@yezhengmao1
Copy link
Collaborator

[feature] add cache deletion function.

[fix] Fixed bug and improved feature.
@LongzhuoWang LongzhuoWang marked this pull request as ready for review September 11, 2024 03:34
Copy link
Collaborator

@yezhengmao1 yezhengmao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A couple of minor comments.

data = preprocess_func[preprocess_type](data)
# If data preprocess_type is shuffle, create a cache folder,
# to store shuffled data and use it for saving checkpoints.
if preprocess_type == "shuffle":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put the function into line 97 or create a new function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's better to avoid using the "if" branch here because different process_type have different process functions defined in line 94.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me have a try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, this is indeed better. I have already modified the code.

@yezhengmao1 yezhengmao1 self-requested a review September 13, 2024 07:45
Copy link
Collaborator

@yezhengmao1 yezhengmao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yezhengmao1 yezhengmao1 merged commit e72fe4c into TUDB-Labs:main Sep 14, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants