Fix dir iteration being broken by concurrent removes #1068
Merged
When removing a file, we mark all open handles as "removed" (`pair={-1,-1}`) to avoid trying to later read metadata that no longer exists. Unfortunately, this also includes open dir handles that happen to be pointing at the removed file, causing them to return `LFS_ERR_CORRUPT` on the next read.
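For context, this is the kind of loop that hits the bug: iterating a directory while removing the files inside it through the same filesystem. A minimal sketch using the public API (assumes `lfs` is already mounted; the block device config is omitted, and `/dir` and its contents are made up here purely for illustration):

```c
#include <stdbool.h>
#include <stdio.h>
#include "lfs.h"

// Iterate a directory while removing the regular files inside it. Before
// this fix, a lfs_dir_read() following a remove could fail with
// LFS_ERR_CORRUPT even though the filesystem itself is fine.
static int remove_while_iterating(lfs_t *lfs) {
    // populate a directory with a few files
    int err = lfs_mkdir(lfs, "/dir");
    if (err) { return err; }

    const char *names[] = {"/dir/a", "/dir/b", "/dir/c"};
    for (int i = 0; i < 3; i++) {
        lfs_file_t file;
        err = lfs_file_open(lfs, &file, names[i], LFS_O_WRONLY | LFS_O_CREAT);
        if (err) { return err; }
        err = lfs_file_close(lfs, &file);
        if (err) { return err; }
    }

    // iterate the directory, removing each regular file as we see it
    lfs_dir_t dir;
    err = lfs_dir_open(lfs, &dir, "/dir");
    if (err) { return err; }

    struct lfs_info info;
    while (true) {
        int res = lfs_dir_read(lfs, &dir, &info);
        if (res < 0) {
            // before this fix, this could come back as LFS_ERR_CORRUPT
            lfs_dir_close(lfs, &dir);
            return res;
        }
        if (res == 0) {
            // end of directory
            break;
        }

        if (info.type == LFS_TYPE_REG) {
            char path[LFS_NAME_MAX + 8];
            snprintf(path, sizeof(path), "/dir/%s", info.name);
            err = lfs_remove(lfs, path);
            if (err) {
                lfs_dir_close(lfs, &dir);
                return err;
            }
        }
    }

    return lfs_dir_close(lfs, &dir);
}
```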
The good news is this is not actual filesystem corruption, only a logic error in `lfs_dir_read`. We actually already have logic in place to nudge the dir to the next id, but it was unreachable with the existing code. I suspect this worked at one point but was broken during a refactor due to lack of testing.
Fortunately, all we need to do is avoid clobbering the handle if its internal type is a dir; the dir-nudging logic can then correctly take over.
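Roughly, the shape of the change looks like the sketch below. This is only an illustration based on a reading of lfs.c, not the actual diff; the internal names (`struct lfs_mlist`, `d->type`, the exact spot where handles are clobbered) are assumptions and may not match:

```c
// Sketch only (not the actual diff): when a commit deletes an entry, littlefs
// walks the list of open handles and marks any handle sitting on the deleted
// id as removed by clobbering its pair. The fix is to skip this for dir
// handles so lfs_dir_read's existing nudge-to-next-id logic handles them.
static void mark_removed_handles(lfs_t *lfs,
        const lfs_block_t oldpair[2], uint16_t deleted_id) {
    for (struct lfs_mlist *d = lfs->mlist; d; d = d->next) {
        if (d->m.pair[0] == oldpair[0] && d->m.pair[1] == oldpair[1]
                && d->id == deleted_id) {
            if (d->type == LFS_TYPE_REG) {
                // file handles: mark as removed so later metadata lookups
                // fail cleanly instead of reading a stale entry
                d->m.pair[0] = LFS_BLOCK_NULL;
                d->m.pair[1] = LFS_BLOCK_NULL;
            }
            // dir handles are left alone; lfs_dir_read nudges them past
            // the removed id on the next read
        }
    }
}
```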
I've also added `test_dirs_remove_read` to test this and prevent another regression, adapted from tests provided by @tpwrules that identified the original bug.

Found by @tpwrules
See #1061 for more info