-
Notifications
You must be signed in to change notification settings - Fork 5k
ZipFile.ExtractToDirectory failing with "The filename, directory name, or volume label syntax is incorrect." (works in SharpZipLib) #67201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Tagging subscribers to this area: @dotnet/area-system-io-compression Issue DetailsDescriptionI have some code using System.IO.Compression: ZipFile.ExtractToDirectory(archive, Path.Combine(directory.FullName, "groove-v1.0.0-midionly")) Reproduction Stepsputting this in a .fsx file and running it with #r "nuget: System.IO.Compression"
open System
open System.IO
open System.IO.Compression
open System.Net.Http
let directory = DirectoryInfo __SOURCE_DIRECTORY__
let archive = Path.Combine(__SOURCE_DIRECTORY__, "groove-v1.0.0-midionly.zip")
do
let uri = "https://storage.googleapis.com/magentadata/datasets/groove/groove-v1.0.0-midionly.zip"
use client = new HttpClient(Timeout = TimeSpan.FromMinutes 30)
use resp = client.Send(new HttpRequestMessage(RequestUri=Uri uri))
use stream = resp.Content.ReadAsStream()
use file = File.OpenWrite(archive)
stream.CopyTo(file)
//let fz = ICSharpCode.SharpZipLib.Zip.FastZip()
//fz.ExtractZip(archive, Path.Combine(directory.FullName, filesetName), "")
// dotnet one fails one the groove dataset.
ZipFile.ExtractToDirectory(archive, Path.Combine(directory.FullName, "groove-v1.0.0-midionly")) I tried on macos, and it doesn't seem to fail, so maybe it is platform dependent. Expected behaviorI wish it would work, extracting without an exception. Actual behavior
Regression?No response Known WorkaroundsI am using FastZip class from ICSharpCode.SharpZipLib.Zip to work around the issue. Configuration
Other informationNo response
|
Looks like there's a newline at the end of the path. That would cause this exception, too. |
Newlines in paths are valid in POSIX (although wierd). Looking at SharpZipLib, it does some cleaning of paths before writing them in Windows https://github.com/icsharpcode/SharpZipLib/blob/cc8dd78ed989888f6685da4cc009c529158738b4/src/ICSharpCode.SharpZipLib/Zip/WindowsNameTransform.cs#L29-L37 In our code see essentially no sanitization. I notice that this sanitization idea was discussed here #15938 (comment) and rejected, but nobody felt strongly. The idea was you could instead enumerate the zip entries manually and write them out with your own names. I think it is more useful for ExtractToDirectory to sanitize with underscore replacements, as 7zip and SharpZipLib behavior suggests. If someone didn't like the sanitization scheme, they could do it manually rather than using ExtractToDirectory. The work required is
|
@smoothdeveloper any interest in offering a PR for this? |
@danmoseley I would like to help, but this is my first time. Could you point me in the right direction to start? |
@Danyy427 for sure, I've broken out the steps above. First thing is to make a test zip and put it in the runtime-assets repo. Before you get started, let's just double check with the owners of this area: @dotnet/area-system-io-compression that they support this change. |
Meantime, you can follow the 'getting started' steps in this repo (linked from the README) to clone, build, and get tests passing for this repo without any changes. |
@danmoseley the original question is in F# but I may write code in C#, right? Also if I understood correctly I am going to make some tests where the characters that need sanitization are being used, and upload them to runtime-assets? Sure will read the "getting started" |
Yes we exclusively use C# in this repo (except when testing VB or F# scenarios.) Yes, you will want to create a little zip file that has all the wierd characters in its file names listed here. You will need to figure out how to do that, and it may involve (if you're on Windows) using Linux (via WSL) and the Unix 'zip' utility, for example. |
@danmoseley, thanks for the feedback on confirming the issue and the reason of it surfacing, sharing great insights. I can't help with a fix for it but thanks for the offer; I am also happy to see @danny427 is going to give it a try based on your guidance, and confirm it doesn't matter I was using it from F# code for purpose of the fix. Thanks all! |
@danmoseley I made that zip file with some test files/folders inside it. However, it is not possible to put '\0' in a filename even in Linux/UNIX. Should I attempt to write directly to my hard drive to put a null character or is it overkill/not necessary? Update: I have just tested extracting the said zip file with 7zip and indeed all of the characters are replaced with underscores. |
I wouldn't bother with \0 if it isn't obvious how someone would make such things. The concept here is that a zip that extracts successfully on Unix should extract successfully on Windows. It's not a goal to extract random synthetic zips. |
@danmoseley Ok, then how would I go about uploading the zip? Do I make a clone of runtime-assets, add the zip file and put up a pull request? |
@Danyy427 exactly, once you have tested it by extracting it yourself with sharpziplib and verified that it produces the same content you put in, with the expected chars replaced in the file names. |
@danmoseley will do, thank you! |
@danmoseley I have verified with SharpLibZip FastZip that the zip file results in all characters replaced with underscores. Creating a pull request now. |
Also of course that would be invalid even on Unix, so it's unlikely any mainstream tool would allow you to name streams that way, as they could never be written to a file under that name. |
) * Added Test Zip File (dotnet/runtime#67201) * Use descriptive name * Added Zip File containing files with characters invalid in Windows Co-authored-by: Dan Moseley <[email protected]>
@danmoseley How should I proceed now? Your original post suggests that I should wait for a PR and add tests. How would I go about adding tests and running them? (I won't be able to attend to this for a few hours since I will probably have to sleep (it's late here)) |
First you need to get this repo cloned, built, and existing tests passing. Start at https://github.com/dotnet/runtime/blob/main/docs/workflow/README.md -- follow instructions for libraries. |
@danny427 at first measure you may:
|
@Danyy427 once he has tests passing locally (with no changes) should follow these instructions https://github.com/dotnet/runtime-assets/blob/main/README.md#optional-step-local-testing in order to be able to add a new test that consumes his new zip file. |
@danmoseley I have been attempting to build the repo with the instructions but the build fails with a couple of errors such as this one:
I run the build command like this:
and I have all of the required components that are listed. I came across this page but I don't know how to fix it in the build per se. |
@danmoseley I have just finished the sanitization to |
It is always best to write the test(s) and verify it fails before attempting to fix anything. Primarily because that confirms you have written a useful test.
I didn't look at the code very hard 😄 It seems the purpose of that method is to find the "file part" of the entry. It's then exposed eg in ZipArchiveEntry.Name. We probably shouldn't "lie" there. My suggestion is to add the sanitization at the point the file has to be written. It looks like here, FullName is the whole relative path. Line 99 in dbef8db
One way to do this might be to add something like an "internal string SantizedFullName { get { .. }}" property (or maybe FileSystemSafeFullName?)to both ZipArchiveEntry.Windows.cs and ZipArchiveEntry.Unix.cs. On Unix it just returns FullName. On Windows it sanitizes it. Then on the line above, use SanitizedFullName instead of FullName. |
@danmoseley I have created a pull requests for the test. #67332 |
@danmoseley checks are done, some tests failed as expected |
@danmoseley In your last post, you said
I do not understand why |
@danmoseley Also should I build the When On the other hand if I declare
|
Ah. It looks like you can't piggy back in ZipArchiveEntry.Windows.cs (which is in System.IO.Compression) because you need it in ZipFileExtensions.ZipArchiveEntry.Extract.cs (which is in System.IO.Compression.ZipFile) Instead I suggest creating something like ZipFileExtensions.ZipArchiveEntry.Extract.Windows.cs and putting it next to ZipFileExtensions.ZipArchiveEntry.Extract.cs and editing System.IO.Compression.ZipFile.csproj so it's only used on Windows. Hmm, but that csproj does not have a Windows-specific build. It would need I'm not familiar with all these arrangements and why bits of zip are in S.IO.C and bits are in S.IO.C.Z.. @carlossanlop do you have a suggestion? |
@danmoseley Instead of adding |
Ideally we don't check OS at runtime. @ericstj what is our base for adding a windows target to this library? |
@danmoseley I don't know if it would work (I cannot test it right now) but we could maybe add |
If I'm understanding this correctly, you're trying to consume code from A great example of this practice are the |
@carlossanlop We are also trying to make some of the code work specifically on Windows and some of it on Unix, like |
Add runtime/src/libraries/System.IO.Compression.ZipFile/src/System.IO.Compression.ZipFile.csproj Line 4 in 631389f
and make sure you condition your new files on runtime/src/libraries/System.IO.Compression/src/System.IO.Compression.csproj Lines 47 to 50 in 20e8f90
|
@danmoseley I do not know how I should be using this knowledge. I tried to do something but it's just errors everywhere now :) I added two files System.IO.Compression.sln:
and to System.IO.Compression.ZipFile I first added
Finally changed the |
@Danyy427 can you push your changes to your branch, and one of us can take a look? |
@danmoseley I pushed my changes, note that some weird character was written at the start of the csproj files, probably something wrong with vs or notepad++, the last commit was to remove those |
@danmoseley I have also added sanitization. |
@danmoseley did you have a chance to look at the changes? |
@danmoseley Actually instead of trying to add a |
@carlossanlop do you have a preference? |
@danmoseley I was going to try out writing sanitization as a method and I did do it for windows and Unix but there seem to be 4 target platforms. It seems that Unix and Browser go together but what about net7.0 without any target platform specified. Is that supposed to have sanitization or not? |
@danmoseley I made the test that I have written pass locally using a method to sanitize the filename. Every other test fails miserably tho |
@danmoseley I have pushed the changes that made the local test pass, how should I deal with tests that expect the extraction to fail when there are invalid characters in the filename? |
You'd update those tests so they expect the new behavior. |
@danmoseley there are some tests that are made specifically to check for errors when they encounter invalid chars. Should I delete those tests as they are no longer needed or should I make sure all of the zips in their test cases are extracted correctly with |
@danmoseley I fixed all of the tests and they all passed on my local system. Instead of deleting the |
fixed by #67332 |
Description
I have some code using System.IO.Compression:
Reproduction Steps
putting this in a .fsx file and running it with
dotnet fsi
:I tried on macos, and it doesn't seem to fail, so maybe it is platform dependent.
Expected behavior
I wish it would work, extracting without an exception.
Actual behavior
Regression?
No response
Known Workarounds
I am using FastZip class from ICSharpCode.SharpZipLib.Zip to work around the issue.
Configuration
Other information
No response
The text was updated successfully, but these errors were encountered: