Improve PlexusIoZipFileResourceCollection performance by using JarFile #106

Merged
merged 1 commit into from
Dec 19, 2018

Conversation

plamentotev
Member

Currently PlexusIoZipFileResourceCollection uses PlexusIoURLResource to get the InputStream of the JAR entries. PlexusIoURLResource uses URL and URLConnection to get the input stream. The problem is that they create a new JarFile for every entry, and in some cases the JarFile initialization can be expensive (for example when the JAR is signed). Using the URLConnection cache would solve the performance issue but opens a new one. The cache is global for the build, so if the JAR file has changed during the build you may get the cached instance (see plexus-io#2).

Modify PlexusIoZipFileResourceCollection to use JarFile directly instead of using PlexusIoURLResource. That solves both issues: JarFile is initialized once, so there is no performance penalty, and it is local, so if the file changes during the build the latest version will be picked up.

Under the hood PlexusIoURLResource uses JarFile as well, and keeping ZipFileResource extending PlexusIoURLResource would keep both the behaviour and the interface backward compatible.
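
To illustrate the idea only (this is not the code from the PR), here is a minimal sketch of reading every entry through one shared `JarFile` rather than opening a new one per entry. The class name `SingleJarFileReader` and its methods are hypothetical and use only the JDK `java.util.jar` API, not any plexus classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only.
public class SingleJarFileReader {

    /** Reads every non-directory entry using one shared JarFile instance. */
    public static Map<String, byte[]> readAllEntries(File archive) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        // The JarFile is opened once, so its initialization cost is paid a
        // single time. It is also local to this call, so a file that was
        // rewritten earlier in the build is read fresh from disk.
        try (JarFile jar = new JarFile(archive)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                // Each stream comes from the already opened JarFile.
                try (InputStream in = jar.getInputStream(entry)) {
                    contents.put(entry.getName(), toBytes(in));
                }
            }
        }
        return contents;
    }

    /** Drains the stream into a byte array. */
    private static byte[] toBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

By contrast, going through a `jar:` URL and `URLConnection` with caching disabled opens the archive again for each entry, which is where the per-entry cost described above comes from.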

@plamentotev
Member Author

Hi, this is my attempt to fix codehaus-plexus/plexus-io#12 without stepping on codehaus-plexus/plexus-io#2. What do you think? Of course, for me the long-term solution would be to deprecate PlexusIoZipFileResourceCollection and implement #90 once we have confirmed that it is safe to do so.

@rbjorklin @vojtechhabarta it would be great if you could confirm that this PR solves your performance issues.

@vojtechhabarta

Yes, I can confirm that this change solves the issue with performance (6 minutes shortened to 30 seconds in our case).
Thank you very much!

BTW, in addition to building this PR, it was also necessary to change and build maven-assembly-plugin because there is an incompatible change in the BaseFileSet interface (the added getFileMappers method). Fortunately, just returning null in PrefixedFileSet and PrefixedArchivedFileSet was sufficient 😏.

@plamentotev
Member Author

If there are no objections I plan to merge this PR.

@vojtechhabarta thank you for testing the patch, and for pointing out that plexus-archiver 3.7.0 is not exactly backward compatible, as it introduces a new method in the interface. I had overlooked that. Do you think we need to bump the major version? /cc @michael-o @krosenvold @khmarbaise

@michael-o
Member

I'd bump it.

Currently `PlexusIoZipFileResourceCollection` uses `PlexusIoURLResource`
to get the `InputStream` of the JAR entries. `PlexusIoURLResource`
uses `URL` and `URLConnection` to get the input stream. The problem is
that they create a new `JarFile` for every entry and in some cases
the `JarFile` initialization could be expensive
(for example when the JAR is signed). Using the `URLConnection`
cache would solve the performance issue but opens a new one.
The cache is global for the build, so if the JAR file has changed
during the build you may get the cached instance (see plexus-io#2).

Modify `PlexusIoZipFileResourceCollection` to use `JarFile`
directly instead of using `PlexusIoURLResource`.
That solves both issues: `JarFile` is initialized once
so there is no performance penalty and it is local so if the file
changed during the build the latest version will be picked up.

Under the hood `PlexusIoURLResource` uses `JarFile` as well
and keeping `ZipFileResource` extending `PlexusIoURLResource` would
keep both the behaviour and the interface backward compatible.

Closes #106
@plamentotev plamentotev merged commit 2ab62eb into codehaus-plexus:master Dec 19, 2018
@plamentotev plamentotev deleted the improve-jar-resource-collection-perf branch December 19, 2018 19:29
@plamentotev plamentotev added this to the 4.0.0 milestone Dec 19, 2018