wasteful re-compression #543

@gaponenko

Description

Hello,

I have just waited for several minutes as jobsub_submit was
re-compressing an already compressed code tarball. It was specified
with the

--tar_file_name dropbox:///pnfs/mu2e/resilient/users/gandr/gridexport/tmp.9I7Gv1adwT/Code.tar.bz

option, and then I saw a large file named Code.tar.bz2473.tbz2 appear
in my working directory as I was waiting for the submission to
complete.

Maybe the compression step should be delegated to the user, and
jobsub_submit should not try to re-pack the user-provided file. Just
upload it as is from its original location.

Andrei

Activity

marcmengel commented on Feb 28, 2024 (Contributor)

What jobsub_lite is doing is rewriting the tarfile with the permissions modified, to prevent people from putting things into cvmfs that they cannot read.
See: https://github.com/fermitools/jobsub_lite/blob/master/lib/tarfiles.py#L83
The generated tarfile is compressed just to minimize the disk space required.
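
For readers following along, the rewrite described above could look roughly like the sketch below. This is not the jobsub_lite code at the link; the function name `rewrite_with_readable_perms` and the choice of bzip2 for the output are assumptions made for illustration. It only shows why the whole archive has to be read and re-written (and hence re-compressed) to change the permission bits.

```python
import stat
import tarfile


def rewrite_with_readable_perms(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: copy a tarball, making every member world-readable."""
    # "r:*" lets tarfile auto-detect gzip/bz2/xz compression on the input.
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:bz2") as dst:
        for member in src:
            # Add read permission for user, group and others.
            member.mode |= stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
            # Directories and already-executable files also get the
            # execute/search bit for everyone.
            if member.isdir() or member.mode & stat.S_IXUSR:
                member.mode |= stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            # Only regular files carry data; other member types are header-only.
            fileobj = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, fileobj)
```

Every member's data passes through the decompressor and the compressor once, which is where the waiting time on a large tarball comes from.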

gaponenko commented on Feb 28, 2024 (Author)

marcmengel commented on Feb 29, 2024 (Contributor)

> Can it check that the provided file has proper permissions and complain and stop if not? Let the user fix their problems instead of trying to do this for them, as it penalizes other users.
> Andrei

We tried that, but users complained that they were

  • making tarfiles of areas whose permissions they were not allowed to modify (with tardir:), or
  • using tarfiles provided by others,
    and they found that behavior unacceptable.

Also, decompressing and reading the whole tarfile to check the permissions on everything is not significantly faster than copying it and modifying it.
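
To illustrate the cost argument, here is a minimal sketch of a check-only pass. The function name `unreadable_members` and the exact rule (world-read on everything, plus world-search on directories) are assumptions for illustration, not what jobsub_lite does. A tar archive has no central index: member headers are interleaved with the file data, so finding them all in a compressed tarball means decompressing the entire stream.

```python
import stat
import tarfile


def unreadable_members(tar_path: str) -> list[str]:
    """Hypothetical check: names of members that others cannot read (or enter)."""
    bad = []
    # "r:*" auto-detects the compression; iterating still walks the whole
    # decompressed stream, because each header sits in front of its file data.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if member.issym():
                continue  # the mode stored for a symlink is not meaningful
            needed = stat.S_IROTH | (stat.S_IXOTH if member.isdir() else 0)
            if (member.mode & needed) != needed:
                bad.append(member.name)
    return bad
```

Compared with a full rewrite, this saves only the re-compression and the write of the output file; the read and decompression of the input are the same.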

Just how big is this tarfile you're sending?

marcmengel commented on Feb 29, 2024 (Contributor)

Also, why are you asking that a file already in /pnfs/mu2e/resilient be re-copied to a dropbox: location, when it is already in resilient? Just leave the dropbox: off of the front and use it where it is...

gaponenko commented on Feb 29, 2024 (Author)

          wasteful re-compression · Issue #543 · fermitools/jobsub_lite