refactor(datasets): add compress_level parameter to write_image() and set it to 1 #2135

imstevenpmwork · 2025-10-07T14:40:09Z

This PR stems from the conversation in #1959.

Rationale

Why is compression not critical at this step?
We aim to preserve as much raw image information as possible, as these images are intermediate artifacts. They will later be compressed during video encoding at the end of each episode, where compression efficiency potentially matters more.

How was the compression level chosen?
The optimal compression level depends on the entropy characteristics of the images. However, since our main goal here is speed rather than file size, a low compression level is preferred to minimize CPU overhead during frequent writes.
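The dependence on image entropy can be illustrated with a small sketch. It assumes the images are DEFLATE-compressed (as in PNG) and uses the standard-library `zlib` directly on raw byte buffers rather than on real PNG files, so the numbers are only indicative. The buffers and the helper name `deflate_stats` are made up for illustration and are not part of this PR.

```python
import os
import time
import zlib


def deflate_stats(data: bytes, level: int) -> tuple[int, float]:
    """Return (compressed size in bytes, seconds spent) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    return len(compressed), time.perf_counter() - start


# Two synthetic buffers the size of a 640x480 RGB frame (assumed stand-ins
# for camera images; real frames fall somewhere between these extremes).
n_bytes = 640 * 480 * 3
low_entropy = bytes((i // 3) % 256 for i in range(n_bytes))  # smooth gradient
high_entropy = os.urandom(n_bytes)  # pure noise, essentially incompressible

for name, buf in (("gradient", low_entropy), ("noise", high_entropy)):
    for level in (0, 1, 6, 9):
        size, seconds = deflate_stats(buf, level)
        print(f"{name:8s} level={level} size={size:8d} B  time={seconds * 1e3:7.2f} ms")
```

On the gradient buffer even level 1 shrinks the data considerably, while on the noise buffer no level helps much. Timings vary by machine, so they are best compared relative to each other.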

Why compress_level=1 instead of 0?
Although 0 uses the least CPU for compression, it can paradoxically result in slower overall performance due to the larger output files. Writing significantly larger files increases I/O time, often offsetting any CPU gains.
Setting compress_level=1 provides a better balance between CPU usage and disk throughput.
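A minimal sketch of the idea, assuming the images are written as PNG through Pillow, whose PNG writer accepts a `compress_level` option (0 to 9, default 6). The function `write_image_sketch`, the helper `png_size`, and the synthetic frame are hypothetical and only illustrate how the parameter could be passed through. They are not the actual `write_image()` implementation.

```python
import io

from PIL import Image


def write_image_sketch(image: Image.Image, target, compress_level: int = 1) -> None:
    """Hypothetical stand-in for write_image(): save as PNG at a given zlib level."""
    image.save(target, format="PNG", compress_level=compress_level)


def png_size(image: Image.Image, compress_level: int) -> int:
    """Encode into memory and return the resulting PNG size in bytes."""
    buffer = io.BytesIO()
    write_image_sketch(image, buffer, compress_level=compress_level)
    return len(buffer.getvalue())


# Synthetic 640x480 RGB gradient (an assumed stand-in for a camera frame).
width, height = 640, 480
pixels = bytes((i // 3) % 256 for i in range(width * height * 3))
frame = Image.frombytes("RGB", (width, height), pixels)

for level in (0, 1, 6):
    print(f"compress_level={level}: {png_size(frame, level)} bytes")
```

With `compress_level=0` the pixel data is stored uncompressed, so the file is at least as large as the raw frame. That is the extra I/O cost described above. Actual write times depend on the disk and CPU and would need to be measured on the target machine.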

Future Work

As suggested in the original ticket, compression and encoding parameters (e.g., format, compression level, codec options) should eventually be exposed to users for fine-grained control. This will be addressed in a future PR, although it is not currently a priority.

CarolinePascal

Just a small comment: wouldn't this be the perfect occasion to add a docstring to this method? c:

imstevenpmwork · 2025-10-07T15:55:29Z

> Just a small comment: wouldn't this be the perfect occasion to add a docstring to this method? c:

Done in: d52473b

refactor(datasets): add compress_level parameter to write_image() and set it to 1

a27de54

imstevenpmwork self-assigned this Oct 7, 2025

imstevenpmwork added the enhancement, dataset, refactor, and performance labels Oct 7, 2025

imstevenpmwork linked an issue Oct 7, 2025 that may be closed by this pull request

write_image() is slow due to default compress_level=6 #1959

Open

2 tasks

imstevenpmwork requested a review from CarolinePascal October 7, 2025 14:58

CarolinePascal reviewed Oct 7, 2025

docs(dataset): add docs to write_image()

d52473b

imstevenpmwork requested a review from CarolinePascal October 7, 2025 15:55
