Implement lazy loading of thumbnails #665

buzzingwires · 2023-07-14T10:46:47Z

Now for one of the more ambitious changes discussed in #651 The general goal is to allow thumbnails to be displayed in a lazy-loaded manner, and unloaded when they fall out of a certain range. This can save memory and improve startup times at the cost of potentially higher disk IO and CPU usage when browsing thumbnails.

The number of thumbnails displayed at any given time is configurable to this end by the following options. The default values should maintain the original behavior.

thumbnail.load_behind : Max thumbnails to render behind the selected one. -1 (default) is no limit
thumbnail.load_ahead : Max thumbnails to render ahead of the selected one. -1 (default) is no limit
thumbnail.unload_threshold : Don't unload thumbnails unless more are loaded than this. -1 (default) is no limit

In order to reduce memory usage, the new 'thumbnail.max_ahead' and 'thumbnail.max_behind' can be used to specify a limit to the number of thumbnails after the current selection that can be loaded. Thumbnails outside of this range will be unloaded, and will be loaded again when they enter the range. Setting either of these variables to 0 will remove the loading limit, to restore previous behavior.

The 'thumbnail.max_count' configuration setting prevents thumbnails from being unloaded if they are under this count, even if they fall outside the range specified by 'thumbnail.max_ahead' and 'thumbnail.max_behind'. If this value is 0, it is ignored. ignored.

max_behind -> load_behind max_ahead -> load_ahead max_count -> unload_threshold

…old`

karlch

Thanks a lot, this is excellent work, with really nice test-cases!

I did leave a few nit-picking comments, the one part that once again requires a bit of thinking is the threshold as it does seem to make the implementation a tad more clumsy. Would love to hear your thoughts 😊

Have a great week-end!

tests/end2end/features/thumbnail/test_thumbnaillazy.feature

vimiv/api/settings.py

vimiv/gui/thumbnail.py

vimiv/utils/thumbnail_manager.py

karlch · 2023-07-15T09:48:27Z

Besides the code comments, I have noticed some havoc, e.g. when scrolling fast (holding j or k) on a larger list of thumbnails, resulting in something like this (unload_threshold=0, load_*=5, so I would expect 5 ahead and 5 behind, nothing else):

Could you investigate?

buzzingwires · 2023-07-16T00:47:14Z

You're welcome! I'm glad you generally like how this is turning out. You have a good weekend, too!

I most-certainly will investigate the bug. I am not immediately able to reproduce it, but as mentioned in the review for one of the tests, I think there's some kind of race condition that could be causing this.

* Rename test_thumbnaillazy.feature -> thumbnaillazy.feature * Replace consecutive 'When' statements with 'Ands' in tests * More explicit description for thumbnail.unload_threshold * Rename range calculation functions - _first_index -> _first_rendered_index - _last_index -> _last_rendered_index - _index_range -> _rendered_index_range * Refactor code to return as soon as possible

Previously, icons were pruned after updating the position. Sometimes the thumbnail creation callback ran after this update finished, and some old thumbnails were thus not pruned. Fix this by pruning every time a thumbnail is created, instead of at position updates. The thumbnail pruning function optimized to avoid redundant prunes, to improve performance given the more numerous calls. With these fixes, the issue causing the tests to need to wait for thumbnails to load before changing settings is also gone.

buzzingwires · 2023-07-16T13:25:33Z

Think I fixed our race condition and understand why it happened. Fixing this also removes the need for the redundant check for rendered thumbnails in the tests. See 15610c3

buzzingwires · 2023-07-16T13:38:33Z

Hmmmn, now we have one of the tests passing on my machine, but failing on the remote. I think what's happening is that moving pruning inside the thumbnail creation callback is making the handling of the thumbnail.unload_threshold option nondeterministic. Weird thing is: Repeated runs on my machine seem to pass each time. I wonder if it's influenced by the number of threads.

buzzingwires · 2023-07-17T14:47:14Z

I haven't quite figured this out yet, but my findings thus far are:

Changing the number of threads (by means of pool.setMaxThreadCount()) does indeed affect which thumbnails are pruned.
For a given thread count, the threads pruned seem to remain the same across multiple tests.
There is only a difference when thumbnail.unload_threshold is nonzero.

My best guess is that this has something to do with the order in which thumbnail creation jobs are assigned to threads. Could possibly fix this by having some kind of signal that's sent when all the jobs are finished, to run the pruning function.

karlch · 2023-07-17T16:07:01Z

Quick thanks for all the work you are putting in and investigations you are doing! As a heads-up: I will unfortunately not be able to take a deeper look until the week-end, as I am fully busy with a business trip during the week.

buzzingwires · 2023-07-18T06:12:27Z

Leaving my notes here:

The number of images to be loaded being less than the number of threads set for the pool is what seems to cause this to happen. I believe Qt basically works by issuing a job to each thread and waiting for those jobs to finish before assigning new ones.

When navigating the thumbnails, async thumbnail load is called with a new set of thumbnails when the position changes. At this time, the current behavior is to clear the pool to remove queued threads. So, when the number of threads is less than the number of jobs, this means there are threads in the queue that will never be started, and thus, they will never be loaded and checked against the threshold.

I'm able to get deterministic behavior by not clearing the pool, but that will create a greater performance hit from always loading every queued thread. So, we have a variety of choices here, and I'm inclined to wonder whether this deterministic behavior is truly necessary.

As thumbnails can be passed in arbitrary order to create_thumbnails_async, sort them such that those closest to the currently selected thumbnail are queued first.

tests/end2end/features/thumbnail/thumbnaillazy.feature

vimiv/gui/thumbnail.py

karlch · 2023-07-25T20:47:53Z

So, I believe I finally understood what is happening, thanks for your work and explanations! I agree that it should not be deterministic which part is affected by the unload_threshold.

I however do think we should ensure that the two ahead and two behind are rendered. And this is perfectly the case.

I would rewrite the tests along:

Feature: Lazy load thumbnails

    Background:
        Given I open 10 images
        And I enter thumbnail mode

    Scenario: Eager loading (default behavior)
        When I run 5scroll right
        Then there should be 10 rendered thumbnails
        And the first index should be 0
        And the last index should be 9
        And thumbnails should load in order of closeness to selected
...
    Scenario: Display two thumbnails ahead and behind but don't unload less than six tumbnails
        When I run set thumbnail.load_ahead 2
        And I run set thumbnail.load_behind 2
        And I run set thumbnail.unload_threshold 6
        And I run 5scroll right
        Then there should be 6 rendered thumbnails
        And the first index should be 3
        And the last index should be 7
        And image_08.jpg should be in the rendered thumbnails
        And thumbnails should load in order of closeness to selected

I am not sure why you decided to explicitly avoid the trailing zeros, as it makes the ordering quite confusing, at least for me. I.e., image_1.jpg, image_10.jpg, image_2.jpg, and so forth.

As I am still unsure I understand the pruning and so forth correctly everywhere, I may need another deep-dive into the code, but really feel like we are converging towards the expected behaviour 😊

buzzingwires · 2023-07-27T21:39:25Z

Rewrote the tests as specified and ended up removing the test for a thumbnail at a specific index, because that's really the only part that's non-deterministic.

There wasn't really a great reason for stripping the trailing zeroes, to be honest. Somewhere in my brain at the time, I may had conceived it would be more concise to remove them, and I was trying to match another test case I was using as reference. Anyway, they're not stripped now.

Also added a little bit of code that allows the icon position update to happen if moving around by clicking, rather than by scrolling or using the keyboard.

And hey, thanks! This has been a slog, so I appreciate your being patient with me. Is there anything I could help clarify concerning the pruning algorithm?

karlch · 2023-08-05T09:17:19Z

Perfect, thank you so much! Definitely not a slog from your side, this is excellent, I have been a bit inconsistent in my reviews though 😄

The one thing I am currently not able to wrap my head around, is if we need to call the pruning once a single new thumbnail has been created. Probably, yes, but it does seem to cause some overhead, even when nothing is pruned. Couldn't pruning be done once scrolling? Sure, there may be short times with "too little" thumbnails, but those will be filled then again.

buzzingwires · 2023-08-05T09:56:45Z

Of course! I'm glad you're ultimately happy with the way this is turning out.

If I recall correctly, I originally tried to do the pruning after scrolling, but there was a race condition in which the scroll function would look for thumbnails to prune and then return before the new thumbnail was actually loaded. Since the new thumbnail wasn't loaded yet, old ones that should have been pruned sometimes weren't.

karlch · 2023-08-05T10:15:06Z

That's unfortunate. I also suppose it doesn't make sense to try to prune once the current set of thumbnails has been created? There doesn't seem to be something along QThreadPool.done to connect to, but we could possibly hack something ourselves, if and only if this implementation would work at all 😄

buzzingwires · 2023-08-05T10:18:20Z

15610c3

I also just tested moving the _prune_icons() call again just to make sure.

That said, _prune_icons() could perhaps be optimized further. A low-hanging fruit might be to skip if the size of _rendered_thumbnails is below the unload threshold, before trying to scan through all the out of range thumbnails.

The more complicated optimization might be to look through rendered thumbnails, and check whether they are in range, instead of scanning through out of range thumbnails and seeing if they're loaded.

As for the other option you just described, I tried something where I tried to only call _prune_icons once, by keeping track of how many thumbnails were left to load. It was really complicated and the behavior was not sufficiently consistent, though.

karlch · 2023-08-05T10:38:43Z

Thanks for the hint! I like the idea of the low-hanging fruit(s). Another one would be to skip the call completely, if the settings are at default (i.e. load_ahead and load_behind are not -1). It might even be nice to have this whole logic only enabled, once the settings are changed 🤔

But this is getting out-of-hand, and possibly too complicated alltogether. If you don't mind, I think I would merge this once all your commits are pushed (dict comprehension?) and we can continue thinking if we come up with something more efficient in another PR 😊

EDIT: FYI, will be offline for the rest of the day, hopefully back on this tomorrow

buzzingwires · 2023-08-05T11:16:40Z

By the time you sent your post, I honestly had already just about implemented the low-hanging fruit of checking if there were any thumbnails to be unloaded before scanning.

Your idea to check if load_ahead and load_behind are not -1 is already sort of implicitly done, in that retrieving the first or last rendered index will produce the first or last index of the list. So, when put into a range, it's empty and the loop should just quit as a result.

But I agree, this is starting to get into micro-optimization territory. I personally think the performance overhead should be fairly negligible like this.

Anyway, the dict comprehension has been pushed too, and we can merge the pull request here.

Though, there is a final bit of unrelated envelope pushing I feel compelled to mention: The way the pruning is done is from 'left to right'. When the unload threshold is exceeded there's a strong bias to pruning the thumbnails at the beginning of the list, before considering ones at the end of the list. It might be fairer to alternate between thumbnails at the beginning and end of the list, so either end is pruned roughly equally.

UPDATE: I also realized the loops that check each index could probably be prematurely terminated if the unload threshold is no longer exceeded. Should still be fine though, as this is mostly relevant to situations where a high unload threshold is set. If it's low most/all entries would still need to be scanned.

UPDATE II: The number of icons expected to be loaded based on a given range could also be checked. The first thumbnail being loaded should really clear what's outside of the range, making subsequent calls quit immediately and their overhead negligible under most circumstances.

karlch · 2023-08-16T06:43:07Z

Sorry for the delay ☹️

I finally got around to test the option of "only enabling pruning in case unload threshold > 0" in 1ecc088. Instead of having the check in each call, we only connect the _prune_icons part once the unload_threshold exceeds -1. Like this, the whole logic is never touched in the default case. What do you think of this approach in general? Now, one could take this a few steps further, but I am not sure it is needed. Anyway, for completeness:

Also check if either load_ahead or load_behind are > -1.
Disconnect, once set manually back to default.

I see were you are coming from with the bias pruning thumbnails from the beginning. So instead of pruning ahead and behind, one would need to prune something along (there is probably a better way):

    def _prune_icons(self) -> None:
        """Prune indexes before and after the current index."""
        if not self._prunable_thumbnails_exist():
            return
        idx_behind = range(0, self._first_rendered_index())
        idx_ahead = reversed(range(self._last_rendered_index() + 1, len(self._paths)))
        for idx in flatten(itertools.zip_longest(idx_behind, idx_ahead)):
            if idx is not None:  # None from zip_longest
                self._prune_index(idx)

With this general approach, one could probably also check _prunable_thumbnails_exist within the loop here, and return completely, instead of skipping each call to _prune_index?

UPDATE II: Not sure I got this, they would quit immediately as the first one has already cleared a sufficient amount, and thus rendered + remaining < unload_threshold?

buzzingwires · 2023-08-19T18:21:53Z

I'm sorry if I pulled this PR out longer than intended. Again, we could totally just merge. We might be focusing too much on code that is already acceptably performant.

Checking load_ahead or load_behind seems like an even more extreme case of this, since again, the loops should not actually run if they're negative, though the functions will still be called.

I did look at the code you proposed. It looks sensible (and more consistent, since I have no idea why I checked api.settings.thumbnail.unload_threshold.value == -1) and like everything is still called in the right order(as in _prune_icons after _on_thumbnail_created). There are just a few things I thought to change.

Here, there's record keeping done to both disconnect when necessary, and to avoid multiple consecutive connects (which can produce extra calls to _prune_icons when thumbnail.unload_threshold is changed.).

    def _maybe_connect_pruning(self, unload_threshold: int):
        """Only connect pruning in case unloading should be performed."""
        if unload_threshold > -1:
            if not self._pruning_connected:
                self._manager.created.connect(self._prune_icons)
                self._pruning_connected = True
        elif self._pruning_connected:
            self._manager.created.disconnect(self._prune_icons)
            self._pruning_connected = False

There's also another And I run set thumbnail.unload_threshold 0 in the Display two thumbnails ahead and all behind test scenario.

For fixing the bias, your code looks about correct to my initial observation and like it more or less produces the result I had in mind. I'd just want to test it. I probably don't have anything much better than that.

Finally, as for UPDATE II, honestly, I don't entirely remember what I meant. It's something I'd probably have to write a prototype for to properly conceptualize (and test) again, but I think I meant something that followed UPDATE I, if the length of _rendered_paths is less than the number of indexes to be scanned, by the pruning process, the loop could also quit early.

Yutsuten · 2024-04-28T16:41:04Z

I have a photos folder with 1500+ pictures, and without this change my vimiv would always freeze pretty bad. Even after a few minutes it wouldn't respond, so I kill it.

I tried merging this change with v0.9.0. And used this configuration:

[THUMBNAIL]
save = False
size = 256
load_behind = 30
load_ahead = 30
unload_threshold = -1

This is what makes sense to me. No freeze. I'm able to navigate the thumbnails with no issue. Excellent work done here, good job!! 👍

I'm going to use this for now on! 😄

buzzingwires added 14 commits July 12, 2023 22:30

Default behavior for thumbnail 'max_ahead' 'max_behind' and 'max_count'

7dd5f46

Spaces instead of tabs.

79e353a

Correctly handle thumbnail reordering.

2f9a510

Prune icons ahead of current from end instead of beginning.

2ecce8f

Rename config options.

f9ab545

max_behind -> load_behind max_ahead -> load_ahead max_count -> unload_threshold

Use -1 for no limit in load_behind, load_ahead and `unload_thresh…

fee08bd

…old`

Add indices parameter for tests of create_thumbnails_async

62e25f5

Use iterables instead of lists for indices of create_thumbnails_async

c477f60

Add tests for lazy thumbnail loading.

da9d1ff

Lint and fix whitespace in test.

3b259d6

Make the eager-loading test a little more consistent with the others.

5b7ac72

Update changelog

4f06d4d

karlch requested changes Jul 15, 2023

View reviewed changes

buzzingwires added 3 commits July 16, 2023 05:49

More precise description for thumbnail.unload_threshold

e44a938

buzzingwires added 3 commits July 18, 2023 02:25

Clear _rendered_paths in addition to _paths on clear()

8c3e234

Don't manually specify indices when testing async thumbnail loading.

8d9118e

Prioritize thumbnails closest to the current selection for loading.

bfbc8f6

As thumbnails can be passed in arbitrary order to create_thumbnails_async, sort them such that those closest to the currently selected thumbnail are queued first.

karlch reviewed Jul 25, 2023

View reviewed changes

tests/end2end/features/thumbnail/thumbnaillazy.feature Outdated Show resolved Hide resolved

karlch reviewed Jul 25, 2023

View reviewed changes

vimiv/gui/thumbnail.py Outdated Show resolved Hide resolved

karlch reviewed Jul 25, 2023

View reviewed changes

vimiv/gui/thumbnail.py Outdated Show resolved Hide resolved

buzzingwires added 4 commits July 27, 2023 15:09

Refactor, and don't look for specific loaded thumbnails in tests.

4b1eedb

Remove unnecessary icon position update.

46dbe08

Update icon position when clicking a thumbnail.

07c6c2d

Make _rendered_thumbnail_pairs more concise and (probably) performant.

08aa5bf

karlch mentioned this pull request Aug 5, 2023

Lazy Loading Overhaul #651

Draft

Use dict comprehension for _rendered_thumbnail_pairs

b976bab

Check if unload threshold is exceeded before scanning.

6b26060

Only connect pruning if unload_threshold != -1

1ecc088

jcjgraf mentioned this pull request Apr 30, 2024

Slow when opening pictures on slow file systems (e.g., remote file systems) #363

Open

Implement lazy loading of thumbnails #665

Are you sure you want to change the base?

Implement lazy loading of thumbnails #665

Uh oh!

Conversation

buzzingwires commented Jul 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karlch left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

karlch commented Jul 15, 2023

Uh oh!

buzzingwires commented Jul 16, 2023

Uh oh!

buzzingwires commented Jul 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buzzingwires commented Jul 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buzzingwires commented Jul 17, 2023

Uh oh!

karlch commented Jul 17, 2023

Uh oh!

buzzingwires commented Jul 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

karlch commented Jul 25, 2023

Uh oh!

buzzingwires commented Jul 27, 2023

Uh oh!

karlch commented Aug 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buzzingwires commented Aug 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karlch commented Aug 5, 2023

Uh oh!

buzzingwires commented Aug 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karlch commented Aug 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buzzingwires commented Aug 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karlch commented Aug 16, 2023

Uh oh!

buzzingwires commented Aug 19, 2023

Uh oh!

Yutsuten commented Apr 28, 2024

Uh oh!

Uh oh!

buzzingwires commented Jul 14, 2023 •

edited

Loading

buzzingwires commented Jul 16, 2023 •

edited

Loading

buzzingwires commented Jul 16, 2023 •

edited

Loading

buzzingwires commented Jul 18, 2023 •

edited

Loading

karlch commented Aug 5, 2023 •

edited

Loading

buzzingwires commented Aug 5, 2023 •

edited

Loading

buzzingwires commented Aug 5, 2023 •

edited

Loading

karlch commented Aug 5, 2023 •

edited

Loading

buzzingwires commented Aug 5, 2023 •

edited

Loading