Patch data iteration during inference to enable GPU pre / post processing #7440

Thibault-Pelletier · 2024-02-05T08:45:35Z

Hi everyone,

First, thanks for the great library!

I had a question regarding inference optimization.

We often run inference on hardware that has enough GPU memory for the sliding window but not enough to hold the full volume. In this case, the preprocessing and postprocessing on the CPU end up taking more time than the actual sliding window inference.

I was wondering how to run the preprocessing on the GPU as well.

From my understanding, it should be possible to use something like the GridPatchDataset to iterate over patches, move each patch to the GPU, run the preprocessing on it before the actual inference, and then move the patch back to the CPU for stitching.

The problems with this approach, from what I can see, are:

The patch affines are not updated to reflect the current patch iteration
The patch iteration doesn't handle possible overlap (which could be useful when stitching multiple subpatches)
There is no stitching available for this kind of approach in MONAI (from what I can see)
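
To make the first two points concrete, here is a minimal pure-Python sketch of what an overlap-aware patch iterator with per-patch affines could look like. It is not MONAI code: `patch_starts`, `patch_affine` and `iter_patches` are hypothetical helper names, and the sketch assumes a 4x4 voxel-to-world affine stored as nested lists and an overlap given in voxels per axis.

```python
from itertools import product


def patch_starts(dim_size, patch_size, overlap):
    """Start indices along one axis so that consecutive patches share at least `overlap` voxels."""
    if patch_size >= dim_size:
        return [0]  # a single patch already covers the whole axis
    step = patch_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    starts = list(range(0, dim_size - patch_size, step))
    starts.append(dim_size - patch_size)  # last patch sits flush with the far edge
    return starts


def patch_affine(affine, start):
    """Copy of a 4x4 voxel-to-world affine whose origin is shifted to voxel `start`."""
    shifted = [list(row) for row in affine]
    for r in range(3):
        # new translation = old translation + (rotation/scale part) applied to the start offset
        shifted[r][3] = affine[r][3] + sum(affine[r][c] * start[c] for c in range(3))
    return shifted


def iter_patches(shape, patch_size, overlap, affine):
    """Yield (slices, affine) for every patch of a 3D volume of the given shape."""
    axes = [patch_starts(s, p, o) for s, p, o in zip(shape, patch_size, overlap)]
    for start in product(*axes):
        slices = tuple(slice(b, min(b + p, s)) for b, p, s in zip(start, patch_size, shape))
        yield slices, patch_affine(affine, start)


# Example: 2 mm isotropic volume whose world origin is at (10, 20, 30).
affine = [[2.0, 0.0, 0.0, 10.0], [0.0, 2.0, 0.0, 20.0], [0.0, 0.0, 2.0, 30.0], [0.0, 0.0, 0.0, 1.0]]
patches = list(iter_patches((10, 10, 4), (4, 4, 4), (1, 1, 0), affine))
```

Each patch could then be moved to the GPU, preprocessed using its own affine, and written back to a CPU buffer at `slices`. Stitching the overlapping regions would still need a blending rule (for example accumulating sums and counts, then dividing), which this sketch leaves out.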

So my questions are:

Are there already tools to enable this kind of workflow in MONAI (that I just missed)?
Is this approach a good idea overall, and what limitations should be taken into account?
What would be the best way to implement it?

Thanks in advance,
Best,
Thibault

KumoLiu · 2024-02-05T09:25:06Z

Maintainer

Hi @Thibault-Pelletier, thanks for your interest here.
You can do several things:

You can set cache_rate to cache part of the data.

MONAI/monai/data/dataset.py

Line 777 in 33afaef

cache_rate: percentage of cached data in total, default is 1.0 (cache all).
You can set buffer_steps to set the number of sliding window iterations along the buffer_dim to be buffered on sw_device before writing to device.

MONAI/monai/inferers/utils.py

Line 122 in 33afaef

buffer_steps: the number of sliding window iterations along the ``buffer_dim``

We also plan to enhance sliding_window_inference to make better use of the GPU, but it's not on the agenda yet.


Thibault-Pelletier · 2024-02-05T10:24:51Z

Author

Hi @KumoLiu,

Thanks for the feedback. What you are describing is already what we are doing.

My use case is more about optimizing what is done before and after the inference, not the inference process itself, and it concerns inference on a single image only (after training and validation).

To take an example: if we have one volume as input and we want to run the Whole Body CT inference on it, we need to change the orientation and resample the volume before running a sliding window inference on it. After the inference has run, we need to invert the previous resampling.

If the whole volume cannot fit in GPU memory, this pre/post processing is done on the CPU, while the sliding window inference can still be done on the GPU.

I was wondering how to run the resampling on the GPU as well by leveraging the same kind of mechanism as for the sliding window inference (i.e. running on patches).

Thanks in advance!
