| 1 | +# CLIJ - GPU-accelerated image processing in ImageJ macro |
| 2 | +Processing images on modern GPUs can speed up image processing massively. |
| 3 | +This page introduces how to do image processing in the graphics processing unit (GPU) using [OpenCL](https://www.khronos.org/opencl/) from ImageJ macro inside [Fiji](http://fiji.sc) using the [CLIJ](https://clij.github.io/) library. |
| 4 | +It is not necessary to learn OpenCL itself. |
| 5 | +Preprogrammed routines do the GPU image processing for you; some ImageJ macro programming experience is all you need. |
| 6 | +The list of preprogrammed routines might be extended depending on the community's needs. |
| 7 | + |
| 8 | +This is what your code might look like if you do GPU-based image processing in ImageJ macro: |
| 9 | + |
| 10 | + |
| 11 | + |
| 12 | +## Installation |
| 13 | +Follow the [installation instructions](installationInFiji). |
| 14 | + |
| 15 | +## Activating CLIJ's macro extensions |
| 16 | + |
| 17 | +To get started, all ImageJ macros using CLIJ have a line like this at the beginning: |
| 18 | + |
| 19 | +```java |
| 20 | +run("CLIJ2 Macro Extensions", "cl_device="); |
| 21 | +``` |
| 22 | + |
| 23 | +Note: This line may contain the specific name of a GPU. |
| 24 | +You can, but don't have to, specify one. |
| 25 | +If none is specified, the system will take the first one found. |
| 26 | +If you don't have the named GPU in your computer, another one will be chosen. |
| 27 | +You don't have to enter the full name; a part of the name is enough. |
| 28 | +To run on any GPU with `HD` in its name, change the macro like this: |
| 29 | + |
| 30 | +```java |
| 31 | +run("CLIJ2 Macro Extensions", "cl_device=HD"); |
| 32 | +``` |
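To check which device was actually picked, the selected GPU can be queried right after initialization. The following is only a minimal sketch: it assumes that the CLIJ2 macro extensions provide `Ext.CLIJ2_getGPUProperties`, and the variable names `gpuName`, `memoryInBytes` and `openCLVersion` are chosen for illustration.

```java
// Ask for any GPU with "HD" in its name
run("CLIJ2 Macro Extensions", "cl_device=HD");

// Assumption: getGPUProperties writes its results into the three variables
Ext.CLIJ2_getGPUProperties(gpuName, memoryInBytes, openCLVersion);

// Show which device will run the following operations
print("Running on: " + gpuName);
print("GPU memory in bytes: " + memoryInBytes);
print("OpenCL version: " + openCLVersion);
```

If the printed name does not contain `HD`, no matching GPU was found and another one was chosen, as described above.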
| 33 | + |
| 34 | +## Transferring images between ImageJ and the GPU |
| 35 | +In order to allow images to be processed by the GPU, you need to transfer them into the memory of the GPU. |
| 36 | +In order to view images which were processed by the GPU, you need to transfer them back to ImageJ. |
| 37 | +The two methods for doing this are called `push(image)` and `pull(image)`. |
| 38 | +You can remove a single image from the GPU's memory by using the `release(image)` method. |
| 39 | +Finally, you can remove all images from the GPU with the `clear()` method. |
| 40 | +In case the destination image doesn't exist, it will be created automatically in the GPU. |
| 41 | +Just push an image _A_ to the GPU, process it with destination _B_, and afterwards pull _B_ back from the GPU to ImageJ in order to show it. |
| 42 | + |
| 43 | +Let's have a look at an example which loads an image and blurs it using the push-pull mechanism. |
| 44 | + |
| 45 | +```java |
| 46 | +// Get test data |
| 47 | +run("T1 Head (2.4M, 16-bits)"); |
| 48 | +input = getTitle(); |
| 49 | +getDimensions(width, height, channels, slices, frames); |
| 50 | + |
| 51 | +// Init GPU |
| 52 | +run("CLIJ2 Macro Extensions", "cl_device="); |
| 53 | +Ext.CLIJ2_clear(); |
| 54 | + |
| 55 | +// push images to GPU |
| 56 | +Ext.CLIJ2_push(input); |
| 57 | + |
| 58 | +// cleanup ImageJ |
| 59 | +run("Close All"); |
| 60 | + |
| 61 | +// Blur in GPU |
| 62 | +Ext.CLIJ2_gaussianBlur3D(input, blurred, 10, 10, 1); |
| 63 | + |
| 64 | +// Get results back from GPU |
| 65 | +Ext.CLIJ2_pull(blurred); |
| 66 | + |
| 67 | +// Cleanup by the end |
| 68 | +Ext.CLIJ2_clear(); |
| 69 | +``` |
| 70 | + |
| 71 | +To find out which images are currently stored in the GPU, call the `Ext.CLIJ2_reportMemory();` method. |
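The methods for releasing a single image and for reporting memory can be combined with the blur example above. The following is a minimal sketch, assuming that the `release(image)` method is available as `Ext.CLIJ2_release` and that the memory report is written to ImageJ's log window.

```java
// Get test data
run("T1 Head (2.4M, 16-bits)");
input = getTitle();

// Init GPU
run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clear();

// Push the input and blur it; the destination is created automatically
Ext.CLIJ2_push(input);
Ext.CLIJ2_gaussianBlur3D(input, blurred, 10, 10, 1);

// Both images should be listed now
Ext.CLIJ2_reportMemory();

// The input is no longer needed in the GPU
Ext.CLIJ2_release(input);

// Only the blurred image should be left
Ext.CLIJ2_reportMemory();

// Get the result back and clean up
Ext.CLIJ2_pull(blurred);
Ext.CLIJ2_clear();
```

Releasing images that are no longer needed keeps GPU memory free for later steps of a longer workflow.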
| 72 | + |
| 73 | +## Saving time with GPU-based image processing |
| 74 | +The overall goal of processing images in the GPU is saving time. |
| 75 | +GPUs can process images faster because they can calculate pixel values of many pixels in parallel. |
| 76 | +Furthermore, images in memory of modern GPUs can be accessed faster than in ImageJ. |
| 77 | +However, there is a drawback: pushing/pulling the images to/from the GPU takes time. |
| 78 | +Thus, overall efficiency can only be achieved if whole pipelines are processed in the GPU. |
| 79 | +Furthermore, repeatedly using the same operation on a GPU pays off because operations are cached: calling an operation again is faster than calling a different one for the first time. |
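Processing a whole pipeline in the GPU means pushing once, chaining several operations, and pulling once. The following is a minimal sketch of that pattern; the combination of a Gaussian blur followed by a mean filter and the variable name `smoothed` are chosen for illustration only.

```java
// Get test data
run("T1 Head (2.4M, 16-bits)");
input = getTitle();

// Init GPU
run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clear();

// One push at the beginning
Ext.CLIJ2_push(input);
run("Close All");

// Two processing steps; the intermediate image never leaves the GPU
Ext.CLIJ2_gaussianBlur3D(input, blurred, 2, 2, 1);
Ext.CLIJ2_mean3DBox(blurred, smoothed, 3, 3, 3);

// One pull at the end
Ext.CLIJ2_pull(smoothed);

// Cleanup by the end
Ext.CLIJ2_clear();
```

The transfer cost is paid only twice, no matter how many operations run in between.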
| 80 | + |
| 81 | +Let's compare the `Mean 3D` filter of ImageJ with its counterpart in CLIJ. |
| 82 | +The example macro is [benchmarking.ijm](https://github.com/clij/clij-docs/tree/master/src/main/macro/benchmarking.ijm). |
| 83 | +It executes both operations ten times and measures the time each operation takes. |
| 84 | +This is just an excerpt of the macro: |
| 85 | + |
| 86 | +```java |
| 87 | +// Local mean filter in CPU |
| 88 | +for (i = 1; i <= 10; i++) { |
| 89 | + time = getTime(); |
| 90 | + run("Mean 3D...", "x=3 y=3 z=3"); |
| 91 | + print("CPU mean filter no " + i + " took " + (getTime() - time) + " msec"); |
| 92 | +} |
| 93 | +``` |
| 94 | + |
| 95 | +```java |
| 96 | +// push image to GPU |
| 97 | +time = getTime(); |
| 98 | +Ext.CLIJ2_push(input); |
| 99 | +// blurred does not need to be pushed; it is created automatically in the GPU |
| 100 | +print("Pushing one image to the GPU took " + (getTime() - time) + " msec"); |
| 101 | + |
| 102 | +// Local mean filter in GPU |
| 103 | +for (i = 1; i <= 10; i++) { |
| 104 | + time = getTime(); |
| 105 | + Ext.CLIJ2_mean3DBox(input, blurred, 3, 3, 3); |
| 106 | + print("GPU mean filter no " + i + " took " + (getTime() - time) + " msec"); |
| 107 | +} |
| 108 | + |
| 109 | +// Get results back from GPU |
| 110 | +time = getTime(); |
| 111 | +Ext.CLIJ2_pull(blurred); |
| 112 | +print("Pulling one image from the GPU took " + (getTime() - time) + " msec"); |
| 113 | +``` |
| 114 | + |
| 115 | +When executing the macro on an Intel Core i7-8565U CPU with a built-in Intel UHD Graphics 620 GPU (Windows 10, 64 bit), the output is: |
| 116 | + |
| 117 | +```java |
| 118 | +CPU mean filter no 1 took 3907 msec |
| 119 | +CPU mean filter no 2 took 3664 msec |
| 120 | +CPU mean filter no 3 took 3569 msec |
| 121 | +CPU mean filter no 4 took 3414 msec |
| 122 | +CPU mean filter no 5 took 2325 msec |
| 123 | +CPU mean filter no 6 took 2752 msec |
| 124 | +CPU mean filter no 7 took 2395 msec |
| 125 | +CPU mean filter no 8 took 2633 msec |
| 126 | +CPU mean filter no 9 took 2543 msec |
| 127 | +CPU mean filter no 10 took 2610 msec |
| 128 | +Pushing one image to the GPU took 11 msec |
| 129 | +GPU mean filter no 1 took 489 msec |
| 130 | +GPU mean filter no 2 took 27 msec |
| 131 | +GPU mean filter no 3 took 27 msec |
| 132 | +GPU mean filter no 4 took 28 msec |
| 133 | +GPU mean filter no 5 took 29 msec |
| 134 | +GPU mean filter no 6 took 39 msec |
| 135 | +GPU mean filter no 7 took 34 msec |
| 136 | +GPU mean filter no 8 took 29 msec |
| 137 | +GPU mean filter no 9 took 30 msec |
| 138 | +GPU mean filter no 10 took 31 msec |
| 139 | +Pulling one image from the GPU took 47 msec |
| 140 | +``` |
| 141 | + |
| 142 | +Thus, on the **CPU it takes 30 seconds**, while using the **GPU it just takes 0.8 seconds**. Let's execute it again. |
| 143 | + |
| 144 | +```java |
| 145 | +CPU mean filter no 1 took 2254 msec |
| 146 | +CPU mean filter no 2 took 2187 msec |
| 147 | +CPU mean filter no 3 took 2264 msec |
| 148 | +CPU mean filter no 4 took 2491 msec |
| 149 | +CPU mean filter no 5 took 2915 msec |
| 150 | +CPU mean filter no 6 took 2299 msec |
| 151 | +CPU mean filter no 7 took 2401 msec |
| 152 | +CPU mean filter no 8 took 2441 msec |
| 153 | +CPU mean filter no 9 took 2493 msec |
| 154 | +CPU mean filter no 10 took 2588 msec |
| 155 | +Pushing one image to the GPU took 9 msec |
| 156 | +GPU mean filter no 1 took 30 msec |
| 157 | +GPU mean filter no 2 took 28 msec |
| 158 | +GPU mean filter no 3 took 30 msec |
| 159 | +GPU mean filter no 4 took 39 msec |
| 160 | +GPU mean filter no 5 took 34 msec |
| 161 | +GPU mean filter no 6 took 34 msec |
| 162 | +GPU mean filter no 7 took 34 msec |
| 163 | +GPU mean filter no 8 took 32 msec |
| 164 | +GPU mean filter no 9 took 40 msec |
| 165 | +GPU mean filter no 10 took 32 msec |
| 166 | +Pulling one image from the GPU took 43 msec |
| 167 | +``` |
| 168 | + |
| 169 | +On the **CPU it still takes 24 seconds**, while using the **GPU it goes down to 0.4 seconds**. |
| 170 | +The additional speedup comes from the caching mechanism mentioned above. |
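The totals quoted above can be reproduced by summing the logged timings. The following is a minimal sketch that uses the numbers of the second run; the variable names are chosen for illustration, and the GPU total includes pushing and pulling.

```java
// Timings in msec, copied from the second run above
cpu = newArray(2254, 2187, 2264, 2491, 2915, 2299, 2401, 2441, 2493, 2588);
gpu = newArray(30, 28, 30, 39, 34, 34, 34, 32, 40, 32);
pushTime = 9;
pullTime = 43;

// Sum up the ten CPU filter runs
cpuTotal = 0;
for (i = 0; i < cpu.length; i++) {
	cpuTotal += cpu[i];
}

// Sum up the ten GPU filter runs plus the transfer times
gpuTotal = pushTime + pullTime;
for (i = 0; i < gpu.length; i++) {
	gpuTotal += gpu[i];
}

print("CPU total: " + cpuTotal + " msec");
print("GPU total: " + gpuTotal + " msec");
print("Time saved: " + d2s(100 * (1 - gpuTotal / cpuTotal), 1) + " %");
```

For the second run this gives 24333 msec on the CPU and 385 msec on the GPU, which means about 98.4 % of the time is saved.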
| 171 | + |
| 172 | +**Eureka, we can save more than 90% of the time by executing the operation on the GPU!** |
| 173 | +And this works on a small laptop without a dedicated GPU. This example should just motivate you to test your workflow on a GPU and show you how to evaluate its performance. |
| 174 | + |
| 175 | +Side note: ImageJ's mean filter runs _in place_. That means the result is stored in the same memory as the input image. |
| 176 | +With every iteration in the for loop, the image becomes more and more blurry. |
| 177 | +The OpenCL operation in the GPU always starts from the _input_ image and puts its result in the _blurred_ image. |
| 178 | +Thus, the resulting images will look different. |
| 179 | +Be a sceptical scientist when processing images in the GPU. |
| 180 | +Check that the workflow is indeed doing the right thing. |
| 181 | +This is especially important when working with experimental software. |
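One way to check a workflow is to apply the filter once on the CPU and once on the GPU and then look at the difference image. The following is a minimal sketch under the assumption that the pulled image appears as a window titled with the value of `gpu_mean`; the title `cpu_mean` is chosen for illustration. It is not the referenced `mean.ijm` macro.

```java
// Get test data
run("T1 Head (2.4M, 16-bits)");
input = getTitle();

// Init GPU and push the untouched input
run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clear();
Ext.CLIJ2_push(input);

// CPU: filter a duplicate once, because ImageJ's filter runs in place
selectWindow(input);
run("Duplicate...", "title=cpu_mean duplicate");
run("Mean 3D...", "x=3 y=3 z=3");

// GPU: filter once and pull the result back to ImageJ
Ext.CLIJ2_mean3DBox(input, gpu_mean, 3, 3, 3);
Ext.CLIJ2_pull(gpu_mean);

// Difference image of both results as a new 32-bit stack
imageCalculator("Difference create 32-bit stack", "cpu_mean", gpu_mean);

// Cleanup by the end
Ext.CLIJ2_clear();
```

A difference image that is dark everywhere except near the borders would match the observations described on this page.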
| 182 | + |
| 183 | +This is a view of the results of the mean filter on CPU and GPU together with their difference image: |
| 184 | + |
| 185 | + |
| 186 | + |
| 187 | +In the presented case, have a look at [mean.ijm](https://github.com/clij/clij-docs/blob/master/src/main/macro/mean.ijm) to see how different the results from CPU and GPU actually are. |
| 188 | +In some of the filters, I observed small differences between ImageJ and OpenCL, especially at the borders of the images. This is because CLIJ contains new implementations of operations in ImageJ. There is a large number of [unit tests in the library](https://github.com/clij/clij/tree/master/src/test/java/net/haesleinhuepf/clij/macro/modules), |
| 189 | +ensuring that these differences are small and that, where they appear, they mostly affect image borders. |
| 190 | + |
| 191 | +[Back to CLIJ documentation](https://clij.github.io/) |
| 192 | + |
| 193 | +[Imprint](https://clij.github.io/imprint) |