Hüseyin Tuğrul BÜYÜKIŞIK edited this page Jun 12, 2017 · 21 revisions

💩Kernel Level Pipelining

"Event-driven" and "driver-controlled" pipelining (where the device runs many partial duplicates derived from the same kernel) falls outside the OpenCL specification, but it appears to work on AMD and Intel GPUs for now (2017). When it is enabled (by passing an extra boolean "true" as a parameter to compute()), the API slices a region into N parts and reads/writes only those parts with clEnqueueReadBuffer/clEnqueueWriteBuffer commands, once per sliced and offset kernel. As a result, multiple kernels write to different regions of the same buffer at the same time (concurrent reads are fine), and this is undefined behavior.
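The slicing idea can be sketched in a few lines. This is a minimal illustration only, not Cekirdekler's actual implementation: the function names `slice_range` and `slices_are_disjoint` and the `work_group_size` parameter are hypothetical, and the sketch assumes each slice must cover a whole number of work groups. It shows how a range is cut into N contiguous (offset, length) parts and that the parts never overlap, which is why each sliced kernel touches a different region of the buffer.

```python
def slice_range(total_items, num_slices, work_group_size=1):
    """Split [0, total_items) into contiguous (offset, length) slices.

    Hypothetical helper: every slice length is a multiple of
    work_group_size, and empty slices are dropped.
    """
    if num_slices < 1 or work_group_size < 1:
        raise ValueError("num_slices and work_group_size must be positive")
    if total_items % work_group_size != 0:
        raise ValueError("total_items must be a multiple of work_group_size")

    groups = total_items // work_group_size
    base, extra = divmod(groups, num_slices)

    slices = []
    offset = 0
    for i in range(num_slices):
        # The first `extra` slices take one additional work group each.
        length = (base + (1 if i < extra else 0)) * work_group_size
        if length > 0:
            slices.append((offset, length))
        offset += length
    return slices


def slices_are_disjoint(slices):
    """Return True if no two (offset, length) slices overlap."""
    ordered = sorted(slices)
    return all(
        a_off + a_len <= b_off
        for (a_off, a_len), (b_off, _) in zip(ordered, ordered[1:])
    )


if __name__ == "__main__":
    parts = slice_range(1024, 4, work_group_size=64)
    print(parts)
    print(slices_are_disjoint(parts))
```

Each (offset, length) pair would correspond to one partial kernel plus its own partial read and write. The regions are disjoint, but the kernels still share one buffer object, which is the part the OpenCL specification does not cover.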

The tests run so far appear to work: they have shown no errors in the data or in the OpenCL error codes yet. Still, use it at your own risk.

Driver-controlled and event-driven pipelining page:

https://github.com/tugrul512bit/Cekirdekler/wiki/Pipelining

Since this is just a boolean switch, you can turn it off easily and trade some performance for stability.


🍏Device Level Pipelining

This does not overlap any two kernels on the same buffer, so the single-device pipelining feature does not have the undefined-behavior problem described above.


🍏System Level Pipelining

Since each device works in its own explicitly controlled context, these features have no out-of-spec issues. The load balancing and device-to-device pipelining features are safe to use.