Bugs
💩Kernel Level Pipelining
"Event driven" and "driver controlled" pipelining (many duplicated but partial kernels, derived from the same kernel, running in the same device) is outside the OpenCL specification, but it seems to work on AMD and Intel GPUs for now (2017). When it is enabled (by passing a boolean "true" as a parameter to compute()), the API slices a region into N parts and reads/writes only those parts with clEnqueueReadBuffer/clEnqueueWriteBuffer commands, one set per sliced and offset kernel. Multiple kernels writing to different regions of the same buffer at the same time is undefined behavior (concurrent reading is fine).
The tests run so far have shown no error in the data and no OpenCL error code. Even so, use it at your own risk.
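To make the slicing described above concrete, here is a minimal sketch of the offset arithmetic, assuming the region is split into contiguous, non-overlapping parts. The helper names slice_region and slices_are_disjoint are hypothetical and are not part of the Cekirdekler API; a real OpenCL launch would also need each slice size to respect the work-group size.

```python
def slice_region(global_offset, global_size, n_parts):
    """Split [global_offset, global_offset + global_size) into n_parts slices.

    Returns a list of (offset, size) pairs.
    Hypothetical helper for illustration, not the Cekirdekler API.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, remainder = divmod(global_size, n_parts)
    slices = []
    offset = global_offset
    for i in range(n_parts):
        # The first `remainder` slices take one extra element each.
        size = base + (1 if i < remainder else 0)
        slices.append((offset, size))
        offset += size
    return slices


def slices_are_disjoint(slices):
    """True if no two (offset, size) slices overlap."""
    ordered = sorted(slices)
    return all(a_off + a_size <= b_off
               for (a_off, a_size), (b_off, _) in zip(ordered, ordered[1:]))


parts = slice_region(global_offset=0, global_size=1024, n_parts=4)
print(parts)  # [(0, 256), (256, 256), (512, 256), (768, 256)]
print(slices_are_disjoint(parts))  # True
```

Each pair would correspond to one partial kernel and its own read/write range. Disjoint ranges keep the partial kernels from touching the same elements, but they do not change the fact that concurrent writes into one buffer are outside the spec.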
Driver - event pipelining page:
https://github.com/tugrul512bit/Cekirdekler/wiki/Pipelining
Since this is just a boolean switch, you can turn it off easily and trade some performance for stability.
How else can I hide the latencies of a single kernel and a single buffer? I tried some stairway (event) overlapping and it worked right away on AMD (HD7870, R7-240, RX-550, FX8150) and Intel (HD400, N3060) devices, then I tried "free" queues, which made it even faster.
I noticed this "bug" after reading this page and this Stack Overflow question, because no such explanation was present in this page and I joined the Khronos forums only recently.
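As a rough picture of the stairway overlapping mentioned above, the sketch below models which stage of which slice is active at each time step. It assumes three stages per slice (upload, compute, download) that each take one step; the stage names and the stairway_schedule function are assumptions made for illustration and do not come from the library.

```python
def stairway_schedule(n_slices):
    """Return, per time step, the (stage, slice_index) pairs running together.

    Each slice goes upload -> compute -> download, one stage per step.
    Hypothetical model only, not the Cekirdekler implementation.
    """
    stages = ("upload", "compute", "download")
    steps = []
    for t in range(n_slices + len(stages) - 1):
        # Stage number `lag` works on the slice that started `lag` steps ago.
        active = [(stage, t - lag) for lag, stage in enumerate(stages)
                  if 0 <= t - lag < n_slices]
        steps.append(active)
    return steps


for t, active in enumerate(stairway_schedule(4)):
    print(t, active)
```

With 4 slices the model finishes in 6 steps instead of the 12 that fully serialized stages would take, which is where the latency hiding comes from.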
🍏Device Level Pipelining
This never overlaps two kernels on the same buffer (it uses double buffering), so the single-device pipelining feature does not have this bug.
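The reason double buffering avoids the problem can be seen in a small model, assuming two buffers that swap roles on every step. The function name double_buffer_schedule is hypothetical and only illustrates the idea.

```python
def double_buffer_schedule(n_steps):
    """For each step, pick the buffer the kernel uses and the one being transferred.

    Hypothetical model: two buffers (0 and 1) swap roles every step.
    """
    schedule = []
    for step in range(n_steps):
        compute_buf = step % 2          # buffer the kernel works on
        transfer_buf = 1 - compute_buf  # buffer being copied meanwhile
        schedule.append({"step": step,
                         "compute": compute_buf,
                         "transfer": transfer_buf})
    return schedule


for entry in double_buffer_schedule(4):
    print(entry)
```

At every step the kernel and the transfer touch different buffers, so no two operations ever work on the same buffer at the same time.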
🍏System Level Pipelining
Since each device works in a different context (explicitly controlled), there are no out-of-spec issues. The load balancing and device-to-device pipelining features (the latter again uses double buffering) are safe to use.
Please, if you ever see an error in logic or an undefined behavior, send me a mail:
huseyin (a dot here) tugrul (another dot char here) buyukisik (@ char as usual) gmail(dot)com