Pipelining

Hüseyin Tuğrul BÜYÜKIŞIK edited this page Mar 31, 2017 · 18 revisions

Pipelining is used when a system can work on different things concurrently, hiding the latencies of some stages. On a GPU, a buffer read, a buffer write and a kernel execution can happen at the same time. Newer GPUs can even run multiple instances of the same type of operation concurrently. All of this is harnessed by using multiple command queues per OpenCL device. Even a cheap AMD R7 240 GPU can work with 16 queues concurrently in the same context, which makes it easier to approach its advertised performance limits.
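The latency-hiding effect of separate queues for reads, computes and writes can be illustrated with a small host-side simulation. This is plain Python with threads standing in for command queues and list operations standing in for transfers and kernels; it is a conceptual sketch, not Cekirdekler code:

```python
import threading
from queue import Queue

def pipeline(data, n_chunks=4):
    """Push chunks through read -> compute -> write stages, each stage
    on its own thread (like separate command queues): while chunk i is
    being computed, chunk i+1 is already being 'read'."""
    size = len(data) // n_chunks
    chunks = [data[i * size:(i + 1) * size] for i in range(n_chunks)]
    read_q, compute_q = Queue(), Queue()
    out = [None] * n_chunks

    def reader():
        for i, c in enumerate(chunks):
            read_q.put((i, list(c)))              # simulate host->device copy
        read_q.put(None)                          # end-of-stream marker

    def computer():
        while (item := read_q.get()) is not None:
            i, c = item
            compute_q.put((i, [x + 3 for x in c]))  # 'kernel': add 3
        compute_q.put(None)

    def writer():
        while (item := compute_q.get()) is not None:
            i, c = item
            out[i] = c                            # simulate device->host copy

    threads = [threading.Thread(target=f) for f in (reader, computer, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [x for c in out for x in c]

print(pipeline(list(range(8))))  # -> [3, 4, 5, 6, 7, 8, 9, 10]
```

Because each stage only hands finished chunks downstream, the three stages overlap in time exactly the way overlapped reads, computes and writes do on the device.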

Cekirdekler API supports two types of pipelining. One uses event-based control to overlap individual reads, writes and computes, while the other uses driver-based control to overlap whole read+compute+write operations as blobs (which tends to be more efficient, since there is no event overhead).
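The driver-controlled mode can be pictured as each chunk's read+compute+write being issued as one indivisible blob on its own queue, with the driver free to overlap whole blobs against each other. A hedged host-side sketch (a thread pool stands in for the set of queues; none of these names are Cekirdekler API):

```python
from concurrent.futures import ThreadPoolExecutor

def blob(chunk):
    """One read+compute+write unit, issued as a whole on 'its own queue'."""
    data = list(chunk)             # read: host -> device
    data = [x + 3 for x in data]   # compute: the 'add 3' kernel
    return data                    # write: device -> host

chunks = [[0, 1], [2, 3], [4, 5], [6, 7]]
with ThreadPoolExecutor(max_workers=4) as pool:  # 4 concurrent 'queues'
    results = list(pool.map(blob, chunks))

print(results)  # -> [[3, 4], [5, 6], [7, 8], [9, 10]]
```

No per-operation events are needed here: ordering within a blob is implicit in the queue, and blobs on different queues overlap freely, which is why this mode avoids event bookkeeping overhead.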

To enable pipelining, some optional parameters need to be adjusted and the partialRead flag must be set. If a workitem in a kernel randomly accesses any element of an array, the work is not suitable for partial reading: there is no synchronization between workgroups in a kernel, and pipelining needs divisible work, such as adding 3 to all elements (embarrassingly parallel). But if a workitem accesses only its own workgroup's range of the array, the work can be pipelined no matter what access pattern it uses within that range.
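The divisibility condition can be made concrete with two toy "kernels" in Python (stand-ins for device code, not Cekirdekler API): one that touches only its own chunk, and one that gathers from arbitrary indices and therefore cannot run against a partial upload:

```python
def add3(chunk):
    # Touches only its own chunk, so uploading/running/downloading
    # piece by piece gives the same answer as running on the whole array.
    return [x + 3 for x in chunk]

data = list(range(8))
chunked = sum((add3(data[i:i + 2]) for i in range(0, 8, 2)), [])
assert chunked == add3(data)  # chunk-wise == whole-array: pipelinable

def gather(whole_array, indices):
    # Reads arbitrary elements: any chunk may need data from any other
    # chunk, so this cannot be computed from a partial read.
    return [whole_array[i] for i in indices]

print(gather(data, [7, 0]))  # needs the full array, not just one chunk
```

The first pattern is exactly the "add 3 to all elements" case above; the second is the random-access case that partialRead cannot serve.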
