Description
I'm trying to find and fix bottlenecks in libfreenect2.
One of the issues I found is that memcpy'ing the decoded image from `TegraFrame.data` (which is effectively `jpeg_decompress_struct dinfo.jpegTegraMgr->buff[0]`) is very slow. We're talking about around 55 ms at full optimization (`-O3`) using clang++-3.8.
In contrast, if I allocate plain char arrays of the same size on the heap, fill them with random data, and then memcpy between them, I get around 9.5 ms, which is not great, but still much better.
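For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of timing described above. It is not the code used for the numbers in this issue; `time_memcpy_ms` and `time_heap_baseline_ms` are hypothetical helper names, and the buffer size and iteration count are left to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <random>
#include <vector>

// Hypothetical helper: mean time in milliseconds for one memcpy of `bytes`
// bytes from `src` to `dst`, averaged over `iterations` runs.
double time_memcpy_ms(void *dst, const void *src, std::size_t bytes, int iterations)
{
    assert(iterations > 0);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::memcpy(dst, src, bytes);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Read one byte back through a volatile so the copies are not optimized away.
    if (bytes > 0) {
        volatile unsigned char sink = static_cast<unsigned char *>(dst)[bytes - 1];
        (void)sink;
    }

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Hypothetical helper: the plain-heap baseline. Both buffers are ordinary
// heap allocations and the source is filled with random bytes.
double time_heap_baseline_ms(std::size_t bytes, int iterations)
{
    std::vector<unsigned char> src(bytes), dst(bytes);

    std::mt19937 rng(42);
    for (auto &b : src) {
        b = static_cast<unsigned char>(rng());
    }

    return time_memcpy_ms(dst.data(), src.data(), bytes, iterations);
}
```

Passing the `TegraFrame.data` pointer as `src` to `time_memcpy_ms`, with the same byte count given to `time_heap_baseline_ms`, should give the two numbers to compare. Averaging over several iterations helps separate a one-time cost (such as first-touch page faults) from a consistently slow read.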
So something is very wonky about using plain memcpy on that chunk of memory allocated by the hardware-accelerated library. There must be a way to get fast access to that memory; otherwise TegraJPEG seems pretty pointless, since the slow read access would defeat the purpose of the fast decode. I hope somebody familiar with the TegraJPEG/gstreamer stuff can pick this issue up and work on it.