Description
A performance issue when calling the gamma function many times is that each call internally allocates a 30-element buffer.
A solution I've implemented is to overwrite the functions to use 1) `StaticArrays.MVector`, and 2) a per-thread buffer pool. However, I don't know if this is the best solution.
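using StaticArrays

# One 30-element buffer per thread; `nthreadlimit` (the maximum thread count) is defined elsewhere.
const gamma_inc_taylor_buffers = map(1:nthreadlimit) do i
    @MVector zeros(30)
end

function _gamma_inc_get_buffer()
    buffer = gamma_inc_taylor_buffers[Threads.threadid()]
    buffer .= 0  # reset before reuse
    return buffer
end

# in functions
wk = _gamma_inc_get_buffer()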
It would be better to have a solution in the package itself, to avoid overwriting the functions and recompiling.
4.814791 seconds (14.04 M allocations: 3.806 GiB, 18.75% gc time) # Current
4.496342 seconds (14.04 M allocations: 3.217 GiB, 16.10% gc time) # `@MVector zeros(30)`
3.123370 seconds (869.47 k allocations: 78.309 MiB) # Buffer pool + MVector
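One possible variation on the per-thread pool above, offered only as a sketch: since Julia 1.7 a task can migrate between threads when it yields, so indexing a pool by `Threads.threadid()` may not always be safe. Keeping the buffer in task-local storage avoids that. The names `task_local_buffer` and `BUFFER_KEY` are hypothetical and not part of the package, and a plain `Vector{Float64}` is used here instead of an `MVector` to keep the example dependency-free. I have not benchmarked this variant.

```julia
# Hypothetical helper; the function name and storage key are illustrative only.
const BUFFER_KEY = :gamma_inc_taylor_buffer

function task_local_buffer()
    # task_local_storage() returns the current task's IdDict. get! creates the
    # buffer on first use in this task and returns the same one afterwards.
    buf = get!(() -> zeros(Float64, 30), task_local_storage(), BUFFER_KEY)::Vector{Float64}
    fill!(buf, 0.0)  # reset before reuse, like `buffer .= 0` in the pool
    return buf
end

# in functions
wk = task_local_buffer()
```

Each task allocates its buffer once, so this should help most when a few long-lived tasks make many calls. With many short-lived tasks it would allocate one buffer per task.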