There's a whole menagerie of vmap-related functions exported, one for each combination of mutating/non-mutating, temporal/non-temporal stores, and single-threaded/multithreaded. Eight versions of the same function (2^3) are maintainable, but every additional boolean switch doubles the count: one more makes sixteen, which could get unwieldy. Wrapper types might allow for a more Julian interface, something like
map!(Threaded(+), Nontemporal(out), a, b)
to replace
A Threaded(f) wrapper could also be extended to reduce and mapreduce, giving a simple multithreaded reduction interface.
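To make the idea concrete, here is a rough sketch of how wrapper types could replace the boolean switches through dispatch. Everything in it is an assumption made for illustration: the struct layouts, the helper names (kernel, target, isthreaded, isnontemporal), and the stand-in functions wmap! and wmapreduce, which are used instead of adding methods to Base.map! and Base.mapreduce so the sketch cannot collide with existing methods. Plain Julia has no portable way to request streaming stores, so Nontemporal is only unwrapped here; a real kernel would branch on it.

```julia
using Base.Threads: @threads, @spawn, nthreads

# Hypothetical wrapper types; the names follow the proposal above.
struct Threaded{F}
    f::F
end
struct Nontemporal{A<:AbstractArray}
    dest::A
end

# Traits that turn each former boolean switch into a dispatch question.
isthreaded(::Any) = false
isthreaded(::Threaded) = true
kernel(f) = f
kernel(t::Threaded) = t.f

isnontemporal(::Any) = false
isnontemporal(::Nontemporal) = true
target(out) = out
target(n::Nontemporal) = n.dest

# One entry point instead of a separate function per combination of switches.
function wmap!(f, out, args::AbstractArray...)
    g, dest = kernel(f), target(out)
    # isnontemporal(out) is where a real implementation would select
    # streaming stores; this sketch ignores it.
    if isthreaded(f)
        @threads for i in eachindex(dest, args...)
            @inbounds dest[i] = g(map(a -> a[i], args)...)
        end
    else
        for i in eachindex(dest, args...)
            @inbounds dest[i] = g(map(a -> a[i], args)...)
        end
    end
    return dest
end

# Threaded reduction: reduce each chunk on its own task, then combine.
# Assumes op is associative and init is an identity element for op,
# because init is applied once per chunk.
function wmapreduce(f, op, a::AbstractArray; init)
    g = kernel(f)
    (isempty(a) || !isthreaded(f)) && return mapreduce(g, op, a; init=init)
    chunklen = cld(length(a), min(nthreads(), length(a)))
    chunks = Iterators.partition(eachindex(a), chunklen)
    tasks = [@spawn mapreduce(g, op, view(a, idx); init=init) for idx in chunks]
    return mapreduce(fetch, op, tasks)
end

a = collect(1.0:8.0); b = fill(2.0, 8); out = zeros(8)
wmap!(Threaded(+), Nontemporal(out), a, b)        # threaded, "non-temporal"
wmap!(+, out, a, b)                               # serial, ordinary stores
wmapreduce(Threaded(abs2), +, a; init=0.0)        # threaded sum of squares
```

With this shape, a further option becomes one more wrapper type and one more trait rather than a doubling of the exported names.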