Compute unary and some binary functions faster for PooledDataArray #127
I think this is potentially a really good idea, but I'm a little troubled by breaking the invariant that the pool isn't redundant. Right now, this trick breaks that invariant whenever the applied operator isn't one-to-one.
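To make the concern concrete, here is a minimal sketch that uses two plain vectors as a hypothetical stand-in for a pooled array's pool and refs (the variable names are illustrative, not the DataArrays internals). It applies a function that is not one-to-one to the pool only:

```julia
# Hypothetical stand-in for a pooled array: distinct values plus integer references.
pool = [-2.0, -1.0, 1.0, 2.0]
refs = [1, 2, 3, 4, 3, 2]

# Apply a non-injective function to the pool only, leaving refs untouched.
newpool = map(abs, pool)

# The expanded values still agree with applying the function elementwise...
values_ok = newpool[refs] == map(abs, pool[refs])

# ...but the pool now holds repeated values, so it is no longer redundancy-free.
pool_is_unique = allunique(newpool)
```

The data stay correct, but anything that relies on each value appearing once in the pool (counting levels, for instance) would see four entries where only two distinct values exist.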
You mean, when the function is not pure? This issue was already raised when dealing with sparse arrays, about how structural zeros could be handled when calling e.g. …
I think the problem @johnmyleswhite is referring to is that, if you run …
Ah, OK. Basically any function which isn't an injection will have this problem. What you're saying is that even mathematically injective functions might not be injective in practice because of rounding? That would mean for all functions the pool must be checked for duplicates, and compacted if needed -- which comes with a large cost since the whole PDA will need to be adjusted. Hopefully in practice this situation should be rare enough.
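A rough sketch of what that check-and-compact step could look like, again on plain vectors. The helper name `compact` is hypothetical and not part of DataArrays. The example uses `exp`, which is injective on the reals but overflows to `Inf` for large Float64 inputs:

```julia
# Remove duplicate pool entries and remap refs onto the compacted pool.
# `compact` is a hypothetical helper, not a DataArrays function.
function compact(refs::Vector{Int}, pool::Vector{T}) where {T}
    newpool = T[]
    position = Dict{T,Int}()                  # value => index in newpool
    remap = Vector{Int}(undef, length(pool))  # old pool index => new pool index
    for (i, v) in enumerate(pool)
        remap[i] = get!(position, v) do
            push!(newpool, v)
            length(newpool)
        end
    end
    # Rewriting every ref is the cost that scales with the full array length.
    return remap[refs], newpool
end

pool = [1.0, 710.0, 711.0]
refs = [1, 2, 3, 3, 1]

# Both large inputs overflow to Inf, so the mapped pool contains a duplicate.
mapped = map(exp, pool)
newrefs, newpool = compact(refs, mapped)
```

Detecting duplicates only needs a pass over the pool, which is cheap. The expensive part is the final `remap[refs]`, which touches every element and is only needed when a duplicate was actually found.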
I actually only meant to refer to the lack of mathematical injectiveness, but @simonster's point that computational injectiveness is even more stringent is a really good one. For me, the trick here is that this pool request pushes hard on one conception of PDAs that I'd like to eventually see split into a separate package: a compressed array in which repeated values are not represented repeatedly. I'd call this a … On the other hand, you have PDAs being used as …
OK, I agree that the injectiveness wasn't well thought out. I did leave out min(pda, ::Real) for that reason, and I also left out minimum(pda), because I wasn't sure if the maximum of the pool would actually also be used in the refs. (The pda could be a subset or slice; I assume you don't recompute the pool and refs for that.) Well, this all started as a new type WarpedArray in MFCC, when I realized that the functionality might have been covered by PooledDataArray. But the use of such quantized values in an array is indeed completely different from PooledDataArrays.
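The worry about reductions can be sketched as follows, assuming (as the comment above does) that a slice keeps the parent's full pool. The vectors below are hypothetical illustrative data, not DataArrays internals:

```julia
# Hypothetical pooled data: the pool has four values, but this slice
# only references the first three of them.
pool = [0.1, 0.7, 0.3, 0.9]
refs = [1, 3, 3, 2, 1]

# Reducing over the pool directly can return a value that is absent from the data.
naive_max = maximum(pool)

# Restricting the reduction to referenced entries costs one pass over refs.
used = falses(length(pool))
used[refs] .= true
safe_max = maximum(pool[used])
```

Here the naive result is 0.9, which never occurs in the slice, while the restricted result is 0.7. A pool-only shortcut for reductions is therefore only safe if the pool is guaranteed to be minimal.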
No pun intended. Just typing fast. I do think we should support something like UniqueArray. If we do, I think we need to formalize its semantics. Does a subset of a UniqueArray have a pool that guarantees uniqueness? Or does it guarantee connection to the pool for the whole? Is the pool always minimal? Can you include values in the pool that don't occur in the data? |
If the goal is to use numeric values in … The question of whether to allow keeping unused values in the pool is less specific to …
…rom Julia (#127) Update splice!() to reflect changes in the corresponding Base function. This fixes the tests on recent Julia 0.5 master.
This PR implements unary and some binary functions more efficiently for PooledDataArray by operating only on the .pool rather than on every entry of the array. The difference is noticeable for very large arrays, e.g.,
using DataArrays ## provides PooledDataArray
using MFCC ## for warp(): this gives us quantized floating point values
x = warp(randn(100000,10)); ## a relatively slow operation, unfortunately
p = PooledDataArray(x)
@time exp(p);
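For readers without DataArrays or MFCC at hand, the idea can be sketched in plain Julia. The `Pooled` type and the `pooled`, `expand`, and `poolmap` functions below are hypothetical names for illustration, not the PR's actual implementation:

```julia
# Hypothetical stand-in for a pooled array: `pool` holds the distinct values,
# and `refs[i]` is the index into `pool` of element i.
struct Pooled{T}
    refs::Vector{Int}
    pool::Vector{T}
end

# Build a Pooled from a plain vector, keeping first-seen order in the pool.
function pooled(x::AbstractVector{T}) where {T}
    pool = unique(x)
    index = Dict(v => i for (i, v) in enumerate(pool))
    return Pooled{T}([index[v] for v in x], pool)
end

# Expand back to a plain vector.
expand(p::Pooled) = p.pool[p.refs]

# Apply f to the pool only; refs are reused unchanged (and shared with p).
poolmap(f, p::Pooled) = Pooled(p.refs, map(f, p.pool))

x = [0.5, 1.5, 0.5, 0.5, 1.5]
p = pooled(x)
q = poolmap(exp, p)   # two calls to exp instead of five
```

The saving grows with the ratio of array length to pool size, which is why quantized data such as the warped values above benefit most.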