Add CPUTuple <: AbstractCPU as new device type #131
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #131      +/-   ##
==========================================
+ Coverage   85.45%   85.47%   +0.02%
==========================================
  Files           9        9
  Lines        1409     1411       +2
==========================================
+ Hits         1204     1206       +2
  Misses        205      205
```
Continue to review full report at Codecov.
So I know that you can't go from `buffer = Ref{NTuple{N,T}}()` to `a = SArray{S,T,N,L}(buffer[])`... does the dereferencing always reallocate that entire buffer to create `a`?
No, sometimes LLVM is in a good mood.
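For illustration, here is a minimal sketch in plain Julia (no package API assumed) of the two ways to read an element: dereferencing the `Ref` materializes the whole tuple as a value (a copy, unless LLVM elides it), whereas going through a raw pointer loads a single element in place.

```julia
function third_element_two_ways()
    buffer = Ref((1.0, 2.0, 3.0, 4.0))      # Ref{NTuple{4,Float64}}

    # Dereferencing: materializes the whole NTuple value (a copy, unless LLVM
    # elides it), then indexes into that value.
    x = buffer[][3]

    # Pointer route: read a single element through the Ref's address while the
    # Ref is kept rooted.
    y = GC.@preserve buffer begin
        ptr = convert(Ptr{Float64}, Base.unsafe_convert(Ptr{Cvoid}, buffer))
        unsafe_load(ptr, 3)
    end

    x, y    # both 3.0
end
```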
So is the issue that we want to be able to do something like...
I'll redefine `VectorizationBase.memory_reference` to return a tuple containing "ptr" and memory, i.e. something like:

```julia
@inline memory_reference(A::BitArray) = (Base.unsafe_convert(Ptr{Bit}, A.chunks), A.chunks)
@inline memory_reference(A::AbstractArray) = memory_reference(device(A), A)
@inline memory_reference(::CPUPointer, A) = (pointer(A), preserve_buffer(A))
@inline function memory_reference(::CPUTuple, A::AbstractArray{T}) where {T}
    ref = Ref(A)
    Base.unsafe_convert(Ptr{T}, ref), ref
end
```

Then the suggested use would be something like:

```julia
# either
ptra, presa = stridedpointer_and_buffer(A);
ptrb, presb = stridedpointer_and_buffer(B);
# or
(ptra, ptrb), (presa, presb) = groupedstridedpointer((A, B), (#= description of axis similarity =#));
GC.@preserve presa presb begin
    # use ptra and ptrb
end
```

This would allow […]. However, the […]. I'll make a breaking change to […].
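To make the intent concrete, here is a self-contained toy version of that pattern (hypothetical `toy_*` names and device types, not the actual VectorizationBase/ArrayInterface definitions), showing how the returned (pointer, buffer) pair composes with `GC.@preserve` for both a heap-backed `Array` and an isbits `SArray`. The second element of the pair is whatever object must stay rooted for the pointer to remain valid: the array itself in the `CPUPointer`-like case, the `Ref` in the `CPUTuple`-like case.

```julia
using StaticArrays

struct ToyCPUPointer end   # stand-in for a memory-backed device type
struct ToyCPUTuple end     # stand-in for the proposed tuple-backed device type

toy_device(::Array) = ToyCPUPointer()
toy_device(::SArray) = ToyCPUTuple()

toy_memory_reference(A) = toy_memory_reference(toy_device(A), A)
toy_memory_reference(::ToyCPUPointer, A) = (pointer(A), A)   # the array is its own buffer
function toy_memory_reference(::ToyCPUTuple, A::AbstractArray{T}) where {T}
    ref = Ref(A)                                             # box the isbits array
    convert(Ptr{T}, Base.unsafe_convert(Ptr{Cvoid}, ref)), ref
end

function sum_second_elements(A, B)
    ptra, presa = toy_memory_reference(A)
    ptrb, presb = toy_memory_reference(B)
    GC.@preserve presa presb begin
        unsafe_load(ptra, 2) + unsafe_load(ptrb, 2)
    end
end

sum_second_elements([1.0, 2.0, 3.0], @SVector([10.0, 20.0, 30.0]))   # 22.0
```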
Tokazama left a comment
LGTM
Related to #130
I'm open to suggestions on a better name, e.g., if someone were to define:

[…]

it should also be a `CPUTuple`. So maybe something like `CPUStruct` is a better name? Or `CPUStructMemory`? Or `CPULLVMArray`? Most homogeneous tuples lower to LLVM arrays.

The idea is to represent something that should really be `CPUPointer`, except for the niggling detail that Julia's semantics don't allow us to get a pointer to it, while C(++) would allow us to just use the `&` operator to get the address.

Julia's semantics actually more closely match LLVM's there, and what Clang does when you take `&` is basically the same as calling `Ref` in Julia and then using that `Ptr`, except maybe Clang optimizes away the copy more consistently for some reason? That'll take some more exploration; I just recall that in my tests some time ago Julia often failed to optimize them away, but I didn't actually check whether a similar example in C succeeds.

So the `CPUTuple` type means to say that this is what we should do. That's to distinguish it from other `AbstractArray` representations that aren't actually backed by memory underlying their loads, like `Fill` arrays, ranges, etc.
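As a rough sketch of that distinction (illustrative definitions only; the actual trait names and methods may differ), the classification could look something like:

```julia
# Illustrative-only device traits; not the package's real definitions.
abstract type AbstractDevice end
abstract type AbstractCPU <: AbstractDevice end
struct CPUPointer <: AbstractCPU end   # memory-backed; taking `pointer(A)` is valid
struct CPUTuple   <: AbstractCPU end   # backed by an isbits tuple/struct; needs the Ref trick
struct CPUIndex   <: AbstractCPU end   # elements are computed, not stored (ranges, Fill, ...)

device(::Array) = CPUPointer()
device(::AbstractRange) = CPUIndex()
# A StaticArrays.SArray, or a struct wrapping one, would return CPUTuple().
```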