Applications #1
Happy to help give pointers if you want to hack on any of these things. Something in the ONNX/FluxJS/deployment bucket would be easy to get started with. WebAssembly.jl is solid and would probably make Mjolnir->WASM quite easy.
That all sounds quite excellent; I'm very excited about this work and what it bodes for me being able to use Julia at work :)

To start, I'd like to explore emitting code for resource-constrained systems. My initial inclination is that it would be easiest to target TensorFlow Lite, which handles things like quantization, and potentially even TF Lite for Microcontrollers to reach even lighter targets like https://www.youtube.com/watch?v=HzCRZsGJLbI . Another possible target is https://github.com/google/iree . Though, to what extent would it be a good idea to skip all that and just work on emitting slim C code? Especially because I'm not sure yet whether TF Lite for Microcontrollers allows custom ops. I'm going to have to do a bit more digging to sharpen this, but these are my initial thoughts.

Edit: I don't want to get ahead of myself, though. Perhaps just focusing on basic TF Lite for now would be best, though I'd need to be able to integrate custom ops. Another question I need to explore is at what point in the stack quantization needs to happen: https://blog.tensorflow.org/2020/04/quantization-aware-training-with-tensorflow-model-optimization-toolkit.html
It's a little clumsy right now, but here's how you can get a graph for the forward pass of a simple model, ready to deploy:

(xla-test) pkg> add Flux https://github.com/MikeInnes/Mjolnir.jl https://github.com/MikeInnes/XLATools.jl#next
julia> using Flux, XLA
julia> m = Chain(Dense(10, 5, relu), Dense(5, 2));
julia> XLA.@trace XLA.Primitives() m(Vector{Float32})
1: (%1 :: const(Chain(Dense(10, 5, relu), Dense(5, 2))), %2 :: Array{Float32,1})
%3 = Float32[-0.50585645 -0.20598492 … -0.1412567 0.15082987; -0.0841699 -0.57924235 … -0.3025245 -0.27678147; … ; -0.16991931 -0.6295842 … -0.13748969 -0.32836327; 0.018975155 -0.22297584 … 0.1435846 0.5270162] :: const(Float32[-0.50585645 -0.20598492 … -0.1412567 0.15082987; -0.0841699 -0.57924235 … -0.3025245 -0.27678147; … ; -0.16991931 -0.6295842 … -0.13748969 -0.32836327; 0.018975155 -0.22297584 … 0.1435846 0.5270162])
%4 = Float32[0.0, 0.0, 0.0, 0.0, 0.0] :: const(Float32[0.0, 0.0, 0.0, 0.0, 0.0])
%5 = (*)(%3, %2) :: Array{Float32,1}
%6 = (Base.Broadcast.broadcasted)(+, %5, %4) :: Array{Float32,1}
%7 = (Base.Broadcast.broadcasted)(NNlib.relu, %6) :: Array{Float32,1}
%8 = Float32[-0.078295 0.9035908 … 0.76721174 0.37824208; 0.08101376 0.5027532 … 0.39849186 0.39398715] :: const(Float32[-0.078295 0.9035908 … 0.76721174 0.37824208; 0.08101376 0.5027532 … 0.39849186 0.39398715])
%9 = Float32[0.0, 0.0] :: const(Float32[0.0, 0.0])
%10 = (*)(%8, %7) :: Array{Float32,1}
%11 = (Base.Broadcast.broadcasted)(+, %10, %9) :: Array{Float32,1}
return %11

Turning this into a graph for whatever framework, or even C code, should be pretty straightforward.
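To make that concrete, here's a rough sketch of what such a pass might look like: walk the trace statement by statement and emit one line of C per op. The `Stmt` type, the toy `stmts` list, and the `matvec`/`vadd`/`vrelu` helpers are all hypothetical names for illustration, not actual Mjolnir/XLATools API.

# Hypothetical sketch: lower the traced Dense chain above to C calls.
struct Stmt
    out::Symbol            # SSA name of the result, e.g. :v5
    op::Symbol             # :matmul, :add or :relu
    args::Vector{Symbol}   # SSA names / constants used as inputs
end

# The forward pass above, flattened into a list of statements.
stmts = [
    Stmt(:v5,  :matmul, [:W1, :x]),
    Stmt(:v6,  :add,    [:v5, :b1]),
    Stmt(:v7,  :relu,   [:v6]),
    Stmt(:v10, :matmul, [:W2, :v7]),
    Stmt(:v11, :add,    [:v10, :b2]),
]

function lower_to_c(stmts)
    lines = String[]
    for s in stmts
        a = s.args
        # Each op maps to a call into a tiny runtime library.
        line = s.op == :matmul ? "matvec($(a[1]), $(a[2]), $(s.out));" :
               s.op == :add    ? "vadd($(a[1]), $(a[2]), $(s.out));"   :
               s.op == :relu   ? "vrelu($(a[1]), $(s.out));"           :
               error("unsupported op $(s.op)")
        push!(lines, line)
    end
    return join(lines, "\n")
end

println(lower_to_c(stmts))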
I think this could be a nice approach; the main potential problem is that we support broadcasting/mapping arbitrary functions. That's hard to do in C but might be possible in a templated C++ library like Eigen. XLA can do it too, so perhaps TF lite can. The other option is to only support built-in activation functions. Theoretically, I think you may even be able to just get XLA to dump object code, but I've no idea how hard that is in practice.
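As a sketch of the "built-in activations only" option: broadcasts over a small whitelist could lower to fixed C loops, and broadcasting anything outside it would just be rejected at lowering time. The names and C snippets below are hypothetical, only meant to show the shape of it.

# Hypothetical sketch: a whitelist of activations the C backend knows how to emit.
const ACTIVATIONS = Dict(
    :relu    => "y[i] = x[i] > 0 ? x[i] : 0;",
    :tanh    => "y[i] = tanhf(x[i]);",
    :sigmoid => "y[i] = 1.0f / (1.0f + expf(-x[i]));",
)

function emit_activation(f::Symbol, n::Int)
    body = get(ACTIVATIONS, f) do
        error("broadcasting `$f` is not supported by the C backend")
    end
    # Emit an elementwise loop over a length-n vector.
    """
    for (int i = 0; i < $n; i++) { $body }
    """
end

print(emit_activation(:relu, 5))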
This is a good question that I'm not sure of either. AIUI you can potentially do quantisation (and similar things like weight pruning) before training or after it, as a deployment optimisation. It feels like that could be a fairly straightforward API in Flux (basically an
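For illustration, a rough sketch of what a post-training quantisation step could look like in plain Julia (this is not an existing Flux API; `QuantisedArray`, `quantise` and `dequantise` are made-up names for a per-tensor Int8 scheme):

# Hypothetical sketch: per-tensor Int8 quantisation as a deployment step.
struct QuantisedArray
    data::Array{Int8}
    scale::Float32
end

function quantise(W::Array{Float32})
    scale = maximum(abs, W) / 127f0        # one scale per tensor
    QuantisedArray(round.(Int8, W ./ scale), scale)
end

dequantise(q::QuantisedArray) = Float32.(q.data) .* q.scale

W  = randn(Float32, 5, 10)
qW = quantise(W)
maximum(abs, dequantise(qW) .- W)          # quantisation error, bounded by scale/2

Weight pruning could look much the same: a pure function over the model's arrays, applied once before the trace is lowered for deployment.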
Hello Mike,
In the spirit of your readme, I'm wondering to what extent this package can, or is intended to, address some common pain points aside from speeding up Flux/Zygote:
For those that apply, are they planned roadmap items, and if not, how much additional work would they require?
Thanks