WIP: CUDA performance

Currently, the CUDA experiments perform very poorly on AWS g2 instances.
A possible cause is that calculations are done in 64-bit precision by default.
To do:
- test 32-bit calculations
- test 64-bit calculations on a card that supports them natively (e.g. GTX Titan)
- try guvectorize(...target='gpu')
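
The first and last items could be tried together along the lines of the sketch below. It is only an illustration, not the project's actual kernel: the function name `multiply_rows` and the toy data are made up, and it assumes a recent Numba release, where the GPU target is spelled 'cuda' rather than 'gpu'. It falls back to the CPU target when no CUDA device is visible, and to plain NumPy when Numba is not installed, so it can be run anywhere.

```python
import numpy as np


def multiply_rows(x, y, out):
    # Hypothetical kernel body: one call handles one row.
    # Both operands are float32, so no 64-bit arithmetic is introduced.
    for i in range(x.shape[0]):
        out[i] = x[i] * y[i]


try:
    from numba import cuda, guvectorize

    # Use the GPU only when a CUDA device is actually available.
    target = "cuda" if cuda.is_available() else "cpu"
    multiply = guvectorize(
        ["void(float32[:], float32[:], float32[:])"],  # 32-bit signature only
        "(n),(n)->(n)",
        target=target,
    )(multiply_rows)
except ImportError:
    # Assumption: without Numba, plain NumPy stands in for the compiled kernel.
    target = "numpy"

    def multiply(x, y):
        return x * y


# NumPy creates 64-bit floats by default.
x64 = np.arange(6.0).reshape(2, 3)

# Cast explicitly to 32 bits before handing the data to the kernel.
x32 = x64.astype(np.float32)
y32 = np.full_like(x32, 2.0)

result = multiply(x32, y32)
print(target, x64.dtype, result.dtype)
```

Inside a kernel, a bare literal such as `1.0` is treated as a 64-bit float and can silently promote the arithmetic back to 64 bits, so constants are best passed in as float32 values.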