Support for SIMD WebAssembly #137

Open


opened on Jul 14, 2017

Interesting project you have here!

Are there any speed comparisons against WebAssembly anywhere? E.g., rewrite this for gpu.js:
http://kripken.github.io/Massive/beta/

I used to hand-code SSE ASM for DSP back in the day, so I'm always looking to save a few cycles ;)

robertleeplummerjr

Member

I would love to! How would one go about doing this?

robertleeplummerjr

Member

Currently what we'd like to do is implement an accelerator for the CPU. Right now, CPU mode carries overhead in the form of loops and a callback for each item in the arrays. Our current goal is to unroll these loops where possible and place the kernel function body there directly, rather than calling a callback. At the very least this would avoid the looping and callback/closure cost, but there is a limit to the size of these functions, and at this scale it can escalate quickly. A "small" 512*512 matrix, for example, has 262,144 kernel calls.

How does WebAssembly deal with this type of problem? Is this the right question to be asking?
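To make the callback-versus-inlined idea above concrete, here is a minimal sketch in plain JavaScript. It is not gpu.js's actual implementation: `kernelExpression`, `runWithCallback`, and `compileInlined` are hypothetical names, and the kernel is assumed to be a simple element-wise expression over two inputs `a` and `b` indexed by `i`.

```javascript
// Hypothetical kernel body, written as an expression over a[i] and b[i].
const kernelExpression = 'a[i] + b[i]';

// Approach 1: one callback invocation per output element.
function runWithCallback(kernel, a, b, size) {
  const out = new Float32Array(size);
  for (let i = 0; i < size; i++) {
    out[i] = kernel(a, b, i);
  }
  return out;
}

// Approach 2: generate a single function with the kernel expression
// pasted into a partially unrolled loop, so no per-element call is made.
function compileInlined(expression, unroll = 4) {
  const lines = [];
  for (let k = 0; k < unroll; k++) {
    // Rewrite every "[i]" index to "[i + k]" for the k-th unrolled step.
    const shifted = expression.replace(/\[i\]/g, `[i + ${k}]`);
    lines.push(`out[i + ${k}] = ${shifted};`);
  }
  const source = `
    const out = new Float32Array(size);
    const limit = size - (size % ${unroll});
    let i = 0;
    for (; i < limit; i += ${unroll}) {
      ${lines.join('\n      ')}
    }
    // Remainder loop for sizes that are not a multiple of the unroll factor.
    for (; i < size; i++) {
      out[i] = ${expression};
    }
    return out;
  `;
  return new Function('a', 'b', 'size', source);
}

// Usage: a 512*512 matrix flattened to 262,144 elements.
const size = 512 * 512;
const a = Float32Array.from({ length: size }, (_, i) => i % 7);
const b = Float32Array.from({ length: size }, (_, i) => i % 3);
const viaCallback = runWithCallback((x, y, i) => x[i] + y[i], a, b, size);
const viaInlined = compileInlined(kernelExpression)(a, b, size);
```

Both paths produce the same output; the second trades 262,144 function calls for one call into generated code. Whether that is faster in practice depends on the JavaScript engine, which may already inline small callbacks, so it would need measuring.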

Contributor

@robertleeplummerjr, @tomByrer: Fuzz and I were discussing doing this after v1. The SIMD aspect, to be exact. Though we would probably run it as a separate mode (not CPU mode).

Mainly because it would make for a hilarious tagline: GPU.JS, now transpiling from CPU to CPU!

robertleeplummerjr

Member

I, for one, would be in favor of the "CPU to CPU" tagline. It'd be funny at first, then they'd see the numbers. Their reaction: "Hahaha, what a funny joke {clicks link}... oooOOOooo!"

(But I'll do whatever you leaders feel is important 😛 )

Member

Will leave this here so you guys can salivate at the CPU performance gains of SIMD:

Also a working CPU SIMD demo here:

http://peterjensen.github.io/idf2014-simd/idf2014-simd.html

Not forgetting that we are technically already close to SIMD on the GPU at the moment:

The beginning: 1 gpu thread, 1 output value
Float textures: 1 gpu thread, 4 output values <- we are currently here
Branch-less optimizer: squash if branches e.g:

if (x > 0) {
    y += 5;
}

// becomes
z = x > 0;
y += 5 * z;

SIMD optimizer:

result.r = a[0] + b[0];
result.g = a[1] + b[1];
result.b = a[2] + b[2];
result.a = a[3] + b[3];

// becomes
result = a + b;
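As a rough check that the two rewrites above preserve behaviour, here is a small plain-JavaScript sketch. The function names are hypothetical and this is not how gpu.js implements either optimizer. JavaScript has no built-in vector type, so `addVec4` only stands in for the single component-wise `vec4` addition that a shader would perform.

```javascript
// Branching form, as in the first snippet.
function addIfPositive(x, y) {
  if (x > 0) {
    y += 5;
  }
  return y;
}

// Branch-less form: the comparison is converted to 0 or 1 and used as a mask.
function addIfPositiveBranchless(x, y) {
  const z = Number(x > 0);
  return y + 5 * z;
}

// Per-channel form: four separate scalar additions.
function addChannels(a, b) {
  return [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]];
}

// "Vector" form: one logical operation over all four lanes.
// In a shader this would be a single vec4 add; here it is emulated with a loop.
function addVec4(a, b) {
  const result = new Float32Array(4);
  for (let lane = 0; lane < 4; lane++) {
    result[lane] = a[lane] + b[lane];
  }
  return result;
}

// Usage
const branchy = [-2, 0, 3].map((x) => addIfPositive(x, 1));
const branchless = [-2, 0, 3].map((x) => addIfPositiveBranchless(x, 1));
const perChannel = addChannels([1, 2, 3, 4], [10, 20, 30, 40]);
const perVector = addVec4([1, 2, 3, 4], [10, 20, 30, 40]);
```

The branch-less version always performs the multiply, which is the point: every thread runs the same instructions regardless of the value of `x`.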

changed the title from "perf test examples against WebAssembly" to "Support for SIMD WebAssembly" on Jul 16, 2017

mentioned this on Jul 16, 2017: Should cpu fallback be run in parallel? #140

robertleeplummerjr

added

on Dec 23, 2017

Any speed comparisons against WebAssembly?
