Here's a commit that tests using SSE1 to compute the maximum value in a large array: juj@c62525e
SIMD support in Emscripten is still very early, but I'm writing this down so that we can follow the development as it progresses and gets better.
Running natively on a MacBook Pro, built with `clang++ benchmark_sse1.cpp -O3 -o a`, the results are
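For readers without the commit at hand, here is a minimal sketch of the kind of kernel being measured: a scalar maximum next to an SSE1 maximum with no unrolling. It is not the code from `benchmark_sse1.cpp`. The function names `max_scalar` and `max_sse1` are made up for illustration, and the sketch assumes an x86 target where `<xmmintrin.h>` is available.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics: _mm_set1_ps, _mm_loadu_ps, _mm_max_ps, _mm_storeu_ps
#include <algorithm>
#include <cfloat>
#include <cstddef>

// Hypothetical scalar baseline: one comparison per element.
float max_scalar(const float* data, std::size_t n) {
    float best = -FLT_MAX;
    for (std::size_t i = 0; i < n; ++i)
        best = std::max(best, data[i]);
    return best;
}

// Hypothetical SSE1 variant: each _mm_max_ps compares four floats at once.
float max_sse1(const float* data, std::size_t n) {
    __m128 best4 = _mm_set1_ps(-FLT_MAX);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        best4 = _mm_max_ps(best4, _mm_loadu_ps(data + i));  // unaligned load

    // Horizontal reduction: spill the four lanes and take their maximum.
    float lanes[4];
    _mm_storeu_ps(lanes, best4);
    float best = std::max(std::max(lanes[0], lanes[1]),
                          std::max(lanes[2], lanes[3]));

    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i)
        best = std::max(best, data[i]);
    return best;
}
```

Calling `max_sse1(values, count)` on a `float` array should return the same value as `max_scalar(values, count)` for inputs without NaNs. An unrolled variant would keep several independent `__m128` accumulators in the loop and combine them before the horizontal reduction.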
N: 16777216
Block scalar took 0.027743 msecs. Result: 1.000000.
Block scalar 4 unroll took 0.029196 msecs (1.052x of scalar) Result: 1.000000.
Block SSE1 no unroll took 0.004371 msecs (0.158x of scalar) Result: 1.000000.
Block SSE1 Unroll 2 took 0.005398 msecs (0.195x of scalar) Result: 1.000000.
Block SSE1 Unroll 4 took 0.004678 msecs (0.169x of scalar) Result: 1.000000.
Block SSE1 Unroll 4 pf took 0.003755 msecs (0.135x of scalar) Result: 1.000000.
Block SSE1 Unroll 16 took 0.003934 msecs (0.142x of scalar) Result: 1.000000.
Block SSE1 Unroll 16 pf took 0.004002 msecs (0.144x of scalar) Result: 1.000000.
Running in Firefox Nightly from today, 35.0a1 (2014-09-24), built with the command line `em++ -O3 tests/benchmark_sse1.cpp -o a.html -s TOTAL_MEMORY=268435456`, the output is
N: 16777216
Block scalar took 0.032746 msecs. Result: 1.000000.
Block scalar 4 unroll took 0.030144 msecs (0.921x of scalar) Result: 1.000000.
Block SSE1 no unroll took 1.347598 msecs (41.153x of scalar) Result: 1.000000.
Block SSE1 Unroll 2 took 1.327801 msecs (40.548x of scalar) Result: 1.000000.
Block SSE1 Unroll 4 took 1.333699 msecs (40.728x of scalar) Result: 1.000000.
Block SSE1 Unroll 16 took 1.432141 msecs (43.734x of scalar) Result: 1.000000.
So we see that running natively, the SSE1 version is 5-7 times faster than scalar, whereas the Emscripten run is 40-43 times slower than scalar. The JS version is not yet asm.js-validating, so slow performance is to be expected, but that's a good start!
Testing on current incoming and Nightly from today, I'm seeing
N: 16777216
Block scalar took 0.014260 msecs. Result: 1.000000.
Block scalar 4 unroll took 0.009430 msecs (0.661x of scalar) Result: 1.000000.
Block SSE1 no unroll took 0.013255 msecs (0.930x of scalar) Result: 1.000000.
Block SSE1 Unroll 2 took 0.010540 msecs (0.739x of scalar) Result: 1.000000.
Block SSE1 Unroll 4 took 0.009025 msecs (0.633x of scalar) Result: 1.000000.
Block SSE1 Unroll 16 took 0.007760 msecs (0.544x of scalar) Result: 1.000000.
So using min & max up to roughly doubles performance over scalar (0.544x of the scalar time with Unroll 16), which is much better than the 40-43x slowdown from before. Running natively, I get about 7x performance with the SIMD version (and 2x compared to SIMD.js). Closing, since this is now in the correct ballpark at least.