Skip to content

Commit

Permalink
fix list (+6 squashed commits)
Browse files Browse the repository at this point in the history
Squashed commits:
[c45b871] update for Pillow-SIMD 3.4.0
[bedd83f] no alpha compositing in this release
[e8fe730] update results for latest version
add Skia results
[a16ff97] add SIMD changes
[82ffbd6] fix readme (+4 squashed commits)
Squashed commits:
[85677f9] fix error
[f44ebb1] update results for unrolled implementation
[83968c3] fix #4
[cd73c51] update link (+11 squashed commits)
Squashed commits:
[5882178] correct spelling
[a0e5956] Why Pillow-SIMD is even faster
[108e72e] Why Pillow itself is so fast
[e8eeda1] spelling fixes
[e816e9c] spelling
[d2eefef] methodology, why not contributed
[2e55786] installation and conclusion
[9f6415e] more info
[67e55b7] more benchmarks
test files
[471d4c5] remove spaces
[904d89d] add performance tests
[4fe17fe] simple readme
  • Loading branch information
homm committed Feb 26, 2017
1 parent 3f6db91 commit ff9b2c0
Show file tree
Hide file tree
Showing 3 changed files with 251 additions and 1 deletion.
67 changes: 67 additions & 0 deletions CHANGES.SIMD.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
Changelog (Pillow-SIMD)
=======================

3.3.0.post1
-----------

Alpha compositing
~~~~~~~~~~~~~~~~~

- SSE4 and AVX2 fixed-point full loading implementation.
Up to 4.6x faster.

3.3.0.post0
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 fixed-point full loading horizontal pass.
- SSE4 and AVX2 fixed-point full loading vertical pass.

Convertion
~~~~~~~~~~

- RGBA -> RGBa SSE4 and AVX2 fixed-point full loading implementations.
Up to 2.6x faster.
- RGBa -> RGBA AVX2 implementation using gather instructions.
Up to 5x faster.


3.2.0.post3
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 float full loading horizontal pass.
- SSE4 float full loading vertical pass.


3.2.0.post2
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 float full loading horizontal pass.
- SSE4 float per-pixel loading vertical pass.


2.9.0.post1
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 float per-pixel loading horizontal pass.
- SSE4 float per-pixel loading vertical pass.
- SSE4: Up to 2x for downscaling. Up to 3.5x for upscaling.
- AVX2: Up to 2.7x for downscaling. Up to 3.5x for upscaling.


Box blur
~~~~~~~~

- Simple SSE4 fixed-point implementations with per-pixel loading.
- Up to 2.1x faster.
183 changes: 183 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,183 @@
# Pillow-SIMD

Pillow-SIMD is "following" Pillow fork (which is PIL fork itself).

For more information about original Pillow, please
[read the documentation][original-docs],
[check the changelog][original-changelog] and
[find out how to contribute][original-contribute].


## Why SIMD

There are many ways to improve the performance of image processing.
You can use better algorithms for the same task, you can make better
implementation for current algorithms, or you can use more processing unit
resources. It is perfect when you can just use more efficient algorithm like
when gaussian blur based on convolutions [was replaced][gaussian-blur-changes]
by sequential box filters. But a number of such improvements are very limited.
It is also very tempting to use more processor unit resources
(via parallelization) when they are available. But it is handier just
to make things faster on the same resources. And that is where SIMD works better.

SIMD stands for "single instruction, multiple data". This is a way to perform
same operations against the huge amount of homogeneous data.
Modern CPU have different SIMD instructions sets like
MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON.

Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default)
and AVX2 support.


## Status

[![Uploadcare][uploadcare.logo]][uploadcare.com]

Pillow-SIMD can be used in production. Pillow-SIMD has been operating on
[Uploadcare][uploadcare.com] servers for more than 1 year.
Uploadcare is SAAS for image storing and processing in the cloud
and the main sponsor of Pillow-SIMD project.

Currently, following operations are accelerated:

- Resize (convolution-based resampling): SSE4, AVX2
- Gaussian and box blur: SSE4
- Alpha composition: SSE4, AVX2
- RGBA → RGBa (alpha premultiplication): SSE4, AVX2
- RGBa → RGBA (division by alpha): AVX2

See [CHANGES](CHANGES.SIMD.rst).


## Benchmarks

The numbers in the table represent processed megapixels of source RGB 2560x1600
image per second. For example, if resize of 2560x1600 image is done
in 0.5 seconds, the result will be 8.2 Mpx/s.

- Skia 53
- ImageMagick 6.9.3-8 Q8 x86_64
- Pillow 3.3.0
- Pillow-SIMD 3.3.0.post1

Operation | Filter | IM | Pillow| SIMD SSE4| SIMD AVX2| Skia 53
------------------------|---------|------|-------|----------|----------|--------
**Resize to 16x16** | Bilinear| 41.37| 337.12| 571.67| 903.40| 809.49
| Bicubic | 20.58| 185.79| 305.72| 552.85| 453.10
| Lanczos | 14.17| 113.27| 189.19| 355.40| 292.57
**Resize to 320x180** | Bilinear| 29.46| 209.06| 366.33| 558.57| 592.76
| Bicubic | 15.75| 124.43| 224.91| 353.53| 327.68
| Lanczos | 10.80| 82.25| 153.10| 244.22| 196.92
**Resize to 1920x1200** | Bilinear| 17.80| 55.87| 131.27| 152.11| 192.30
| Bicubic | 9.99| 43.64| 90.20| 112.34| 112.84
| Lanczos | 6.95| 34.51| 72.55| 103.16| 104.76
**Resize to 7712x4352** | Bilinear| 2.54| 6.71| 16.06| 20.33| 20.58
| Bicubic | 1.60| 5.51| 12.65| 16.46| 16.52
| Lanczos | 1.09| 4.62| 9.84| 13.38| 12.05
**Blur** | 1px | 6.60| 16.94| 35.16| |
| 10px | 2.28| 16.94| 35.47| |
| 100px | 0.34| 16.93| 35.53| |


### Some conclusion

Pillow is always faster than ImageMagick. And Pillow-SIMD is faster
than Pillow in 2—2.5 times. In general, Pillow-SIMD with AVX2 always
**8-20 times faster** than ImageMagick and almost equal to the Skia results,
high-speed graphics library used in Chromium.

### Methodology

All tests were performed on Ubuntu 14.04 64-bit running on
Intel Core i5 4258U with AVX2 CPU on the single thread.

ImageMagick performance was measured with command-line tool `convert` with
`-verbose` and `-bench` arguments. I use command line because
I need to test the latest version and this is the easiest way to do that.

All operations produce exactly the same results.
Resizing filters compliance:

- PIL.Image.BILINEAR == Triangle
- PIL.Image.BICUBIC == Catrom
- PIL.Image.LANCZOS == Lanczos

In ImageMagick, the radius of gaussian blur is called sigma and the second
parameter is called radius. In fact, there should not be additional parameters
for *gaussian blur*, because if the radius is too small, this is *not*
gaussian blur anymore. And if the radius is big this does not give any
advantages but makes operation slower. For the test, I set the radius
to sigma × 2.5.

Following script was used for testing:
https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63


## Why Pillow itself is so fast

There are no cheats. High-quality resize and blur methods are used for all
benchmarks. Results are almost pixel-perfect. The difference is only effective
algorithms. Resampling in Pillow was rewritten in version 2.7 with
minimal usage of floating point numbers, precomputed coefficients and
cache-awareness transposition.


## Why Pillow-SIMD is even faster

Because of SIMD, of course. There are some ideas how to achieve even better
performance.

- **Efficient work with memory** Currently, each pixel is read from
memory to the SSE register, while every SSE register can handle
four pixels at once.
- **Integer-based arithmetic** Experiments show that integer-based arithmetic
does not affect the quality and increases the performance of non-SIMD code
up to 50%.
- **Aligned pixels allocation** Well-known that the SIMD load and store
commands work better with aligned memory.


## Why do not contribute SIMD to the original Pillow

Well, it's not that simple. First of all, Pillow supports a large number
of architectures, not only x86. But even for x86 platforms, Pillow is often
distributed via precompiled binaries. To integrate SIMD in precompiled binaries
we need to do runtime checks of CPU capabilities.
To compile the code with runtime checks we need to pass `-mavx2` option
to the compiler. However this automatically activates all `if (__AVX2__)`
and below conditions. And SIMD instructions under such conditions exist
even in standard C library and they do not have any runtime checks.
Currently, I don't know how to allow SIMD instructions in the code
but *do not allow* such instructions without runtime checks.


## Installation

In general, you need to do `pip install pillow-simd` as always and if you
are using SSE4-capable CPU everything should run smoothly.
Do not forget to remove original Pillow package first.

If you want the AVX2-enabled version, you need to pass the additional flag to C
compiler. The easiest way to do that is define `CC` variable while compilation.

```bash
$ pip uninstall pillow
$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
```


## Contributing to Pillow-SIMD

Pillow-SIMD and Pillow are two separate projects.
Please submit bugs and improvements not related to SIMD to
[original Pillow][original-issues]. All bugs and fixes in Pillow
will appear in next Pillow-SIMD version automatically.


[original-docs]: http://pillow.readthedocs.io/
[original-issues]: https://github.com/python-pillow/Pillow/issues/new
[original-changelog]: https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst
[original-contribute]: https://github.com/python-pillow/Pillow/blob/master/.github/CONTRIBUTING.md
[gaussian-blur-changes]: http://pillow.readthedocs.io/en/3.2.x/releasenotes/2.7.0.html#gaussian-blur-and-unsharp-mask
[uploadcare.com]: https://uploadcare.com/?utm_source=github&utm_medium=description&utm_campaign=pillow-simd
[uploadcare.logo]: https://ucarecdn.com/dc4b8363-e89f-402f-8ea8-ce606664069c/-/preview/
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -746,7 +746,7 @@ def debug_build():
setup(name=NAME,
version=PILLOW_VERSION,
description='Python Imaging Library (Fork)',
long_description=_read('README.rst').decode('utf-8'),
long_description=_read('README.md').decode('utf-8'),
author='Alex Clark (Fork Author)',
author_email='aclark@aclark.net',
url='http://python-pillow.org',
Expand Down

0 comments on commit ff9b2c0

Please sign in to comment.