Block & array hybrid storage #2738

homm · 2017-09-16T17:19:46Z

~~First of all, this is just a prototype.~~ No longer.

Memory allocation from system is slow. It can take up to 3.6x more time to access just allocated memory than access memory which is already belong to an application and was accessed earlier. By allocations from system I mean system calls to mmap or analogues for other operating systems.

C example and results

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <time.h>

#define SIZE 50 * 1024 * 1024


int main(int argc, char *argv[]) {
  int i;
  int *p;
  int sum = 0;
  clock_t start;
  float clock_alloc, clock_new, clock_used;

  for (i = 0; i < 100; i ++) {
    start = clock();
    p = (int *) calloc(1, SIZE);
    clock_alloc += (float)(clock() - start) / CLOCKS_PER_SEC;
    
    start = clock();
    memset(p, 0, SIZE);
    clock_new += (float)(clock() - start) / CLOCKS_PER_SEC;
    sum += p[512];
    
    start = clock();
    memset(p, 0, SIZE);
    clock_used += (float)(clock() - start) / CLOCKS_PER_SEC;
    sum += p[512];

    free(p);
  }
  printf("sum %d\n", sum);
  printf("alloc: %f\nnew %f\nused %f\n", clock_alloc, clock_new, clock_used);
  return 0;
}

On Ubuntu 16 with high memory:

alloc: 0.006105
new 0.513503
used 0.224562

On Ubuntu 16 with low memory (512M):

alloc: 0.021838
new 1.585902
used 0.595298

On MacOS 10.11:

alloc: 0.334999
new 0.620930
used 0.264800

Memory allocators (like libc) tries sometimes to predict memory usage patterns to avoid extra allocations from system. They could not actually return the memory to the system on free call and reuse the same memory on next malloc. For example, if reduce SIZE in the sample above to 10 Megabytes, the access to the new memory will cost the same as access to the used memory on Ubuntu. But this optimization only works in few cases:

Only with some amount of memory. For my system this optimization only works when allocated block is less than 32 Megabytes
Only with default Ubuntu allocator. This doesn't work with jemalloc or tcmalloc. This also doesn't work on Mac OS.
When no manual options for allocator are set. For example if I MALLOC_TRIM_THRESHOLD_=-1, allocator drops optimization.

Pillow allocates and frees memory a lot. Almost every operation requires to recreate the bitmap. In general this is not good. If I'd write Pillow from the scratch, I'd try to do as much as possible operations (like conversion, composition, filtering) in-place. But this is huge task which requires redesign all internal and external APIs.

So how to avoid slowdown on reallocations? Once someone told me: one of the following is true: you're allocating huge amount of memory sporadically and relatively rare and you shouldn't worry about the time spend to allocations. Or you constantly need huge amount of memory. Then just don't release it. And so here I am.

As you probably know, there are two image allocators in Pillow: ImagingAllocateBlock and ImagingAllocateArray. The first works for images < 16 Megabytes of data and allocates one large chank of memory im->linesize * im->ysize bytes. The second works for large images and allocates small pieces of memory for each line. In both cases the allocated memory size is linked with image size. This is not good for reusing memory. Additionally, this is very sharp transaction between storage types which can lead to unpredictable performance penalties. So I've reimplemented ImagingAllocateArray so it allocates chain of relatively large blocks. Every block could be used and reused to store any size of images.

As I said, this is just a prototype, which should only show opportunities of this approach. To show how it works, I'll test Pillow-SIMD with uploadcare:simd/4.3-demo branch. Tests performed using pillow-perf/testsute/ with following command line:

$ time ./run.py scale filter convert composition rotate_right -n21

Options:

master, -s2560x1638 (maximum possible ImagingAllocateBlock image)
master, -s2560x1639 (minimal possible ImagingAllocateArray image)
this, -s2560x1638, MEMORY_BLOCK_SIZE=1MB, MEMORY_MAX_BLOCKS=0
this, -s2560x1638, MEMORY_BLOCK_SIZE=4MB, MEMORY_MAX_BLOCKS=0
this, -s2560x1638, MEMORY_BLOCK_SIZE=1MB, MEMORY_MAX_BLOCKS=256

Scale
                    2560x1638        2560x1639        1M, 0 cache      4M, 0 cache      1M, 256M cache
  26x16 bil       2422.24 Mpx/s    2265.53 Mpx/s    2285.33 Mpx/s    2367.78 Mpx/s    2273.81 Mpx/s
  26x16 bic       1363.72 Mpx/s    1301.00 Mpx/s    1317.74 Mpx/s    1354.06 Mpx/s    1323.69 Mpx/s
  26x16 lzs        757.32 Mpx/s     733.67 Mpx/s     746.93 Mpx/s     745.34 Mpx/s     745.22 Mpx/s
  320x205 bil     1734.85 Mpx/s    1692.50 Mpx/s    1624.60 Mpx/s    1655.49 Mpx/s    1622.20 Mpx/s
  320x205 bic     1100.62 Mpx/s    1070.87 Mpx/s    1051.22 Mpx/s    1065.93 Mpx/s    1039.78 Mpx/s
  320x205 lzs      655.80 Mpx/s     626.91 Mpx/s     635.24 Mpx/s     630.84 Mpx/s     629.06 Mpx/s
  2048x1310 bil    427.71 Mpx/s     423.18 Mpx/s     285.12 Mpx/s     352.44 Mpx/s     428.02 Mpx/s
  2048x1310 bic    324.81 Mpx/s     323.97 Mpx/s     240.43 Mpx/s     283.17 Mpx/s     324.73 Mpx/s
  2048x1310 lzs    244.10 Mpx/s     242.86 Mpx/s     193.53 Mpx/s     219.76 Mpx/s     246.10 Mpx/s
  5478x3505 bil     71.60 Mpx/s      71.22 Mpx/s      71.74 Mpx/s      89.41 Mpx/s     136.74 Mpx/s
  5478x3505 bic     60.72 Mpx/s      59.95 Mpx/s      58.38 Mpx/s      68.88 Mpx/s      96.61 Mpx/s
  5478x3505 lzs     48.61 Mpx/s      48.52 Mpx/s      46.13 Mpx/s      52.46 Mpx/s      67.44 Mpx/s

Filter
                    2560x1638        2560x1639        1M, 0 cache      4M, 0 cache      1M, 256M cache
  Smooth           554.82 Mpx/s     548.69 Mpx/s     410.26 Mpx/s     477.15 Mpx/s     565.05 Mpx/s
  Sharpen          555.19 Mpx/s     548.69 Mpx/s     409.62 Mpx/s     476.83 Mpx/s     565.15 Mpx/s
  Smooth More      233.19 Mpx/s     248.97 Mpx/s     197.17 Mpx/s     234.82 Mpx/s     250.90 Mpx/s

Convert
                    2560x1638        2560x1639        1M, 0 cache      4M, 0 cache      1M, 256M cache
  RGB to L        1915.47 Mpx/s    1904.61 Mpx/s    1919.87 Mpx/s    1909.44 Mpx/s    1922.59 Mpx/s
  RGBA to LA      1423.89 Mpx/s    1340.54 Mpx/s    1454.51 Mpx/s     965.09 Mpx/s    1513.20 Mpx/s
  RGBa to RGBA    1321.21 Mpx/s    1366.67 Mpx/s     799.81 Mpx/s     976.35 Mpx/s    1478.10 Mpx/s
  RGBA to RGBa    1424.35 Mpx/s    1331.21 Mpx/s    1440.10 Mpx/s    1070.28 Mpx/s    1357.51 Mpx/s

Composition
                    2560x1638        2560x1639        1M, 0 cache      4M, 0 cache      1M, 256M cache
  Composition      688.91 Mpx/s     804.44 Mpx/s     441.18 Mpx/s     528.05 Mpx/s     737.72 Mpx/s

Rotate 
                    2560x1638        2560x1639        1M, 0 cache      4M, 0 cache      1M, 256M cache
  Flop            1351.77 Mpx/s    1351.35 Mpx/s     862.28 Mpx/s    1233.29 Mpx/s    1466.64 Mpx/s
  Flip            2024.62 Mpx/s    1709.10 Mpx/s     849.53 Mpx/s    1453.42 Mpx/s    1788.30 Mpx/s
  Rotate 90        528.13 Mpx/s     510.02 Mpx/s     405.26 Mpx/s     534.44 Mpx/s     539.80 Mpx/s
  Rotate 180      1355.31 Mpx/s    1305.54 Mpx/s     744.15 Mpx/s    1157.10 Mpx/s    1428.28 Mpx/s
  Rotate 270       519.48 Mpx/s     497.97 Mpx/s     396.42 Mpx/s     525.94 Mpx/s     530.52 Mpx/s
  Transpose        549.66 Mpx/s     517.56 Mpx/s     403.59 Mpx/s     532.69 Mpx/s     535.89 Mpx/s
  Transverse       553.93 Mpx/s     523.24 Mpx/s     402.98 Mpx/s     527.39 Mpx/s     533.90 Mpx/s

                    2560x1638        2560x1639        1M, 0 cache      4M, 0 cache      1M, 256M cache
real                8.277 s          8.449 s          9.437 s          8.106 s          6.566 s
user                6.596 s          6.700 s          6.796 s          6.748 s          6.516 s
sys                 1.676 s          1.744 s          2.636 s          1.352 s          0.044 s

So, what is going on

master, -s2560x1638
One large block, most of allocations are cached by libc. The only exception probably is the scaling to 5478x3505.

master, -s2560x1639
No significant changes, 0-10% slower due to different storage model. The only exception is Composition, which most fast in this configuration.

this, -s2560x1638, MEMORY_BLOCK_SIZE=1MB, MEMORY_MAX_BLOCKS=0
Libc doesn't cache this pattern for some reasons. System time is bigger than ever, because almost all memory is come from system.

this, -s2560x1638, MEMORY_BLOCK_SIZE=4MB, MEMORY_MAX_BLOCKS=0
It's magic, libc caches memory again. System time is even less than for master because scaling to 5478x3505 is also partially cached (which is notable from this operation time). Some operations are still slower than in master. If we want to keep libc cache, this requires further investigation.

this, -s2560x1638, MEMORY_BLOCK_SIZE=1MB, MEMORY_MAX_BLOCKS=256
All memory is cached by application. System time is near zero, almost all operation have same speed as master or even faster. Scaling to 5478x3505 is faster up to 90%! Rotate->flip is the only which noticeable slower.

Ok, than was images which originally fit in libc cache. What about large images?

Options:

master, -s5120x3200
this, -s5120x3200, MEMORY_BLOCK_SIZE=1MB, MEMORY_MAX_BLOCKS=0
this, -s5120x3200, MEMORY_BLOCK_SIZE=4MB, MEMORY_MAX_BLOCKS=0
this, -s5120x3200, MEMORY_BLOCK_SIZE=1MB, MEMORY_MAX_BLOCKS=512

Scale
                    5120x3200        1M, 0 cache      4M, 0 cache      1M, 512M cache
  51x32 bil       2340.90 Mpx/s    2319.64 Mpx/s    2357.77 Mpx/s    2284.78 Mpx/s
  51x32 bic       1337.79 Mpx/s    1352.16 Mpx/s    1328.48 Mpx/s    1330.10 Mpx/s
  51x32 lzs        838.48 Mpx/s     843.28 Mpx/s     833.11 Mpx/s     840.59 Mpx/s
  640x400 bil     1642.00 Mpx/s    1624.92 Mpx/s    1649.45 Mpx/s    1627.19 Mpx/s
  640x400 bic     1041.84 Mpx/s    1048.03 Mpx/s    1036.70 Mpx/s    1023.68 Mpx/s
  640x400 lzs      707.06 Mpx/s     702.70 Mpx/s     705.72 Mpx/s     695.98 Mpx/s
  4096x2560 bil    275.11 Mpx/s     275.63 Mpx/s     329.98 Mpx/s     427.13 Mpx/s
  4096x2560 bic    231.73 Mpx/s     236.04 Mpx/s     269.45 Mpx/s     331.88 Mpx/s
  4096x2560 lzs    195.94 Mpx/s     195.53 Mpx/s     220.44 Mpx/s     256.39 Mpx/s
  10957x6848 bil    73.54 Mpx/s      72.89 Mpx/s      88.44 Mpx/s     136.26 Mpx/s
  10957x6848 bic    63.27 Mpx/s      61.49 Mpx/s      69.56 Mpx/s      95.17 Mpx/s
  10957x6848 lzs    51.19 Mpx/s      49.63 Mpx/s      54.10 Mpx/s      68.73 Mpx/s

Filter
                    5120x3200        1M, 0 cache      4M, 0 cache      1M, 512M cache
  Smooth           378.58 Mpx/s     381.89 Mpx/s     439.52 Mpx/s     566.00 Mpx/s
  Sharpen          378.15 Mpx/s     381.88 Mpx/s     439.36 Mpx/s     566.16 Mpx/s
  Smooth More      203.46 Mpx/s     205.84 Mpx/s     220.60 Mpx/s     247.51 Mpx/s

Convert
                    5120x3200        1M, 0 cache      4M, 0 cache      1M, 512M cache
  RGB to L        1938.22 Mpx/s    1687.69 Mpx/s    1924.16 Mpx/s    1929.78 Mpx/s
  RGBA to LA      1394.24 Mpx/s    1450.94 Mpx/s    1329.35 Mpx/s    1490.40 Mpx/s
  RGBa to RGBA     706.00 Mpx/s     711.94 Mpx/s     929.12 Mpx/s    1456.63 Mpx/s
  RGBA to RGBa    1388.58 Mpx/s    1443.29 Mpx/s    1318.54 Mpx/s    1465.73 Mpx/s

Composition
                    5120x3200        1M, 0 cache      4M, 0 cache      1M, 512M cache
  Composition      447.56 Mpx/s     452.92 Mpx/s     548.22 Mpx/s     751.94 Mpx/s

Rotate 
                    5120x3200        1M, 0 cache      4M, 0 cache      1M, 512M cache
  Flop             760.20 Mpx/s     795.92 Mpx/s    1042.45 Mpx/s    1472.58 Mpx/s
  Flip             728.54 Mpx/s     737.72 Mpx/s    1037.82 Mpx/s    1776.62 Mpx/s
  Rotate 90        362.92 Mpx/s     335.44 Mpx/s     435.71 Mpx/s     476.47 Mpx/s
  Rotate 180       656.80 Mpx/s     662.20 Mpx/s     883.85 Mpx/s    1427.28 Mpx/s
  Rotate 270       355.48 Mpx/s     351.04 Mpx/s     441.78 Mpx/s     475.15 Mpx/s
  Transpose        357.53 Mpx/s     336.80 Mpx/s     431.85 Mpx/s     482.73 Mpx/s
  Transverse       355.72 Mpx/s     325.59 Mpx/s     422.69 Mpx/s     475.31 Mpx/s

                    5120x3200        1M, 0 cache      4M, 0 cache      1M, 512M cache
real               19.249 s         19.640 s         17.125 s         13.571 s
user               13.352 s         13.604 s         13.468 s         13.416 s
sys                 5.888 s          6.032 s          3.652 s          0.152 s

Sys time is about 1%. The most important thing: performance of almost all operations for large images is almost equal to performance for medium images.

So I believe this is a huge win. There are lot of thinks left to do though. There are some of them:

homm · 2017-09-16T17:37:59Z

I'm not sure that we should mention this in release notes or documentation for this release. By default the cache is turned off and this change should be totally transparent for end users. I'd like to have it as experimental feature and add public resource manager in next release (4.4).

# Conflicts: # libImaging/Imaging.h

allocate block for wider lines

homm · 2017-09-18T00:34:58Z

Finally all green and almost done

wiredfool · 2017-09-20T11:48:16Z

libImaging/Storage.c

+        block = arena->blocks[arena->blocks_cached];
+        // Reallocate if needed
+        if (block.size != requested_size){
+            block.ptr = realloc(block.ptr, requested_size);


Is this likely to be realloc'ing on every block, unless you're processing a bunch of images of identical bit width?

I believe there are three possible results of reallocation:

on the reduction: most implementations will just decrease several numbers in metadata without actually reallocation
on the expansion when space is enough: as above
on the expansion when here is no space: actual reallocation on the new address

In theory, the more time program is running the less actual relocations should occur, because all blocks will have enough space ahead after several reallocations.

According to my tests, when blocks_max is enough, only small amount of the reusings lead to actual reallocations (when the new ptr != old ptr).

The situation should improved with more intelligent block selection, not just the topmost one.

homm · 2017-09-22T20:33:32Z

I learned a lot about linux memory model this days. For example, there are two mechanisms in Linux for allocation: mmap and sbrk.

And finally I found something really interesting in libc sources.

M_MMAP_THRESHOLD
Using mmap segregates relatively large chunks of memory so that
they can be individually obtained and released from the host
system. A request serviced through mmap is never reused by any
other request (at least not directly; the system may just so
happen to remap successive requests to the same locations).

So, as discovered above, in some conditions libc doesn't use mmap for blocks up to 32 megabytes and can reuse them without returning to the system.

Still investigating.

homm · 2017-09-22T23:14:23Z

Ok, looks like I finally understood everything.

From the list of libc malloc variables there are only two significant for our case: mmap_threshold and trim_threshold.

Libc wouldn't release freed memory and would reuse it if two conditions are met:

memory was allocated with sbrk, not mmap
memory was not released (or trimmed)

mmap_threshold is a minimal size of the block which will be allocated using mmap (or, in other words, a maximum size of the block which will be allocated using sbrk)
trim_threshold is a minimum size of free memory allocated with sbrk, after which libc will actually release the memory.

By default, both variables are set to 128Kb, but as said in documentation, they are dynamically adjustable. I found only one place in the sources where this happens: in __libc_free.

So, mmap_threshold always has a size of the largest freed mmap block. And trim_threshold has double size. And this is the most important thing: once an application allocated and freed large enough block of memory (but not larger than DEFAULT_MMAP_THRESHOLD_MAX, which is 32Mb on 64-bit systems), it will not release up to two sizes of the large block and will reuse it when possible.

So, let's check this:

A. master, -s2560x1639 (array storage). As you can see, result doesn't match with original comment. This is because there are no scale tests. Indeed, the scale tests bumped trim_threshold in 2560x1639→2048x1310 scaling, because 2048x1310 had block storage.
B. master, -s2560x1639 (array storage). Ok, let's try bump trim_threshold with something else, for example 'load' operations. It always loads 2560x1600 JPEG image, so block storage is used.
C. master, -s2560x1639 (array storage). And to be sure, let's try to bump the threshold with simple python code before the tests (in pillow-perf/testsuite/cases/pillow.py):

_tmp = b'0123456789' * int(30*1024*1024*0.1)
_tmp = 0

D. master, -s2560x1639 (array storage). How about to set local environment variables manual? MALLOC_MMAP_THRESHOLD_=33554432 MALLOC_TRIM_THRESHOLD_=67108864

Convert
                      A                B                C                D
  RGB to L        1938.82 Mpx/s    1930.94 Mpx/s    1862.29 Mpx/s    1772.63 Mpx/s
  RGBA to LA      1341.36 Mpx/s    1382.56 Mpx/s    1340.54 Mpx/s    1352.18 Mpx/s
  RGBa to RGBA     749.55 Mpx/s    1360.86 Mpx/s    1364.98 Mpx/s    1359.18 Mpx/s
  RGBA to RGBa    1331.11 Mpx/s    1332.83 Mpx/s    1333.33 Mpx/s    1340.95 Mpx/s

Composition
                      A                B                C                D
  Composition      438.39 Mpx/s     805.80 Mpx/s     804.25 Mpx/s     805.95 Mpx/s

Rotate 
                      A                B                C                D
  Flop             789.88 Mpx/s    1353.53 Mpx/s    1343.92 Mpx/s    1357.92 Mpx/s
  Flip             807.68 Mpx/s    1714.76 Mpx/s    1746.24 Mpx/s    1701.50 Mpx/s
  Rotate 90        391.84 Mpx/s     508.53 Mpx/s     510.31 Mpx/s     512.75 Mpx/s
  Rotate 180       713.68 Mpx/s    1307.09 Mpx/s    1314.51 Mpx/s    1307.48 Mpx/s
  Rotate 270       379.61 Mpx/s     500.03 Mpx/s     495.14 Mpx/s     498.19 Mpx/s
  Transpose        389.07 Mpx/s     520.25 Mpx/s     519.09 Mpx/s     518.12 Mpx/s
  Transverse       397.34 Mpx/s     523.96 Mpx/s     512.26 Mpx/s     526.59 Mpx/s

                      A                B                C                D
real                2.073 s          2.569 s          1.538 s          1.495 s
user                1.516 s          2.524 s          1.508 s          1.468 s
sys                 0.556 s          0.048 s          0.032 s          0.024 s

All of three methods work for master and array storage.

to be continued...

homm · 2017-09-22T23:50:11Z

What about block & array hybrid storage. I choose the block size 1Mb because I think it is large enough to avoid too frequent malloc/free calls and small enough to be allocated even with high memory fragmentation.

C and D cases also bump trim_threshold and fix performance for hybrid storage. But no other operations bumps threshold anymore, because largest freed block for now is largest pillow's block and it is 1Mb.

Possible solutions I see:

Leave it as is. Indeed, the problem only related to single memory manger and its defaults. We are adding something like our own configurable memory manager with more explicit tools to improve performance.
Increase default block size to old value: 16Mb. This will bump trim_threshold for large images automatically, as before. This will improve compatibility, but user will have to change PILLOW_BLOCK_SIZE to smallest values in the most cases if he decides to manage memory manually.

We shouldn't worry about memory fragmentation, because I'm going to implement retry with smallest possible block size (one page, 4Kb) if large block allocation is failed (very like it was before, when array storage was used when block storage was failed).

# Conflicts: # docs/releasenotes/4.3.0.rst

increase default block size to previous value

radarhere · 2017-09-23T01:47:23Z

docs/releasenotes/4.3.0.rst

-have been removed.
-
-The ``PIL.Image.core.getcount`` methods have been removed, use
-``PIL.Image.core.get_stats()['new_count']`` property instead.


You've accidentally removed this chunk of the release notes.

No, it was moved to Core Image API Changes section:
https://github.com/python-pillow/Pillow/pull/2738/files#diff-28d4b8f87f5b7e23cf7f62b2e65d2d4d

Okay, thanks, my mistake

take alignment into account when calculating lines_per_block

homm · 2017-09-23T03:12:24Z

I finally happy with this. In its current form the PR doesn't introduce any known performance regressions and also provide reach API on all levels for memory management. It is even a bit faster for large images (like 5120x3200).

wiredfool · 2017-09-28T16:05:00Z

libImaging/Storage.c

-            break;
+        if (line_in_block == 0) {
+            int required;
+            int lines_remained = lines_per_block;


I think this is better named lines_remaining, or lines_requested or current_block_lines

wiredfool · 2017-09-28T16:13:43Z

So the invariants here are:

Images don't require any more storage then they previously did, except for potentially a handful of bytes due to alignment issues.
Large images are allocating in blocks that are similarly (order of magnitude) sized to the previous chunk size, where pre-patch they would be made of one malloc per line.
Small images will still go in one chunk.
The only case where this will use more memory is that it will hold on to a bunch of memory from recently freed images. The high water mark should be the same.

homm · 2017-09-30T01:27:35Z

except for potentially a handful of bytes due to alignment issues.

If you mean arena->alignment, it is 1 by default and doesn't affect any memory pointers until user explicitly set it.

it will hold on to a bunch of memory from recently freed images

I believe for long-living applications where a big enough image (4Mpx) was created at least once, memory consumption will be exactly the same including held on memory.

homm · 2017-09-30T01:34:00Z

For some reason one of the docker's builds sometimes fails.

hugovk · 2017-09-30T04:05:23Z

We're tracking the Docker problem in #2758.

radarhere · 2017-09-30T04:09:05Z

libImaging/Imaging.h

+    int alignment;        /* Alignment in memory of each line of an image */
+    int block_size;       /* Preferred block size */
+    int blocks_max;       /* Maximum number of cached blocks */
+    int blocks_cached;    /* Current number of block not accociated with images */


blocks not associated

Thanks! Fixed

radarhere · 2017-09-30T04:15:01Z

libImaging/Imaging.h

+    int stats_new_count;           /* Number of new allocated images */
+    int stats_allocated_blocks;    /* Number of allocated blocks */
+    int stats_reused_blocks;       /* Number of blocks which was retrieved from pool */
+    int stats_reallocated_blocks;  /* Number of blocks which was actually reallocated after retrieving */


fix comments

wiredfool · 2017-10-01T19:40:42Z

I think that this should probably be called out in the release notes, if only to say that large images are now allocated in a block manner. We don't think that there's a performance regression, but we don't really test everywhere that Pillow is used, especially on small memory devices.

I'd also like to include some documentation, at least as it relates to the environment variables. And, I think that we should include a bunch of this thread in an area of the docs for design decisions. Promoting them will help them be useful to future hackers and prevent them from being buried in github threads.

homm · 2017-10-01T19:43:31Z

@wiredfool That is fair enough. Do you need any help from me?

wiredfool · 2017-10-01T19:59:50Z

Don't think so. The block size is still the same as the old threshold to go to line by line, right?

homm · 2017-10-01T20:01:45Z

Yes. The previous THRESHOLD was 2048*2048*4L which is 16*1024*1024, same as the new default block_size.

homm added 5 commits September 15, 2017 17:32

Add params check for ImagingNewBlock

852124d

work in ImagingAllocateArray with blocks

0a3c852

save released blocks

f584f83

Do not use ImagingNewBlock at all

fe283b1

MEMORY_MAX_BLOCKS should be 0 by default

883fb8f

homm added the Do Not Merge label Sep 16, 2017

homm changed the title ~~[WIP] Block storage~~ [WIP] Block & array hybrid storage Sep 16, 2017

homm mentioned this pull request Sep 16, 2017

Release Pillow 4.3.0 on October 1, 2017 #2664

Closed

homm added 4 commits September 16, 2017 22:21

Merge branch 'master' into block-storage

63b925d

# Conflicts: # libImaging/Imaging.h

align lines

d4a1f7a

limit allocated memory to lines_per_block * linesize size

f2123b4

allocate block for wider lines

do not request more lines than required

a5034b5

homm added this to the 4.3.0 milestone Sep 16, 2017

homm added 13 commits September 17, 2017 01:40

debug messages

6007e81

fix zero size images

0054743

Fix 0-width and 0-height images other way

fd907fb

Return blocks in reverse order to reduce reallocations

c8a2923

temp

dc192be

ImagingMemoryArean tata type

4951962

stats

8659cd7

fix visual c compiler

53dde3b

python api for resources

af3dcf8

tests

6d2be87

collect garbage before check memory

ae104b0

test for images wider than block_size

2352777

actually fix tests on pypy

2ab19bb

homm changed the title ~~[WIP] Block & array hybrid storage~~ Block & array hybrid storage Sep 18, 2017

homm removed the Do Not Merge label Sep 18, 2017

wiredfool reviewed Sep 20, 2017

View reviewed changes

radarhere added the Needs Rebase label Sep 21, 2017

homm added 4 commits September 23, 2017 04:07

Merge branch 'master' into block-storage

b51d77e

# Conflicts: # docs/releasenotes/4.3.0.rst

fallback to small block size on memory error

6a43579

increase default block size to previous value

clear_cache: number or blocks to keep

4b85230

fix alignment

145180a

homm removed the Needs Rebase label Sep 23, 2017

radarhere reviewed Sep 23, 2017

View reviewed changes

homm added 2 commits September 23, 2017 04:56

make MS C compiler happy

3859639

revert reverse order of blocks associated with the image

9db1b76

take alignment into account when calculating lines_per_block

homm requested a review from wiredfool September 23, 2017 03:12

wiredfool reviewed Sep 28, 2017

View reviewed changes

rename lines_remaining

8353097

radarhere reviewed Sep 30, 2017

View reviewed changes

rename arena->blocks to arena->blocks_pool

5a1cdfc

fix comments

wiredfool merged commit c82f9fe into python-pillow:master Oct 1, 2017

homm deleted the block-storage branch October 1, 2017 19:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Block & array hybrid storage #2738

Block & array hybrid storage #2738

homm commented Sep 16, 2017 •

edited

Loading

homm commented Sep 16, 2017

homm commented Sep 18, 2017

wiredfool Sep 20, 2017

homm Sep 20, 2017 •

edited

Loading

homm commented Sep 22, 2017

homm commented Sep 22, 2017 •

edited

Loading

homm commented Sep 22, 2017 •

edited

Loading

radarhere Sep 23, 2017

homm Sep 23, 2017

radarhere Sep 23, 2017

homm commented Sep 23, 2017

wiredfool Sep 28, 2017 •

edited

Loading

homm Sep 30, 2017

wiredfool commented Sep 28, 2017

homm commented Sep 30, 2017 •

edited

Loading

homm commented Sep 30, 2017

hugovk commented Sep 30, 2017

radarhere Sep 30, 2017 •

edited

Loading

homm Sep 30, 2017

radarhere Sep 30, 2017

wiredfool commented Oct 1, 2017

homm commented Oct 1, 2017

wiredfool commented Oct 1, 2017

homm commented Oct 1, 2017 •

edited

Loading

Block & array hybrid storage #2738

Block & array hybrid storage #2738

Conversation

homm commented Sep 16, 2017 • edited Loading

homm commented Sep 16, 2017

homm commented Sep 18, 2017

wiredfool Sep 20, 2017

Choose a reason for hiding this comment

homm Sep 20, 2017 • edited Loading

Choose a reason for hiding this comment

homm commented Sep 22, 2017

homm commented Sep 22, 2017 • edited Loading

homm commented Sep 22, 2017 • edited Loading

radarhere Sep 23, 2017

Choose a reason for hiding this comment

homm Sep 23, 2017

Choose a reason for hiding this comment

radarhere Sep 23, 2017

Choose a reason for hiding this comment

homm commented Sep 23, 2017

wiredfool Sep 28, 2017 • edited Loading

Choose a reason for hiding this comment

homm Sep 30, 2017

Choose a reason for hiding this comment

wiredfool commented Sep 28, 2017

homm commented Sep 30, 2017 • edited Loading

homm commented Sep 30, 2017

hugovk commented Sep 30, 2017

radarhere Sep 30, 2017 • edited Loading

Choose a reason for hiding this comment

homm Sep 30, 2017

Choose a reason for hiding this comment

radarhere Sep 30, 2017

Choose a reason for hiding this comment

wiredfool commented Oct 1, 2017

homm commented Oct 1, 2017

wiredfool commented Oct 1, 2017

homm commented Oct 1, 2017 • edited Loading

homm commented Sep 16, 2017 •

edited

Loading

homm Sep 20, 2017 •

edited

Loading

homm commented Sep 22, 2017 •

edited

Loading

homm commented Sep 22, 2017 •

edited

Loading

wiredfool Sep 28, 2017 •

edited

Loading

homm commented Sep 30, 2017 •

edited

Loading

radarhere Sep 30, 2017 •

edited

Loading

homm commented Oct 1, 2017 •

edited

Loading