Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dominant color method #4874

Closed
wants to merge 10 commits into from
Closed

Add dominant color method #4874

wants to merge 10 commits into from

Conversation

392781
Copy link

@392781 392781 commented Aug 18, 2020

Fixes #4634.

Changes proposed in this pull request:

  • Adds a method to determine the dominant colors in an image using k-means clustering

I added some very basic tests but seeing as I'm very new at open source contributing I am open to suggestions to improve them.

src/PIL/Image.py Outdated Show resolved Hide resolved
src/PIL/Image.py Outdated Show resolved Hide resolved
src/PIL/Image.py Outdated Show resolved Hide resolved
392781 and others added 3 commits August 19, 2020 08:28
Co-authored-by: Andrew Murray <3112309+radarhere@users.noreply.github.com>
Co-authored-by: Andrew Murray <3112309+radarhere@users.noreply.github.com>
Co-authored-by: Andrew Murray <3112309+radarhere@users.noreply.github.com>
@392781
Copy link
Author

392781 commented Aug 19, 2020

One thing to note about this is that k-means clustering takes a few seconds to compute for larger images which is bothersome. I'm currently working on improving the speed of the algorithm.

One catch-all method of doing this is to rescale the image for faster results. This solution is simple and doesn't require a lot of bloated code on top of the algorithm. The cost is that there is a loss of pixel information through rescaling so the dominant colors will come out different than original image.

Another is to do a bit of math on the color spaces used and throw out some less necessary pixels (kind of how Color-Thief does it on the RGBA space. The benefit is avoiding some more computation. The cost is readability of the method and the need to cover all supported color spaces. One way to sidestep this is to convert to a more familiar color space such as HSV, RGB, or RGBA; however, I'm not that knowledgable about these spaces and the effects of converting between them (however, it seems colorthief does it so I'm not sure.)

I'm hesitant to implement option 2 because I do not want this method to become too bloated or more complicated than it already is... But these are some ideas to think about. For now I'll test out resizing as an option and maybe play around with converting modes.

EDIT: I decided to implement the option to rescale since excluding colors yielded mixed results.

@hugovk
Copy link
Member

hugovk commented Aug 20, 2020

The Image module is pretty big and I wonder if this would be better suited in another one?


def test_getdominantcolors():
def getdominantcolors(mode):
im = hopper(mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean about it being slow, pytest Tests/test_image_getdominantcolors.py takes ~5.67s on my Mac.

Resizing to (10, 10) reduces it to 0.23s.
Resizing to (20, 20) reduces it to 0.41s.

Do you think 10x10 might be too small to be useful?

Suggested change
im = hopper(mode)
im = hopper(mode)
im.thumbnail((10, 10))

Copy link
Author

@392781 392781 Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small sizes like that give a very rough approximation especially if you're looking for a palette that's more than 3 colors. The colors returned tend to be much darker. I found that (100,100) gives somewhat decent results. The issue becomes exacerbated with large images (1080p wallpapers)

assert getdominantcolors("YCbCr") == 3
assert getdominantcolors("CMYK") == 3
assert getdominantcolors("RGBA") == 3
assert getdominantcolors("HSV") == 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can improve these tests so they're testing more than just they return a thing of length three.

We can also use @pytest.mark.parametrize so each test case is run independently so cannot depend on an earlier case.

Here's an example:

import pytest

from .helper import hopper


@pytest.mark.parametrize(
    "test_mode, expected",
    [
        ("F", [28.8104386146252, 66.26185757773263, 127.53211228743844]),
        ("I", [136, 94, 40]),
        ("L", [25, 63, 127]),
        ("P", [11, 71, 159]),
        ("RGB", [(172, 117, 94), (53, 44, 55), (95, 127, 185)]),
        ("YCbCr", [(130, 108, 155), (123, 163, 105), (47, 131, 131)]),
        ("CMYK", [(201, 210, 199, 0), (159, 127, 69, 0), (82, 137, 160, 0)]),
        ("RGBA", [(31, 24, 37, 255), (129, 125, 150, 255), (87, 67, 72, 255)]),
        ("HSV", [(140, 131, 85), (177, 70, 46), (97, 131, 186)]),
    ],
)
def test_getdominantcolors(test_mode, expected):
    def getdominantcolors(mode):
        im = hopper(mode)
        im.thumbnail((10, 10))
        colors = im.getdominantcolors()
        return colors

    assert getdominantcolors(test_mode) == expected

What do you think?

Do those expected return values look right?

Might they be different on other systems? Especially the first one. If so, we can adjust it somewhat, possibly have a special case for that.

It would be good to add further test_X functions to verify the other parameters of im.getdominantcolors, plus warnings and exceptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my earlier comment never posted. The values for HSV are way off, straight up wrong even when I tested yesterday. I think the way I implemented the algorithm doesn't play nice with the way I approximate the centers of that color space.

In addition when I convert CMYK and YCbCr to RGB they tend to be off by a few color values... This isn't too severe of a problem though.

One solution to the HSV case is to simply convert to RGBA and do the calculations there... I'm going to see if I could fix what's happening before I divert to this option.

My 2nd concern is that the returned colors are mostly subjective to the viewer. If say we used some other implementation as a base case, k-means may start out randomized and return slightly different results. Perhaps I could test within a range of accepted values?

@392781
Copy link
Author

392781 commented Aug 20, 2020

The Image module is pretty big and I wonder if this would be better suited in another one?

* [`ImageColor`](https://pillow.readthedocs.io/en/stable/reference/ImageColor.html)

* [`ImageOps`](https://pillow.readthedocs.io/en/stable/reference/ImageOps.html)

* [`ImageStat`](https://pillow.readthedocs.io/en/stable/reference/ImageStat.html)

I've been thinking the same thing. I think if it were placed anywhere else it may be in ImageOps... ImageColor doesn't feel like a great fit for what this does unless maybe it gives an option to return colors in string formats.

@nulano
Copy link
Contributor

nulano commented Aug 20, 2020

I'm not sure ImageOps is the right place. It is a set of functions that return a processed variation of the source image, not properties of an image.

The ImageOps module contains a number of ‘ready-made’ image processing operations.


ImageColor doesn't seem right either:

The ImageColor module contains color tables and converters from CSS3-style color specifiers to RGB tuples. This module is used by PIL.Image.new() and the ImageDraw module, among others.


ImageStat looks like the closest match, but the PIL.ImageStat.Stat class caches results and the proposed function takes parameters. Maybe make this a module function in ImageStat (e.g. PIL.ImageStat.dominant_colors(im, ...))?

The ImageStat module calculates global statistics for an image, or for a region of an image.

src/PIL/Image.py Outdated Show resolved Hide resolved
Co-authored-by: Andrew Murray <3112309+radarhere@users.noreply.github.com>
src/PIL/Image.py Outdated Show resolved Hide resolved
@392781 392781 closed this by deleting the head repository Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Identify dominant colors
4 participants