[thumbnails] thumbnails for dashboards and charts #8947

Merged
merged 37 commits into from
Apr 15, 2020
Changes from 32 commits
Commits
48bed12
[thumbnails] New, thumbnail computation for charts and dashboards
dpgaspar Jan 10, 2020
420470e
[thumbnails] Initial working API with cache
dpgaspar Jan 10, 2020
4f153fe
[thumbnails] type annotation and API
dpgaspar Jan 13, 2020
b1c8863
[thumbnails] Add Pillow dependency
dpgaspar Jan 13, 2020
5952e2c
[thumbnails] More type annotations and lint
dpgaspar Jan 13, 2020
b19f17b
[thumbnails] More type annotations and lint
dpgaspar Jan 14, 2020
7fd2412
Merge master and fix
dpgaspar Jan 20, 2020
c311cd1
[thumbnails] Lint and tweaks
dpgaspar Jan 20, 2020
520cff9
[thumbnails] more Lint
dpgaspar Jan 20, 2020
71d7a94
[thumbnails] isort
dpgaspar Jan 20, 2020
46f99a7
[thumbnails] isort
dpgaspar Jan 20, 2020
0bf9061
[thumbnails] selenium with configurable auth function
dpgaspar Jan 21, 2020
b470e80
[thumbnails] refactor
dpgaspar Jan 21, 2020
406d978
[thumbnails] Refactor and started tests
dpgaspar Jan 21, 2020
c1700dc
Merge and conflicts
dpgaspar Jan 24, 2020
409dd51
Merge remote-tracking branch 'upstream/master' into feature/thumbnail…
dpgaspar Jan 27, 2020
70d28d3
Thumbnails behind a feature flag
dpgaspar Jan 27, 2020
b2adbcf
Make REST compute async when no cache key found
dpgaspar Jan 27, 2020
9f27fda
lint
dpgaspar Jan 27, 2020
8c80aee
Tests still not working
dpgaspar Jan 27, 2020
bef6a4f
Merge remote-tracking branch 'upstream/master' into feature/thumbnail…
dpgaspar Feb 25, 2020
eb70a8c
[thumbnails] Make pillow optional, lint and tests
dpgaspar Feb 25, 2020
0eae428
[thumbnails] improved isolations of the feature flag
dpgaspar Feb 26, 2020
01cfb5b
[thumbnails] tests and fixes
dpgaspar Feb 27, 2020
e246e8c
[thumbnails] Update docs
dpgaspar Feb 28, 2020
429bff2
[thumbnails] address comments
dpgaspar Mar 12, 2020
b9d0cd7
Merge remote-tracking branch 'upstream/master' into feature/thumbnail…
dpgaspar Mar 17, 2020
acf675d
Merge remote-tracking branch 'upstream/master' into refactor/api-char…
dpgaspar Mar 23, 2020
4f8cd83
[thumbnails] Docs use S3 cache
dpgaspar Mar 23, 2020
dfd7045
[thumbnails] Added thumbs URL to GET HTTP endpoint (show)
dpgaspar Mar 23, 2020
3bd3626
[thumbnails] Fix, tests
dpgaspar Mar 23, 2020
6b8e511
Merge remote-tracking branch 'upstream/master' into feature/thumbnail…
dpgaspar Mar 27, 2020
52e1f6a
Merge remote-tracking branch 'upstream/master' into feature/thumbnail…
dpgaspar Apr 9, 2020
86ba9db
change 304 to 302 and address comments
dpgaspar Apr 9, 2020
823fe55
Fix lint
dpgaspar Apr 13, 2020
e44571c
ignore new test config
dpgaspar Apr 13, 2020
ebeb150
Merge remote-tracking branch 'upstream/master' into feature/thumbnail…
dpgaspar Apr 14, 2020
68 changes: 68 additions & 0 deletions docs/installation.rst
@@ -647,6 +647,74 @@ section in `config.py`:
This will cache all the charts in the top 5 most popular dashboards every hour.
For other strategies, check the `superset/tasks/cache.py` file.

Caching Thumbnails
------------------

This is an optional feature that can be turned on by activating its feature flag in the config:

.. code-block:: python

    FEATURE_FLAGS = {
        "THUMBNAILS": True,
        "THUMBNAILS_SQLA_LISTENERS": True,
    }


For this feature you will need a cache system and celery workers. All thumbnails are stored in the cache and are processed
asynchronously by the workers.

An example config where images are stored on S3 could be:

.. code-block:: python

    from flask import Flask
    from s3cache.s3cache import S3Cache

    ...

    class CeleryConfig(object):
        BROKER_URL = "redis://localhost:6379/0"
        CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
        CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
        CELERYD_PREFETCH_MULTIPLIER = 10
        CELERY_ACKS_LATE = True


    CELERY_CONFIG = CeleryConfig

    def init_thumbnail_cache(app: Flask) -> S3Cache:
        return S3Cache("bucket_name", 'thumbs_cache/')


    THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
    # Async selenium thumbnail task will use the following user
    THUMBNAIL_SELENIUM_USER = "Admin"

Using the above example, cache keys for dashboards will be `superset_thumb__dashboard__{ID}`.

You can override the base URL for selenium using:

.. code-block:: python

    WEBDRIVER_BASEURL = "https://superset.company.com"


Additional selenium web driver configuration can be set using `WEBDRIVER_CONFIGURATION`.

You can implement a custom function to authenticate selenium. The default uses the flask-login session cookie.
An example of a custom function signature:

.. code-block:: python

    def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
        pass


Then on config:

.. code-block:: python

    WEBDRIVER_AUTH_FUNC = auth_driver
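
To illustrate the custom authentication hook, a function with that signature could load the site and then inject a session cookie. This is only a minimal sketch under stated assumptions, not the default implementation: `BASE_URL`, `SESSION_COOKIE_NAME`, `DriverLike` and `lookup_session_token` are hypothetical names, and the driver is only assumed to expose Selenium's `get` and `add_cookie` methods.

```python
from typing import Any, Protocol

# Hypothetical values for illustration only; adapt them to your deployment.
BASE_URL = "https://superset.company.com"
SESSION_COOKIE_NAME = "session"


class DriverLike(Protocol):
    """The subset of Selenium's WebDriver interface this sketch relies on."""

    def get(self, url: str) -> None: ...

    def add_cookie(self, cookie_dict: dict) -> None: ...


def lookup_session_token(user: Any) -> str:
    """Hypothetical helper: return a session token issued for ``user``."""
    return f"token-for-{getattr(user, 'username', user)}"


def auth_driver(driver: DriverLike, user: Any) -> DriverLike:
    # Selenium only accepts cookies for the domain of the page that is
    # currently loaded, so visit the site before adding the cookie.
    driver.get(BASE_URL)
    driver.add_cookie(
        {"name": SESSION_COOKIE_NAME, "value": lookup_session_token(user)}
    )
    return driver
```

Returning the same driver instance keeps the function compatible with the `WebDriver -> WebDriver` signature shown above.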

Deeper SQLAlchemy integration
-----------------------------
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -33,3 +33,4 @@ redis==3.2.1
requests==2.22.0
statsd==3.3.0
tox==3.11.1
pillow==7.0.0
1 change: 1 addition & 0 deletions setup.py
@@ -118,6 +118,7 @@ def get_git_sha():
        "hana": ["hdbcli==2.4.162", "sqlalchemy_hana==0.4.0"],
        "dremio": ["sqlalchemy_dremio>=0.5.0dev0"],
        "cockroachdb": ["cockroachdb==0.3.3"],
        "thumbnails": ["Pillow>=7.0.0, <8.0.0"],
    },
    python_requires="~=3.6",
    author="Apache Software Foundation",
1 change: 1 addition & 0 deletions superset/__init__.py
@@ -51,3 +51,4 @@
    lambda: results_backend_manager.should_use_msgpack
)
tables_cache = LocalProxy(lambda: cache_manager.tables_cache)
thumbnail_cache = LocalProxy(lambda: cache_manager.thumbnail_cache)
65 changes: 65 additions & 0 deletions superset/charts/api.py
@@ -20,7 +20,9 @@
from flask_appbuilder.api import expose, protect, rison, safe
from flask_appbuilder.models.sqla.interface import SQLAInterface
from flask_babel import ngettext
from werkzeug.wsgi import FileWrapper

from superset import is_feature_enabled, thumbnail_cache
from superset.charts.commands.bulk_delete import BulkDeleteChartCommand
from superset.charts.commands.create import CreateChartCommand
from superset.charts.commands.delete import DeleteChartCommand
@@ -39,9 +41,12 @@
    ChartPostSchema,
    ChartPutSchema,
    get_delete_ids_schema,
    thumbnail_query_schema,
)
from superset.constants import RouteMethod
from superset.models.slice import Slice
from superset.tasks.thumbnails import cache_chart_thumbnail
from superset.utils.selenium import ChartScreenshot
from superset.views.base_api import BaseSupersetModelRestApi

logger = logging.getLogger(__name__)
@@ -118,6 +123,11 @@ class ChartRestApi(BaseSupersetModelRestApi):
    filter_rel_fields_field = {"owners": "first_name"}
    allowed_rel_fields = {"owners"}

    def __init__(self, *args, **kwargs):
        if is_feature_enabled("THUMBNAILS"):
            self.include_route_methods = self.include_route_methods | {"thumbnail"}
        super().__init__(*args, **kwargs)

    @expose("/", methods=["POST"])
    @protect()
    @safe
@@ -340,3 +350,58 @@ def bulk_delete(self, **kwargs) -> Response: # pylint: disable=arguments-differ
            return self.response_403()
        except ChartBulkDeleteFailedError as e:
            return self.response_422(message=str(e))

    @expose("/<pk>/thumbnail/<digest>/", methods=["GET"])
    @protect()
    @rison(thumbnail_query_schema)
    @safe
    def thumbnail(self, pk, digest, **kwargs): # pylint: disable=invalid-name
        """Get Chart thumbnail
        ---
        get:
          description: Compute or get already computed chart thumbnail from cache
          parameters:
          - in: path
            schema:
              type: integer
            name: pk
          - in: path
            schema:
              type: string
            name: sha
          responses:
            200:
              description: Chart thumbnail image
              content:
                image/*:
                  schema:
                    type: string
                    format: binary
            401:
              $ref: '#/components/responses/401'
            404:
              $ref: '#/components/responses/404'
            422:
              $ref: '#/components/responses/422'
            500:
              $ref: '#/components/responses/500'
        """
        chart = self.datamodel.get(pk, self._base_filters)
        if not chart:
            return self.response_404()
        if kwargs["rison"].get("force", False):
            cache_chart_thumbnail.delay(chart.id, force=True)
            return self.response(202, message="OK Async")
        # fetch the chart screenshot using the current user and cache if set
        screenshot = ChartScreenshot(pk).get_from_cache(cache=thumbnail_cache)
        # If not screenshot then send request to compute thumb to celery
        if not screenshot:
            cache_chart_thumbnail.delay(chart.id, force=True)
            return self.response(202, message="OK Async")
        # If digests
        if chart.digest != digest:
            logger.info("Requested thumbnail digest differs from actual digest")
            return self.response(304, message="Digest differs")
        return Response(
            FileWrapper(screenshot), mimetype="image/png", direct_passthrough=True
        )
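
For review purposes, the branching of the `thumbnail` endpoint at this revision can be summarized as a small pure function. This is a simplified sketch, not code from the PR: `thumbnail_status` and its parameters are hypothetical names, and the cache lookup and Celery call are reduced to plain arguments. The 304 matches the code shown in this view; the commit list includes a later commit titled "change 304 to 302 and address comments".

```python
from typing import Optional, Tuple


def thumbnail_status(
    force: bool,
    cached: Optional[bytes],
    current_digest: str,
    requested_digest: str,
) -> Tuple[int, bool]:
    """Return (HTTP status, whether an async recompute should be queued)."""
    if force:
        return 202, True  # caller explicitly asked for a refresh
    if not cached:
        return 202, True  # nothing cached yet: compute in the background
    if current_digest != requested_digest:
        return 304, False  # the digest in the URL no longer matches the chart
    return 200, False  # serve the cached image
```

Note that the digest is only compared after the cache lookup, so a request with an outdated digest still triggers a background computation when nothing is cached.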
4 changes: 4 additions & 0 deletions superset/charts/schemas.py
@@ -22,6 +22,10 @@
from superset.utils import core as utils

get_delete_ids_schema = {"type": "array", "items": {"type": "integer"}}
thumbnail_query_schema = {
    "type": "object",
    "properties": {"force": {"type": "boolean"}},
}


def validate_json(value):
73 changes: 73 additions & 0 deletions superset/cli.py
@@ -19,6 +19,7 @@
from datetime import datetime
from subprocess import Popen
from sys import stdout
from typing import Type, Union

import click
import yaml
@@ -454,6 +455,78 @@ def flower(port, address):
    Popen(cmd, shell=True).wait()


@superset.command()
@with_appcontext
@click.option(
    "--asynchronous",
    "-a",
    is_flag=True,
    default=False,
    help="Trigger commands to run remotely on a worker",
)
@click.option(
    "--dashboards_only",
    "-d",
    is_flag=True,
    default=False,
    help="Only process dashboards",
)
@click.option(
    "--charts_only", "-c", is_flag=True, default=False, help="Only process charts"
)
@click.option(
    "--force",
    "-f",
    is_flag=True,
    default=False,
    help="Force refresh, even if previously cached",
)
@click.option("--model_id", "-i", multiple=True)
def compute_thumbnails(
    asynchronous: bool,
    dashboards_only: bool,
    charts_only: bool,
    force: bool,
    model_id: int,
):
    """Compute thumbnails"""
    from superset.models.dashboard import Dashboard
    from superset.models.slice import Slice
    from superset.tasks.thumbnails import (
        cache_chart_thumbnail,
        cache_dashboard_thumbnail,
    )

    def compute_generic_thumbnail(
        friendly_type: str,
        model_cls: Union[Type[Dashboard], Type[Slice]],
        model_id: int,
        compute_func,
    ):
        query = db.session.query(model_cls)
        if model_id:
            query = query.filter(model_cls.id.in_(model_id))
        dashboards = query.all()
        count = len(dashboards)
        for i, model in enumerate(dashboards):
            if asynchronous:
                func = compute_func.delay
                action = "Triggering"
            else:
                func = compute_func
                action = "Processing"
            msg = f'{action} {friendly_type} "{model}" ({i+1}/{count})'
            click.secho(msg, fg="green")
            func(model.id, force=force)

    if not charts_only:
        compute_generic_thumbnail(
            "dashboard", Dashboard, model_id, cache_dashboard_thumbnail
        )
    if not dashboards_only:
        compute_generic_thumbnail("chart", Slice, model_id, cache_chart_thumbnail)
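
The `--asynchronous` switch comes down to choosing between calling the task function directly and calling its `.delay` method, which is how Celery queues a task for a worker. The pattern can be sketched in isolation as follows; `FakeTask` and `dispatch` are hypothetical stand-ins written for this illustration and are not part of the PR.

```python
from typing import Iterable, List


class FakeTask:
    """Stand-in for a Celery task: callable directly, or queued via ``delay``."""

    def __init__(self) -> None:
        self.ran: List[int] = []
        self.queued: List[int] = []

    def __call__(self, model_id: int, force: bool = False) -> None:
        # Direct call: the work happens in the current process.
        self.ran.append(model_id)

    def delay(self, model_id: int, force: bool = False) -> None:
        # Queued call: a real Celery task would send a message to the broker.
        self.queued.append(model_id)


def dispatch(
    task: FakeTask, ids: Iterable[int], asynchronous: bool, force: bool
) -> None:
    # Pick the entry point once; the loop body is the same either way.
    func = task.delay if asynchronous else task
    for model_id in ids:
        func(model_id, force=force)
```

Because both entry points accept the same arguments, the calling code does not need to know whether the thumbnail is computed locally or on a worker.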


@superset.command()
@with_appcontext
def load_test_users():
7 changes: 7 additions & 0 deletions superset/config.py
@@ -282,6 +282,8 @@ def _try_json_readsha(filepath, length): # pylint: disable=unused-argument
    "ENABLE_EXPLORE_JSON_CSRF_PROTECTION": False,
    "KV_STORE": False,
    "PRESTO_EXPAND_DATA": False,
    # Exposes API endpoint to compute thumbnails
    "THUMBNAILS": False,
    "REDUCE_DASHBOARD_BOOTSTRAP_PAYLOAD": False,
    "SHARE_QUERIES_VIA_KV_STORE": False,
    "TAGGING_SYSTEM": False,
@@ -307,6 +309,11 @@ def _try_json_readsha(filepath, length): # pylint: disable=unused-argument
# return feature_flags_dict
GET_FEATURE_FLAGS_FUNC: Optional[Callable[[Dict[str, bool]], Dict[str, bool]]] = None

# ---------------------------------------------------
# Thumbnail config (behind feature flag)
# ---------------------------------------------------
THUMBNAIL_SELENIUM_USER = "Admin"
THUMBNAIL_CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}
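
The docs example in this PR assigns a factory function to `THUMBNAIL_CACHE_CONFIG`, while the default here is a settings dict, which suggests the setting accepts either form. The code that resolves it is not part of this view, so the following is only a sketch of one way it could be handled; `CacheConfig` as defined below, `DictBackedCache` and `setup_cache` are assumptions made for this illustration.

```python
from typing import Any, Callable, Dict, Union

# Assumed shape of the setting: a factory taking the app, or a settings dict.
CacheConfig = Union[Callable[[Any], Any], Dict[str, Any]]


class DictBackedCache:
    """Hypothetical stand-in for a cache built from a settings dict."""

    def __init__(self, app: Any, config: Dict[str, Any]) -> None:
        self.app = app
        self.config = config


def setup_cache(app: Any, cache_config: CacheConfig) -> Any:
    # A dict is treated as settings for the stock cache class; anything
    # else is treated as a factory that receives the app and returns a cache.
    if isinstance(cache_config, dict):
        return DictBackedCache(app, cache_config)
    return cache_config(app)
```

With this shape, the S3 example from the docs and the `{"CACHE_TYPE": "null"}` default would go through the same entry point.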

# ---------------------------------------------------
# Image and file configuration