Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-98] Proposal for Adding Playwright as an Alternative for Selenium to Enable deck.gl Charts in Reports #24948

Closed
kgabryje opened this issue Aug 10, 2023 · 20 comments
Labels
sip Superset Improvement Proposal

Comments

@kgabryje
Copy link
Member

Motivation

Currently, Superset uses Selenium for rendering charts in reports and in thumbnails. However, this approach has proven to be problematic when dealing with deck.gl charts. The current Selenium-based approach fails to handle these charts properly, resulting in blank visualizations in the reports. To address this issue and provide a more reliable solution for rendering deck.gl charts, we propose integrating Playwright as an alternative to Selenium.

Proposed Change

The proposed change involves introducing Playwright for rendering charts in Superset reports. Playwright is an open-source library for automating web browsers, similar to Selenium but with better support for modern browser features and improved performance. By using Playwright, we aim to provide a more stable and accurate chart rendering experience in Superset reports, especially for deck.gl charts.

We wrote a POC that proved that the underlying problem, which is inability to render WebGL images (which deck.gl charts are based on), does not occur when running reports with Playwright. The screenshots below present the contents of email reports of dashboard containing deck.gl charts when using Selenium (before) and Playwright (after).

Before (Selenium):

image

After (Playwright):

image

Since configuring Playwright requires installing additional dependencies, in order to prevent breaking changes in existing deployments we propose to put the new flow behind a feature flag PLAYWRIGHT_REPORTS_AND_THUMBNAILS while supporting Selenium at the same time. Users who do not enable the new feature flag will be unaffected by the proposed changes.

New or Changed Public Interfaces

There will be no changes to existing public interfaces in Superset as part of this feature addition.

New dependencies

The proposed feature will require the addition of the Playwright library as a new Python dependency. Playwright is actively maintained, with Apache 2.0 license.

Migration Plan and Compatibility

The migration to Playwright as the alternative to Selenium will not require any database migrations or updates to stored URLs. Existing reports using deck.gl charts should continue to work as expected.

In order to avoid introducing a potential breaking change in existing deployments, Playwright will only be used if PLAYWRIGHT_REPORTS_AND_THUMBNAILS is enabled. Otherwise, the reports will be run by Selenium.

Rejected Alternatives

  1. Continue using Selenium with geckodriver (Firefox): This approach was rejected due to the limitations of geckodriver, which can't render WebGL visualizations in headless mode.
  2. Continue using Selenium, change the driver to Chrome: This approach was rejected due to problems with taking fullpage screenshots.
@kgabryje kgabryje added the sip Superset Improvement Proposal label Aug 10, 2023
@kgabryje
Copy link
Member Author

@john-bodley
Copy link
Member

john-bodley commented Aug 10, 2023

Thanks @kgabryje for putting this SIP together. Would it be possible to expand further on the rejected alternatives? Specifically are these mentioned issues tracked in the Selenium project as known issues, and if so is their an ETA in terms of when they might be resolved?

Granted it seems like Playwright circumvents the issue currently experienced with Selenium (using either the Firefox or Chrome driver), but it does come at a cost in terms of introducing (and thus supporting) another framework. Additionally are there any known downsides with Playwright, are there circumstances where it fails to render when Selenium succeeds?

The TL;DR is there’s always pros and cons with any approach and thus having a solid understanding of the landscape is highly beneficial before voting on this SIP.

@michael-s-molina
Copy link
Member

Thank you @kgabryje for writing this SIP. I think it's great when we take the time to write SIPs and have the opportunity to discuss big changes before they are merged to master.

Given that deck.gl charts are considered legacy charts that will be removed in the future, did we consider the effort/benefit of migrating them vs introducing another framework that we'll need to support? I don't know if we already have replacements for them, but if we do and the cost is not significantly higher, we could end up in a better state with just one framework and less legacy charts.

@kgabryje
Copy link
Member Author

kgabryje commented Aug 10, 2023

Thanks for comments, I appreciate them!

@john-bodley Historically we switched the driver from Chrome to Firefox as a workaround for screenshots in the reports being cut off. I couldn't find a relevant issue in Selenium's repo, neither recent nor from the time when that change was made in Superset. Firefox on the other hand does not support rendering WebGL in headless mode (https://bugzilla.mozilla.org/show_bug.cgi?id=1375585)
As for Playwright running Chromium, it does seem to cover all bases and I didn't get it to fail in any common scenario - screenshot of a whole dashboard (regardless of the layout) or of a specific element, and most importantly, it does render WebGL.
I agree that having 2 libraries that essentially do the same thing is a big drawback. However, if Playwright turns out to be a success in a long run, I think we might consider fully replacing Selenium and doing a switch in a major release.

@michael-s-molina Most (all?) geospacial visualization libraries are based on WebGL, so even if we did migrate from deck.gl to another lib, we'd face the same issue.
As for replacing deck.gl with another framework - I don't think it's a good way forward. Deck.gl is a well-maintained library with a big community. Our chart plugins are buggy, but I wouldn't put the blame (or at least not all of it) on the library itself - the plugins haven't been properly maintained for years and with some effort we can bring them to the same level of quality as the newer plugins.
I started that effort by refactoring all files to Typescript in order to make them easier to maintain. More of those are coming - refactoring from class to functional components, implementing best practices as described in deck.gl docs, etc.

@john-bodley
Copy link
Member

@kgabryje do you think there's merit in first raising this problem as a Selenium GitHub issue? Their community may be aware of a workaround and/or could potentially gauge the scale/complexity of the problem. Again the more relevant data we have the easier to weigh up the various pros and cons of the proposal.

@michael-s-molina
Copy link
Member

Most (all?) geospacial visualization libraries are based on WebGL, so even if we did migrate from deck.gl to another lib, we'd face the same issue.

Fair point.

As for replacing deck.gl with another framework - I don't think it's a good way forward. Deck.gl is a well-maintained library with a big community. Our chart plugins are buggy, but I wouldn't put the blame (or at least not all of it) on the library itself - the plugins haven't been properly maintained for years and with some effort we can bring them to the same level of quality as the newer plugins.
I started that effort by #24933 in order to make them easier to maintain. More of those are coming - refactoring from class to functional components, implementing best practices as described in deck.gl docs, etc.

Is the intention here to rename the plugins and remove legacy from their names or create new ones? What will be the relationship with ECharts GL charts? Do you think we should write a specific SIP about this topic?

@kgabryje
Copy link
Member Author

@kgabryje do you think there's merit in first raising this problem as a Selenium GitHub issue? Their community may be aware of a workaround and/or could potentially gauge the scale/complexity of the problem. Again the more relevant data we have the easier to weigh up the various pros and cons of the proposal.

Our version of Selenium (3.141.0) is from 2018 (the Chromium driver was equally ancient) and probably the first thing the community would say is to upgrade to the latest version and see if it solves the problem. I do not know how much effort this upgrade would require, but I think it might be a good time to introduce a newer tool that covers our use cases, while offering a better developer experience (easier to write and read) than Selenium.

I see the drawback of maintaining two libraries/flows, but this would be only temporary until we're ready to make a breaking change and stick to 1 library only.

@kgabryje
Copy link
Member Author

Is the intention here to rename the plugins and remove legacy from their names or create new ones? What will be the relationship with ECharts GL charts? Do you think we should write a specific SIP about this topic?

I don't think the plugins are fundamentally bad - it's a number of smaller issues that make the experience worse than newer plugins. So in my opinion, we should rather focus on fixing, refactoring and improving their usability rather then starting from scratch. If we do that + eventually migrate to API V1, we could drop the "legacy" prefix.
As for the SIP - I think migrating to another library or completely changing the architecture (which I think works well in our current plugins) would warrant a SIP, which is not the case here.

@michael-s-molina
Copy link
Member

michael-s-molina commented Aug 14, 2023

Thanks for the responses @kgabryje. Regarding:

What will be the relationship with ECharts GL charts?

Will we ignore ECharts GL? Use it to replace part of the legacy plugins? Use it to enhance our legacy plugin?

@kgabryje
Copy link
Member Author

Will we ignore ECharts GL? Use it to replace part of the legacy plugins? Use it to enhance our legacy plugin?

Sorry, missed that question. I'm not very familiar with EchartsGL but to my understanding it's an extension of Echarts that provides geospatial visualisations, so it has the same use case as deck.gl. What do you have in mind about "replacing" or "enhancing" our plugins? Are there visualisations that are available in EchartsGL but not in deck.gl?
@villebro I recall we were chatting about EchartsGL vs deck.gl some time ago if you want to chime in 😄

@michael-s-molina
Copy link
Member

What do you have in mind about "replacing" or "enhancing" our plugins?

I'm thinking about the effort to adopt ECharts as our official visualization library. If ECharts GL provide the same features as DeckGL, then maybe we should invest in migrating to ECharts GL instead of improving the DeckGL plugin. If that's not the case, maybe we can still enhance our chart library by enabling some of the ECharts GL charts which would live alongside DeckGL.

Are there visualisations that are available in EchartsGL but not in deck.gl?

I don't know but I think this investigation is kind of a prerequisite before we decide to revamp the legacy plugin. The question we should ask is: Does ECharts supports our GL requirements with their charts? If yes, let's migrate. If no, let's improve the DeckGL plugin and remove the legacy prefix.

@JohnDietrich-Pepper
Copy link

ECharts GL does not support the same functionality as Deck.gl charts at this point in time particularly the flexibility to render things like contour maps that has a pending PR on Superset. Compare
https://echarts.apache.org/examples/en/index.html#chart-type-map
https://deck.gl/examples/arc-layer

Also compared to Deck.gl the ECharts GL seems significantly slower on render on a pretty fast machine (6800HS / 4070) vs. Deck.gl in the examples (hard to compare as examples are a bit different).

Finally, there seems to be interest in actually developing deck.gl charts (heatmap and contour being recent examples) whereas there is no pending PR to add ECharts GL.

@betodealmeida
Copy link
Member

I think that regardless of the state of the legacy deck.gl visualizations or the path we take forward (improve them, replace them with ECharts GL, etc.), we do need to support WebGL in Superset, since not only it's used everywhere in geoviz but also in custom visualizations that handle huge quantities of data. Personally I would love to see the deck.gl visualizations being improved and brought back to tier 1, but I suggest we have that discussion in a separate channel (or SIP), and focus on what solution we're going to use for WebGL.

While I think it might be worth trying the newer version of Selenium (4.11.2 vs the current 3.141.0), I think Playwright has other advantages, since it's much easier to install and run and has a more modern API (with optional support for async). Taking a screenshot with Playwright is incredible simple, since you don't need to worry about all the webdriver options:

pip install playwright
playwright install
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://superset.apache.org")
        print(await page.title())
        await page.screenshot(path="screenshot.png", full_page=True)
        await browser.close()

asyncio.run(main())

@eschutho eschutho changed the title [SIP] Proposal for Adding Playwright as an Alternative for Selenium to Enable deck.gl Charts in Reports [SIP-98] Proposal for Adding Playwright as an Alternative for Selenium to Enable deck.gl Charts in Reports Aug 23, 2023
@eschutho
Copy link
Member

eschutho commented Aug 23, 2023

@kgabryje do you think there's merit in first raising this problem as a Selenium GitHub issue? Their community may be aware of a workaround and/or could potentially gauge the scale/complexity of the problem. Again the more relevant data we have the easier to weigh up the various pros and cons of the proposal.

@john-bodley I also spent a little time looking into the selenium github issues for previous questions on the topic. When most people were looking for full-page screenshots, they would direct them to use the Chrome driver and when they needed webgl support, they would direct them to use the Firefox driver. I couldn't find any use cases for either issue where it could be solved by the same driver.

@villebro
Copy link
Member

villebro commented Sep 4, 2023

My quick thoughts on the discussed topics:

  • While ECharts does have similar geospatial functionality as DeckGL, I don't think it's is designed to be a replacement for DeckGL. As we're already doing a fair bit of advanced layering in our DeckGL plugins, I doubt we'd be able to achieve feature parity on our geospatial charts with ECharts (or any other library for that matter).
  • For solving this particular issue, Playwright does seem to be a great alternative that solves our set of minimum requirements. In the interim supporting both is probably the best alternative, but in the long run we should remove support for Selenium in a future major version to reduce maintenance burden if no new feature parity issues arise.

@kgabryje
Copy link
Member Author

SIP implemented by #25247. Thanks to everyone who participated in the discussion!

@michael-s-molina
Copy link
Member

michael-s-molina commented Sep 11, 2023

Closing the SIP as it was approved.

@iamsaso
Copy link

iamsaso commented Jan 11, 2024

@kgabryje is there a reason you didn't include full_page=True parameter for playwright in your PR?

@kgabryje
Copy link
Member Author

@kgabryje is there a reason you didn't include full_page=True parameter for playwright in your PR?

@iamsaso full_page=True is an argument of .screenshot() method when it's called on page object. Here we call it on element object, which screenshots the whole element regardless of viewport size.

@SeanStock-THG
Copy link

SeanStock-THG commented Feb 21, 2024

Hi @kgabryje, I seem to have noticed a massive increase in CPU and RAM resources on taking the screenshots for thumbnails and reports vs Selenium. Is this expected? not sure if can be set to use a specified user like we did with Selenium to avoid it having to load 1000+ charts for each user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sip Superset Improvement Proposal
Projects
Status: Implemented / Done
Development

No branches or pull requests

9 participants