Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve state key generation for dashboards and charts #18576

Merged

Conversation

michael-s-molina
Copy link
Member

@michael-s-molina michael-s-molina commented Feb 3, 2022

SUMMARY

When opening a dashboard or a chart in Explore, the URL will update with a new state key every time the page is reloaded even when the state isn't updated. This may not be scalable in high-traffic Superset deployments where it can quickly fill up disk space or exhaust the number of cache keys too fast, due to memory restrictions, making it more likely that a link you shared with someone a while ago becomes invalid.

To address this problem we should consider the request context such as user ID, resource ID, etc, to generate new keys only when the context changes. This way we can reuse existing keys but at the same time maintain the uniqueness of storage spaces for different users/dashboards/datasets/charts.

We could use HMAC to generate the same key for the same context but that would require key rotation management.

We can’t base key generation on previous keys because we have POST scenarios where the key is not present. One example would be when the user is inside a dashboard and selects “View chart in Explore”.

We can’t change the structure of the cache key to use the user and resource because this information is not present in the GET/DELETE requests and it would also require keeping backward compatibility with previous versions.

To implement the desired behavior we’ll keep a secondary mapping of user and resources with the previously generated keys. We'll store this mapping together with the entry so they’ll share the same expiration time. We’ll also keep it in sync with DELETE events. That way we can reuse the keys and keep the solution backward compatible.

Hat tip to @zhaoyongjie and @ktmud for bringing this concern during the reviews 👏🏼

Follow-up of #18181

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Screen.Recording.2022-02-03.at.2.58.42.PM.mov

TESTING INSTRUCTIONS

1 - Check that form_data_key is reused when the same user is accessing the same resource (dashboard, dataset, or chart)
2 - Check that form_data_key changes when the user, dashboard, dataset, or chart changes.
3 - Check that the values associated with a given key are correctly retrieved from the server

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Feb 3, 2022

Codecov Report

Merging #18576 (62e2df7) into master (801091b) will decrease coverage by 0.21%.
The diff coverage is 66.21%.

❗ Current head 62e2df7 differs from pull request most recent head 47d13f7. Consider uploading reports for the commit 47d13f7 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18576      +/-   ##
==========================================
- Coverage   66.34%   66.13%   -0.22%     
==========================================
  Files        1619     1620       +1     
  Lines       62967    63053      +86     
  Branches     6354     6365      +11     
==========================================
- Hits        41778    41700      -78     
- Misses      19529    19694     +165     
+ Partials     1660     1659       -1     
Flag Coverage Δ
hive 52.22% <35.44%> (-0.03%) ⬇️
javascript 51.25% <27.53%> (-0.09%) ⬇️
mysql ?
postgres ?
presto 52.07% <35.44%> (-0.03%) ⬇️
python 81.51% <100.00%> (-0.35%) ⬇️
sqlite 81.15% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../src/dashboard/components/gridComponents/Chart.jsx 57.60% <0.00%> (-1.29%) ⬇️
superset-frontend/src/hooks/useTabId.ts 0.00% <0.00%> (ø)
...board/components/nativeFilters/FilterBar/index.tsx 69.82% <15.00%> (-11.05%) ⬇️
...rd/components/nativeFilters/FilterBar/keyValue.tsx 18.18% <40.00%> (-21.82%) ⬇️
.../explore/components/ExploreViewContainer/index.jsx 57.14% <83.33%> (+0.15%) ⬆️
...rset-frontend/src/explore/exploreUtils/formData.ts 83.33% <85.71%> (+1.51%) ⬆️
...dashboard/components/menu/ShareMenuItems/index.tsx 67.85% <100.00%> (+1.19%) ⬆️
superset/dashboards/filter_state/api.py 100.00% <100.00%> (ø)
...uperset/dashboards/filter_state/commands/create.py 100.00% <100.00%> (+5.88%) ⬆️
...uperset/dashboards/filter_state/commands/delete.py 95.83% <100.00%> (+0.83%) ⬆️
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 801091b...47d13f7. Read the comment docs.

@jinghua-qa
Copy link
Member

/testenv up FEATURE_ENABLE_EXPLORE_DRAG_AND_DROP=true

@github-actions
Copy link
Contributor

github-actions bot commented Feb 7, 2022

@jinghua-qa Ephemeral environment spinning up at http://34.222.115.242:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@jinghua-qa
Copy link
Member

jinghua-qa commented Feb 7, 2022

Question: I tried add a new chart to the dashboard but the form_data_key of the dashboard did not change, is that expected?

Screen.Recording.2022-02-07.at.8.40.31.AM.mov

@michael-s-molina
Copy link
Member Author

michael-s-molina commented Feb 7, 2022

Question: I tried add a new chart to the dashboard but the form_data_key did not change, is that expected?

Yes. It's expected. The native_filter_key shouldn't change for the same dashboard and user.

@geido
Copy link
Member

geido commented Feb 7, 2022

Hey @michael-s-molina it took a few clicks to reproduce the problem sorry about that, but as you will see in the video after going back and forth a few times the Dashboard Video Game Sales went from using 1BoGOSkIZ4VMAxBajAbHIGGewJ9lWQclxJlI1vtXT59ALa2LlvM_qXzvD27xDhDZ to using 75_Q5ExSV_9F_hhG-ytrT65ohOCgIqPNF4M6Tc-XcRQEmBx1_y7n368eHFwQBtXu which is also the same that USA Birth Names is using.

Superset.mp4

@michael-s-molina
Copy link
Member Author

Hey @michael-s-molina it took a few clicks to reproduce the problem sorry about that, but as you will see in the video after going back and forth a few times the Dashboard Video Game Sales went from using 1BoGOSkIZ4VMAxBajAbHIGGewJ9lWQclxJlI1vtXT59ALa2LlvM_qXzvD27xDhDZ to using 75_Q5ExSV_9F_hhG-ytrT65ohOCgIqPNF4M6Tc-XcRQEmBx1_y7n368eHFwQBtXu which is also the same that USA Birth Names is using.

Thanks for spotting that @geido! I submitted a fix.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it may be pragmatic to not initialize a new key-value pair every time the resource is reloaded with the same key. However, removing support for having separate state in two tabs on the same context (user + resource) is a problem. I would like to suggest a compromise: when opening the same resource without a key, a new key-value pair is initialized. However, when the same user reloads the same resource with the same key, the key would not change.

superset/dashboards/filter_state/commands/create.py Outdated Show resolved Hide resolved
superset/dashboards/filter_state/commands/create.py Outdated Show resolved Hide resolved
@zhaoyongjie
Copy link
Member

@michael-s-molina Thanks for working on this! there are 2 concerns in this PR.

  1. How to treat cache temporarily unavailable
    a) make a chart
    b) remove all cache data from Redis or filesystem(depend on how to set up Superset)
    c) refresh explore page
    d) chart will be crash

image

  1. How to treat the same user login in different terminal
    a) open same chart from 2 different browser, like Chrome and Firefox.
    b) user have different session, but they can sync form data because they use same cache key. we can use session id to avoid it.

@michael-s-molina
Copy link
Member Author

@zhaoyongjie Thanks for bringing the multi-session and key expiration problems to my attention. I addressed both problems here and in #18624

@zhaoyongjie
Copy link
Member

@zhaoyongjie Thanks for bringing the multi-session and key expiration problems to my attention. I addressed both problems here and in #18624

Thanks for you tackle this! LGTM

@zhaoyongjie
Copy link
Member

@michael-s-molina In order to avoid code conflicts, we need to merge PR-18624 first.

@michael-s-molina
Copy link
Member Author

@villebro @zhaoyongjie I rebased the code and worked on supporting multi-tab states for the dashboards. The idea is to add an optional parameter called tab_id to the filter_state enpoint. The server will generate different keys for different tabs. On the client-side, I'm using a mix of the Local Storage with the Broadcast Channel API to generate the tab ids. Unfortunately, Safari still does not support the native API (will do in a month), so I'm using broadcast-channel for now. As soon as Safari 15.4 is released we can remove the dependency. I'll work on the Explore part now to complete the PR.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really impressive work - logic looks solid and works just as expected. Bravo! 👏

@michael-s-molina
Copy link
Member Author

@zhaoyongjie @villebro I added the Explore part. Now the PR is complete. In this part, I only replicated the server-side logic in Explore and extracted the tab_id handling to a custom hook so we can use it anywhere.

@zhaoyongjie
Copy link
Member

Hi @michael-s-molina, Thanks for providing different tabs cache mechanisms.

The current solution causes the same session to have different cache keys due to different tab keys. Imagine if the user created 5 tabs, the cache system has to create 5 cache records. For this high-frequency operation, do we consider using some data structures to solve it? for example:

form_data_key = {
  'tab-key1': value,
  'tab-key2: value,
}

Second, could we consider using window.sessionStorage to preserve tab-id? if we use sessionStorage and tab-key in form_data_key, we probably don't need to introduce a new dependence broadcast-channel in frontend.

In the future, we can also improve for_data_key = {tab-key: value} into for_data_key = LRU_ordered_dict to optimize memory space.

@michael-s-molina
Copy link
Member Author

The current solution causes the same session to have different cache keys due to different tab keys. Imagine if the user created 5 tabs, the cache system has to create 5 cache records. For this high-frequency operation, do we consider using some data structures to solve it?
In the future, we can also improve for_data_key = {tab-key: value} into for_data_key = LRU_ordered_dict to optimize memory space.

I think this could be a further optimization that we can implement in the future 👍🏼

Second, could we consider using window.sessionStorage to preserve tab-id? if we use sessionStorage and tab-key in form_data_key, we probably don't need to introduce a new dependence broadcast-channel in frontend.

I considered this option but discarded it because of the following in window.sessionStorage docs:

Duplicating a tab copies the tab's sessionStorage into the new tab.

@michael-s-molina michael-s-molina merged commit 48a8095 into apache:master Feb 14, 2022
@github-actions
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

@rosemarie-chiu
Copy link
Contributor

🏷 2022.7

@rosemarie-chiu
Copy link
Contributor

🏷 preset:2022.7

@jinghua-qa jinghua-qa removed the 2022.7 label Feb 16, 2022
rosemarie-chiu pushed a commit to preset-io/superset that referenced this pull request Feb 16, 2022
…18576)

* feat: Improve state key generation for dashboards and charts

(cherry picked from commit 48a8095)
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.5.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset:2022.7 size/XL 🚢 1.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants