Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-27] Proposal for Paranoid Deletes #8639

Closed
john-bodley opened this issue Nov 23, 2019 · 8 comments
Closed

[SIP-27] Proposal for Paranoid Deletes #8639

john-bodley opened this issue Nov 23, 2019 · 8 comments
Assignees
Labels
enhancement:request Enhancement request submitted by anyone from the community sip Superset Improvement Proposal

Comments

@john-bodley
Copy link
Member

john-bodley commented Nov 23, 2019

[SIP] Proposal for Paranoid Deletes

Motivation

At Airbnb we have a vast number of entities housed within Superset. Our deployment has tens of thousands of charts (both manually and procedurally generated), thousands of dashboards, and tens of thousands of registered datasources and tables (both physical and virtual).

In a recent analysis of a specific Druid NoSQL (native) cluster, from a sample of ~ 5k charts only 34% of charts rendered, i.e., returned a 200 status code from the /supserset/slice_json route.

The following chart shows the renderability of charts as a function of last saved, which shows that a chart's viability often decays over time due to creep in the datasource metadata and the saved chart parameters.

Screen Shot 2019-11-22 at 5 50 14 PM

Ideally we would like to have a mechanism to clean up obsolete resources (charts, dashboards, or datasources) in a somewhat paranoid manner, i.e., using soft deletes. This should help keep our deployment at a manageable size and improve the perceived reliability and quality (from a usability standpoint) of Superset assets.

Proposed Change

The proposed solution was originally mentioned by @etr2460 but I thought it was worthwhile formalizing this as a SIP. This borrows an idea from Ruby where we first soft delete records my marking them as deleted (with an associated timestamp) before performing a hard delete (deleting the record n-days later). Users could be prompted that their charts were being deleted and they can take corrective action to undelete it if they see fit.

There's actually a Python package sqla-paranoid which brings transparent soft deletes to SQLAlchemy which we could use or replicate. The TL;DR is this would add a deleted_at (or deleted_on for consistency) column which would track soft deleted records. Records which are soft deleted wouldn't show up in the CRUD views by default unless the filter was enabled (not unlike how SQL Lab Views are ignored by default in the tablemodelview).

Records could be marked using a hook, trigger, or cron as deletable based on various criterion using cascading context:

Charts

  • Consistently† returns an error.
  • Consistently† returns no data.
  • Has not been viewed for n-days.

† Note this could leverage a cron (or similar) and be based on customizable rules, i.e., x of the last n-days.

Dashboards

  • Contains no charts.
Tables/Datasources
  • Not referenced by any charts.

New or Changed Public Interfaces

We would need to updated the data model and leverage sql-paranoid (or similar) for enabling the soft-deletes. We would also need to update the FAB views to handle filtering/exclusion of soft deleted records. Finally we would need to implement triggers or similar to i) soft delete records, and ii) hard delete records.

New dependencies

The only new dependency would be sqla-paranoid (no public license) if we decided not to write this ourself. Note the package only contains several hundred lines of codes.

Migration Plan and Compatibility

We would need to update the schema to include the deleted_at (or deleted_on) column for certain tables. Note I think we only need this for charts, dashboards, and datasources (the cascade deletes should handle the cleanup of columns and metrics).

Rejected Alternatives

None.

to: @etr2460 @mistercrunch @villebro @willbarrett
cc: @vylc

@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the label #enhancement to this issue, with a confidence of 0.71. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@issue-label-bot issue-label-bot bot added the enhancement:request Enhancement request submitted by anyone from the community label Nov 23, 2019
@john-bodley john-bodley changed the title [SIP-26] Paranoid Deletes [SIP-25] Paranoid Deletes Nov 23, 2019
@john-bodley john-bodley added the sip Superset Improvement Proposal label Nov 23, 2019
@john-bodley john-bodley changed the title [SIP-25] Paranoid Deletes [SIP-25] Proposal for Paranoid Deletes Nov 23, 2019
@john-bodley john-bodley changed the title [SIP-25] Proposal for Paranoid Deletes [SIP-27] Proposal for Paranoid Deletes Nov 25, 2019
@willbarrett
Copy link
Member

@john-bodley are you thinking that the cascade deletes will also be soft/paranoid? In general I would be very in favor of moving to soft-deletes.

@john-bodley
Copy link
Member Author

@willbarrett I think it depends on the entity type. For charts, dashboards, and datasources I think these should be soft-deletes, however cascade deletes are valid for removing columns/metrics once/if the datasource/table is hard deleted.

I think there's merit in enumerating which tables should support soft and cascading deletes.

@bkyryliuk
Copy link
Member

+1 for this proposal, we are thinking of implementing this @ dropbox as well either through APIs & some heuristics. It would be great to have 1st class support for this in Superset

@junlincc
Copy link
Member

+1 for this proposal
We added this request to roadmap lately, as we consider this feature as must have for an enterprise software. this sip seems to be the perfect solution! Please help me understand a couple of things~

  1. how long can a soft deleted object be stored until it get premaritally deleted?
  2. can user set the time for hard delete?

will either of you be interested in collaborating on this feature? @john-bodley @bkyryliuk?

@etr2460
Copy link
Member

etr2460 commented Mar 16, 2021

Replying here @junlincc (although i'm not John or Bogdan 😛 ):

I think the thought is to have the time between soft delete and hard delete be configured by the Superset admin.

Note that the discussion here is more from the admin perspective than the user perspective. From the user's viewpoint, once something is soft deleted, they won't be able to find it in the Superset application (or maybe will only be able to find it in a trash can page). But if it's right after the deletion occurs, they could talk to the admin or self serve recover it from the trash can. But once the time between soft and hard delete expires, the entity is gone for good

@bkyryliuk
Copy link
Member

+1 to @etr2460 point

time to live should be configurable in the superset config or per database and yeah it should be set by admins

As for implementation, it is unlikely that we would be able to commit to it in the next couple quarters.

@junlincc
Copy link
Member

junlincc commented Mar 17, 2021

thanks both! @etr2460 and @bkyryliuk
To summarize

  • Superset currently only supports hard delete entities, which needs to be changed to soft delete
  • Entity include datasets, charts, dashboards and saved queries
  • Soft deleted records will not appear in search results or in list view, but they are not permanently deleted within a set period of time (time to live)
  • Only system admin can set the time to live ,at database level, for the users
  • Soft deleted records are temporary stored in a trash can
  • Soft deleted records can be restored by both admin and users from the trash can

I'm not very familiar with the role permission in Superset tbh. By reading the documentation, seems like we should change the logic of can_delete, and add something like can_restore?

Model & Action: models are entities like Dashboard, Slice, or User. Each model has a fixed set of permissions, like can_edit, can_show, can_delete, can_list, can_add, and so on. For example, you can allow a user to delete dashboards by adding can_delete on Dashboard entity to a role and granting this user that role.

Please educate me or lmk if I missed anything 😅
@amitmiran137 thoughts?

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
enhancement:request Enhancement request submitted by anyone from the community sip Superset Improvement Proposal
Projects
Status: Denied / Closed / Discarded
Development

No branches or pull requests

6 participants