Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-15] Adding database level python-date-format #8474

Conversation

john-bodley
Copy link
Member

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

This PR addresses the SIP-15 issue where for string column types the start endpoint may behave like (start rather than [start due to lexicographical ordering if the RHS is not encoded to mimic the formatting of the LHS.

For example at Airbnb we have over 20,000 tables in Superset which have a ds column (datestamp) which adheres to the YYYY-MM-DD ISO 8601 format however no python-date-format is specified for these columns and thus the filter endpoints behave like,

> SELECT
    '2018-01-01' >= '2018-01-01 00:00:00',
    '2018-01-01' <= '2018-01-02 00:00:00'
FALSE TRUE

i.e., (start rather than [start. This problem could be remedied by setting the python-date-format for every table but that seems overkill and doesn't ensure newly added tables will define the formatting. This PR creates a default column/python-date-format mapping on a per engine basis where one can specify a fallback (default) format if not specified.

Note #8464 switched the order of evaluation for obtaining the datetime SQL literal ensuring known per engine data types conversions, i.e., DATE, TIMESTAMP, etc. are applied prior to any python-date-format. This helps to ensure that the mapping isn't heavy handed, i.e., applying the format in an unnecessary or problematic manner.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Screen Shot 2019-10-29 at 11 45 32 PM

Screen Shot 2019-10-29 at 11 46 04 PM

Before

Screen Shot 2019-10-30 at 12 11 06 AM

After

Screen Shot 2019-10-30 at 12 10 54 AM

TEST PLAN

ADDITIONAL INFORMATION

REVIEWERS

to: @betodealmeida @etr2460 @michellethomas @mistercrunch @villebro @willbarrett

@john-bodley john-bodley force-pushed the john-bodley--sip-15--python-date-format-fallback branch from 52131dd to 892164f Compare October 30, 2019 07:34
Copy link
Member

@willbarrett willbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of grammar nits and a non-blocking question about non-inclusive start times. Thank you for adding the documentation!

@@ -1306,7 +1306,17 @@ SIP-15

`SIP-15 <https://github.com/apache/incubator-superset/issues/6360>`_ aims to ensure that time intervals are handled in a consistent and transparent manner for both the Druid and SQLAlchemy connectors.

Prior to SIP-15 SQLAlchemy used inclusive endpoints however these may behave like exclusive depending on the time column (refer to the SIP for details) and thus the endpoint behavior could be unknown. To aid with transparency the current endpoint behavior is explicitly called out in the chart time range (post SIP-15 this will be [start, end) for all connectors and databases). One can override the defaults on a per database level via the ``extra``
Prior to SIP-15 SQLAlchemy used inclusive endpoints however these may behave like exclusive for string columns (due to lexicographical ordering) if no formatting was defined and the column formatting did not conform to an ISO 8601 date-time (refer to the SIP for details).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it would be a convenience for the reader to add a link to the SIP issue on Github.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willbarrett there's a link to the GitHub issue on line #1307.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whup, I missed the context. Thanks!

@@ -112,6 +112,9 @@ class TableColumnInlineView(CompactCRUDMixin, SupersetModelView):
"define an expression and type for transforming the string into a "
"date or timestamp. Note currently time zones are not supported. "
"If time is stored in epoch format, put `epoch_s` or `epoch_ms`."
"If no pattern is specified we fallback to using the optional defaults "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: fallback is a noun, fall back is the verb form.

@@ -113,7 +113,10 @@ function ColumnCollectionTable({
you will need to define an expression and type for
transforming the string into a date or timestamp. Note
currently time zones are not supported. If time is stored
in epoch format, put \`epoch_s\` or \`epoch_ms\`.`)}
in epoch format, put \`epoch_s\` or \`epoch_ms\`. If no pattern
is specified we fallback to using the optional defaults on a per
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: fallback is a noun, fall back is the verb form.

@@ -169,15 +169,19 @@ def get_time_filter(
col = self.get_sqla_col(label="__time")
l = []
if start_dttm:
l.append(col >= text(self.dttm_sql_literal(start_dttm)))
l.append(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is now configuration for inclusive/exclusive start points, should we have another if condition in this block supporting col > <start time literal statement>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willbarrett per SIP-15 the end state is [start, end) i.e., this should not be configurable. The reason for the interim configuration is to allow users to preview their charts using the [start, end) time range endpoints before saving and thus persisting this information in the chart form data.

The time_range_endpoints form-data is merely there as a mechanism to provide both the current and future states simultaneously for a period which enabling an institution to manually fix charts as necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for clarifying!

@john-bodley john-bodley force-pushed the john-bodley--sip-15--python-date-format-fallback branch 3 times, most recently from 2550f04 to 5fb124e Compare October 30, 2019 17:49
@john-bodley
Copy link
Member Author

@willbarrett thanks for the review. I addressed your fallback vs. fall back comments.

@john-bodley john-bodley force-pushed the john-bodley--sip-15--python-date-format-fallback branch from 5fb124e to 04b55d2 Compare October 30, 2019 17:53
@codecov-io
Copy link

codecov-io commented Oct 30, 2019

Codecov Report

Merging #8474 into master will increase coverage by 4.93%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8474      +/-   ##
==========================================
+ Coverage   66.52%   71.46%   +4.93%     
==========================================
  Files         449      117     -332     
  Lines       22595    12334   -10261     
  Branches     2367        0    -2367     
==========================================
- Hits        15032     8814    -6218     
+ Misses       7425     3520    -3905     
+ Partials      138        0     -138
Impacted Files Coverage Δ
superset/config.py 89.34% <ø> (ø) ⬆️
superset/connectors/sqla/views.py 67.52% <ø> (ø) ⬆️
superset/connectors/sqla/models.py 84.81% <66.66%> (-0.12%) ⬇️
...sets/src/explore/components/ExploreChartHeader.jsx
superset/assets/src/components/Checkbox.jsx
...perset/assets/src/dashboard/util/getEmptyLayout.js
...ts/src/dashboard/util/getComponentWidthFromDrop.js
.../assets/src/dashboard/reducers/dashboardFilters.js
...rc/explore/components/controls/AnnotationLayer.jsx
superset/assets/src/welcome/DashboardTable.jsx
... and 324 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 18c6d17...04b55d2. Read the comment docs.

@john-bodley john-bodley merged commit 554a6d8 into apache:master Oct 31, 2019
john-bodley added a commit to john-bodley/superset that referenced this pull request Oct 31, 2019
@dpgaspar dpgaspar added v0.35 and removed v0.35 labels Dec 20, 2019
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 0.36.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants