Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tumble windowing function #37

Merged

Conversation

dai-chen
Copy link
Collaborator

@dai-chen dai-chen commented Sep 19, 2023

Description

Add TUMBLE windowing function which is required by materialized view support.

The function generates a new column called window. It's a struct field consist of start and end field inside. The implementation is actually delegated to Spark existing window() function. Here is an example:

spark-sql>
... SELECT window, COUNT(*)
... FROM stream.lineitem_tiny
... GROUP BY TUMBLE(l_shipdate, '1 week')
... ORDER BY window.start
... LIMIT 10;

window	count(1)
{"start":1992-01-02 00:00:00,"end":1992-01-09 00:00:00}	1390
{"start":1992-06-04 00:00:00,"end":1992-06-11 00:00:00}	24980
{"start":1992-07-09 00:00:00,"end":1992-07-16 00:00:00}	24964
{"start":1992-07-16 00:00:00,"end":1992-07-23 00:00:00}	24957
{"start":1992-07-30 00:00:00,"end":1992-08-06 00:00:00}	25012
{"start":1992-08-06 00:00:00,"end":1992-08-13 00:00:00}	24929
{"start":1992-08-20 00:00:00,"end":1992-08-27 00:00:00}	24968
{"start":1992-09-03 00:00:00,"end":1992-09-10 00:00:00}	24972
{"start":1992-11-12 00:00:00,"end":1992-11-19 00:00:00}	24984
{"start":1993-01-07 00:00:00,"end":1993-01-14 00:00:00}	24753
Time taken: 5.095 seconds, Fetched 10 row(s)

Issues Resolved

#25

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added the enhancement New feature or request label Sep 19, 2023
@dai-chen dai-chen self-assigned this Sep 19, 2023
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen marked this pull request as ready for review September 20, 2023 16:17
Signed-off-by: Chen Dai <daichen@amazon.com>
// Delegate actual implementation to Spark existing window() function
val timeColumn = children.head
val windowDuration = children(1)
window(new Column(timeColumn), windowDuration.toString()).expr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not using window function directly? tumble function is more align with streaming sql grammer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently tumble is just an alias of window. The reason for doing this is Streaming SQL has a family of windowing function, ex. tumbling, sliding, session etc. Tumbling window is one of them that happen to be supported by Spark window() function directly.

@dai-chen dai-chen merged commit e72f054 into opensearch-project:main Sep 20, 2023
4 checks passed
@dai-chen dai-chen deleted the add-tumble-windowing-function branch September 20, 2023 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants