
Design for LIMIT in pagination. #1752

Draft · wants to merge 2 commits into base: feature/pagination/limit

Conversation

@Yury-Fridlyand (Collaborator) commented Jun 17, 2023

Description

This PR is a design review for supporting `LIMIT` in pagination. It includes:

  • a doc describing the changes needed for this feature
  • MVP PoC code

TODOs:

  • Close the cursor when the limit is reached
  • Optimize push down operations: don't push down the page size if `page_size` > `limit`
  • Test the `MergeFilterAndFilter` and `PushFilterUnderSort` optimizer rules (probably by combining `WHERE` and `HAVING` clauses or with a PPL query)

Issues Resolved

N/A

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
@codecov (bot) commented Jun 17, 2023

Codecov Report

Merging #1752 (195de33) into feature/pagination/limit (94d5479) will increase coverage by 0.74%.
The diff coverage is 64.86%.

@@                      Coverage Diff                       @@
##             feature/pagination/limit    #1752      +/-   ##
==============================================================
+ Coverage                       97.33%   98.08%   +0.74%     
+ Complexity                       4408     3436     -972     
==============================================================
  Files                             388      292      -96     
  Lines                           10938     8364    -2574     
  Branches                          773      573     -200     
==============================================================
- Hits                            10647     8204    -2443     
+ Misses                            284      157     -127     
+ Partials                            7        3       -4     
| Flag | Coverage Δ |
| --- | --- |
| sql-engine | 98.08% <64.86%> (+0.74%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...opensearch/sql/planner/physical/LimitOperator.java | 55.17% <27.77%> (-44.83%) ⬇️ |
| ...ch/sql/executor/pagination/CanPaginateVisitor.java | 100.00% <100.00%> (ø) |
| ...ch/sql/planner/optimizer/LogicalPlanOptimizer.java | 100.00% <100.00%> (ø) |
| ...planner/optimizer/rule/read/TableScanPushDown.java | 100.00% <100.00%> (ø) |
| ...logical/PrometheusLogicalPlanOptimizerFactory.java | 100.00% <100.00%> (ø) |

... and 96 files with indirect coverage changes

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>

## Solution

Don't push down `LIMIT`. The `LimitOperator` Physical Plan Tree node will cut off the yielded search results with minimal overhead.
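Purely as an illustration (not the plugin's actual `LimitOperator` implementation), a minimal sketch of how a limit-style operator can truncate the rows produced by its child; all names here are hypothetical:

```java
import java.util.Iterator;

// Illustrative sketch of a limit-style operator: it wraps the child
// operator's row stream and stops yielding rows once `limit` results have
// been emitted (after skipping `offset` rows).
class LimitSketch<T> {
  private final Iterator<T> input;
  private final int limit;
  private int emitted = 0;

  LimitSketch(Iterator<T> input, int limit, int offset) {
    this.input = input;
    this.limit = limit;
    // Skip `offset` rows up front.
    for (int i = 0; i < offset && input.hasNext(); i++) {
      input.next();
    }
  }

  boolean hasNext() {
    // Cut off as soon as the limit is reached, regardless of the child.
    return emitted < limit && input.hasNext();
  }

  T next() {
    emitted++;
    return input.next();
  }
}
```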
@forestmvey (Collaborator) commented Jun 20, 2023

If we utilize the limit operator, how will this work with any other operator that needs to do post-processing? This also raises the question of how we will continue to add operators and how they will work with each other. We may need some way to chain operators if we don't want the limitation of one post-processing operator per query.

Rather than have an operator that limits the output with post-processing, why not include the functionality as part of the base class? If push down isn't available, an optional limit could be enforced in the `PhysicalPlan` base class prior to the inheriting class's `next()` call. Another option would be to perform the limit post-processing in `ResourceMonitorPlan` prior to calling the delegate's `next()`.
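To make the comparison concrete, a rough sketch of the base-class option, using hypothetical names (`LimitAwarePlan`, `delegateHasNext`, `delegateNext`) rather than the real `PhysicalPlan` API:

```java
// Hypothetical base class that enforces an optional limit before delegating
// to the subclass, roughly as suggested above. Not the real PhysicalPlan API.
public abstract class LimitAwarePlan<T> {
  private final Integer limit;   // null means no limit was requested
  private int returned = 0;      // rows already emitted for this query

  protected LimitAwarePlan(Integer limit) {
    this.limit = limit;
  }

  // Subclasses implement the usual iteration contract under these names.
  protected abstract boolean delegateHasNext();
  protected abstract T delegateNext();

  public final boolean hasNext() {
    // Stop as soon as the optional limit is exhausted, even if the
    // delegate could still produce more rows.
    if (limit != null && returned >= limit) {
      return false;
    }
    return delegateHasNext();
  }

  public final T next() {
    returned++;
    return delegateNext();
  }
}
```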

  }

  /**
   * Optimize {@link LogicalPlan}.
   */
  public LogicalPlan optimize(LogicalPlan plan) {
    LogicalPlan optimized = internalOptimize(plan);
    var node = plan;
A collaborator commented:

maybe just rename the argument?


## Problem statement

The `LIMIT` clause is converted to `size` by the SQL plugin during the push down operation.
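A deliberately simplified sketch of the conflict, with hypothetical names rather than the real request-builder API: the page size used by pagination and a pushed-down `LIMIT` both end up writing the same request-level `size` parameter.

```java
// Hypothetical illustration of the conflict: pagination's page size and a
// pushed-down LIMIT compete for the single `size` field of the search request.
class SearchRequestSketch {
  private Integer size;  // the one OpenSearch `size` parameter

  void pushDownPageSize(int pageSize) {
    this.size = pageSize;  // pagination sets size to the fetch size
  }

  void pushDownLimit(int limit) {
    this.size = limit;     // LIMIT overwrites the very same field
  }
}
```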
A collaborator commented:

We need to properly define what we want to accomplish, but also discuss what is currently the problem with the code. I would change this to talk about the LIMIT and size conflict, but also include the use cases:

  1. When LIMIT > size
  2. When LIMIT < size
  3. When size == LIMIT (this is kind of a naive case, but maybe just mention it anyway)

Then discuss exactly how we expect the system to behave: when does it return a cursor, when does it NOT return a cursor, where does it break (max_window_size), and does this override the default fetch size.
If there is anything to add in terms of how the JDBC driver will behave, then include it here.


## Solution

Don't push down `LIMIT`. The `LimitOperator` Physical Plan Tree node will cut off the yielded search results with minimal overhead.
A collaborator commented:

I don't think this is the solution we talked about in the walk-through... I think you just need to reverse the business logic to execute all the rules on a node before proceeding to the next node...

A collaborator commented:

Seems like this section is similar to the Fix section below.

```
CreateTableScanBuilder
PushDownPageSize
...
PUSH_DOWN_LIMIT
```

A collaborator commented:

why is this ordering important?

```
...
```

This gives us warranty that `pushDownLimit` operation would be rejected if `pushPageSize` called before. Then, not optimized Logical Plan Tree node `LogicalLimit` will be converted to `LimitOperator` Physical Plan tree node.
A collaborator commented:

Suggested change
This gives us warranty that `pushDownLimit` operation would be rejected if `pushPageSize` called before. Then, not optimized Logical Plan Tree node `LogicalLimit` will be converted to `LimitOperator` Physical Plan tree node.
This gives us guarantee that `pushDownLimit` operation will be rejected if `pushPageSize` already called. In that case, the not-optimized Logical Plan Tree node `LogicalLimit` will be converted to `LimitOperator` Physical Plan tree node.

3. Make `LimitOperator` properly serializable and deserializable.
4. Make `OpenSearchIndexScanBuilder::pushDownLimit` return `false` if `pushDownPageSize` was called before (see the sketch after this list).
5. (Optional) Groom `Optimizer` to reduce the number of unchecked casts and uses of raw classes.
6. (Optional) Rework `Optimizer` to make it a tree visitor.
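A minimal sketch of the rejection described in item 4, assuming a hypothetical builder with a `pageSizePushedDown` flag; the real `OpenSearchIndexScanBuilder` may differ:

```java
// Hypothetical scan-builder sketch: once the page size has been pushed down,
// a later attempt to push down LIMIT is rejected, so the LogicalLimit node
// stays in the plan and is executed as a LimitOperator instead.
class IndexScanBuilderSketch {
  private boolean pageSizePushedDown = false;
  private Integer size;  // the single request-level size parameter

  boolean pushDownPageSize(int pageSize) {
    this.size = pageSize;
    this.pageSizePushedDown = true;
    return true;
  }

  boolean pushDownLimit(int limit) {
    if (pageSizePushedDown) {
      return false;  // keep LogicalLimit; LimitOperator will apply it
    }
    this.size = limit;
    return true;
  }
}
```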
A collaborator commented:

Out of scope work. Please raise a follow-up ticket to do these later. You can label it as a maintenance ticket.

new MergeFilterAndRelation(),
new MergeAggAndIndexScan(),
new MergeAggAndRelation()
));
).map(r -> (Rule<LogicalPlan>)r).collect(Collectors.toList()));
A collaborator commented:

this is messy, and we can update the Rule classes to extend `Rule<LogicalPlan>` instead.

@Yury-Fridlyand (Collaborator, Author) commented:
Design review meeting notes:

  1. The approach is correct; we should proceed in that direction.
  2. Split the implementation into parts:
    1. Optimizer rework
    2. Reorder rules and update push down operations
    3. Add support for limit in paginated queries

Labels: pagination (Pagination feature, ref #656)
3 participants