Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a FilterBundle.Builder class and use it to construct FilterBundle. #17055

Merged
merged 12 commits into from
Sep 17, 2024

Conversation

cecemei
Copy link
Contributor

@cecemei cecemei commented Sep 12, 2024

Refactors how FilterBundle is constructed in Filter and QueryableIndexCursorHolder.

Description

Before this PR, QueryableIndexCursorHolder calls Filter.makeFilterBundle to build FilterBundle. The BitmapColumnIndex.computeBitmapResult is called at the same time. In the logical AndFilter and OrFilter, the ordering of child filters would affect what indices are chosen. Since filters could have different compute cost (a.k.a union multiple bitmaps is more expensive than single bitmap), we'd like less expensive filters to be computed first.

After this PR, QueryableIndexCursorHolder builds a FilterBundle.Builder instance, which contains useful information such as ColumnIndexSelector and BitmapColumnIndex, then calls builds to build FilterBundle. The actual build logic still exists in Filter.makeFilterBundle, since AndFilter/OrFilter could have separate logic from other Filter. This enables AndFilter/OrFilter to sort its child filters by the computing cost in ascending order, before making the BitmapColumnIndex.computeBitmapResult calls.

For now, all indices would have a cost of Integer.MAX_VALUE, I intend to make a future PR to update cost for all inherited BitmapColumnIndex classes. Note that this PR would not have any real impact on building FilterBundle, since all filters having the same cost so there's no real sorting.

Alternatively we've considered making FilterBundle.IndexBundle to store all indices and delay the index/matcher partitioning. The major con is that this seems like changing existing behavior too much and it could mean a lot of rewrite to existing FilterBundle class and Filter.makeFilterBundle method.

Benchmark comparison

query 1

SELECT string2, SUM(long1) FROM foo WHERE string1 = '1000' AND string5 LIKE '%1%' AND (string3 in ('1', '10', '20', '22', '32') AND long2 IN (1, 19, 21, 23, 25, 26, 46) AND double3 < 1010.0 AND double3 > 1000.0 AND (string4 = '1' OR REGEXP_EXTRACT(string1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || string2, '^Z2') IS NOT NULL)) GROUP BY 1 ORDER BY 2

before

Benchmark                        (deferExpressionDimensions)  (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score   Error  Units
SqlExpressionBenchmark.querySql         fixedWidthNonNumeric       41           5000000  explicit        force  avgt    5  120.019 ± 2.536  ms/op
SqlExpressionBenchmark.querySql         fixedWidthNonNumeric       41           5000000      auto        force  avgt    5  115.933 ± 3.654  ms/op

after

Benchmark                        (deferExpressionDimensions)  (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score   Error  Units
SqlExpressionBenchmark.querySql         fixedWidthNonNumeric       41           5000000  explicit        force  avgt    5  121.131 ± 1.423  ms/op
SqlExpressionBenchmark.querySql         fixedWidthNonNumeric       41           5000000      auto        force  avgt    5  116.353 ± 2.335  ms/op

query 2

SELECT SUM(long1) FROM foo WHERE string1 = '1000' AND string5 LIKE '%1%'

before

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt  Score   Error  Units
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        false  avgt    5  7.518 ± 0.310  ms/op
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        force  avgt    5  7.461 ± 0.220  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        false  avgt    5  7.475 ± 0.188  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        force  avgt    5  7.380 ± 0.170  ms/op

after

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt  Score   Error  Units
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        false  avgt    5  7.631 ± 0.430  ms/op
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        force  avgt    5  7.517 ± 0.205  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        false  avgt    5  7.534 ± 0.224  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        force  avgt    5  7.457 ± 0.332  ms/op


Key changed/added classes in this PR
  • FilterBundle.Builder
  • FilterBundle
  • Filter
  • BooleanFilter
  • AndFilter
  • OrFilter
  • BitmapColumnIndex
  • QueryableIndexCursorHolder

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 🤘

Could you please re-run some of the benchmarks of #15838 just to make sure everything is still working as expected?

@@ -732,8 +731,7 @@ private static FilterBundle makeFilterBundle(
return null;
}
final long bitmapConstructionStartNs = System.nanoTime();
final FilterBundle filterBundle = filter.makeFilterBundle(
bitmapIndexSelector,
final FilterBundle filterBundle = new FilterBundle.Builder(filter, bitmapIndexSelector).build(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just in case we should add a new boolean query context flag, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/QueryContexts.java, something like "cursorAutoArrangeFilters" and pass it into the builder so that we can disable the sort per query if we need to for any reason.

You can get access to the QueryContext itself from the CursorBuildSpec passed into the constructor of QueryableIndexCursorHolder, pass it into the CursorResources to feed to this method, and retrieve the value with something like queryContext.getBoolean(QueryContexts.CURSOR_AUTO_ARRANGE_FILTERS) (or whatever you name the static variable).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would make the change to add the flag - thanks for the suggestion!

@cecemei
Copy link
Contributor Author

cecemei commented Sep 13, 2024

Will do! out of curiosity, are benchmark tests being ran regularly, or integrated with any workflows?

nice 🤘

Could you please re-run some of the benchmarks of #15838 just to make sure everything is still working as expected?

@cecemei cecemei marked this pull request as draft September 13, 2024 22:43
@cecemei cecemei marked this pull request as ready for review September 15, 2024 01:29
@@ -112,6 +113,7 @@ public QueryableIndexCursorHolder(
Cursors.getTimeOrdering(ordering),
interval,
filter,
cursorBuildSpec.getQueryContext().getBoolean(QueryContexts.CURSOR_AUTO_ARRANGE_FILTERS, false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably default this to true, or are you planning to wait until we have added the cost computations?

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤘

…ilter. Also removed the getFilter method and updated AndFilter/OrFilter to use the build method instead.
@clintropolis clintropolis merged commit 88c3c20 into apache:master Sep 17, 2024
90 checks passed
pranavbhole pushed a commit to pranavbhole/druid that referenced this pull request Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants