perf: optimize count(*) #3845

Merged
merged 6 commits into from
Apr 30, 2024

Conversation

waynexia
Member

@waynexia waynexia commented Apr 30, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

TL;DR: count(*) is about 5.5x faster on the table below (4 min 38 s down to 50 s).

Before

MySQL [(none)]> select count(*) from phy;
+------------+
| COUNT(*)   |
+------------+
| 2582870657 |
+------------+
1 row in set (4 min 38.347 sec)

After

MySQL [(none)]> select count(*) from phy;
+------------+
| COUNT(*)   |
+------------+
| 2582870657 |
+------------+
1 row in set (50.387 sec)

This patch adds a new optimizer rule that converts count(*) to count(<TIME INDEX>). The optimization relies on the fact that our underlying storage engine scans the time index column faster than a primary key column: reading the time index column does not require the decoding phase that primary key columns need. The rule extends the corresponding rule in DataFusion and overrides it.
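
The rewrite described above can be sketched roughly as follows. This is a minimal illustration over a toy expression type, not the actual rule from this patch and not DataFusion's API: `Expr`, `TableSchema`, and `rewrite_count_star` are hypothetical names. It also assumes the time index column is never NULL, which is what makes count(<TIME INDEX>) return the same value as count(*).

```rust
/// Toy expression tree; a stand-in for a real planner's expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `*` used as an aggregate argument.
    Wildcard,
    /// A column reference qualified by its table name.
    Column { table: String, name: String },
    /// `COUNT(arg)`.
    Count(Box<Expr>),
    /// `expr AS alias`.
    Alias(Box<Expr>, String),
}

/// Hypothetical table metadata: only what the rewrite needs.
struct TableSchema {
    name: String,
    /// Name of the time index column, if the table has one.
    time_index: Option<String>,
}

/// Rewrites `COUNT(*)` into `COUNT(<table>.<time index>)`.
/// Assumption: the time index column is never NULL, so both counts agree.
fn rewrite_count_star(expr: Expr, schema: &TableSchema) -> Expr {
    match expr {
        Expr::Count(arg) => match (*arg, &schema.time_index) {
            // Only the wildcard form is rewritten, and only when a time index exists.
            (Expr::Wildcard, Some(ts)) => Expr::Count(Box::new(Expr::Column {
                table: schema.name.clone(),
                name: ts.clone(),
            })),
            // Any other argument (or a table without a time index) is kept as-is.
            (other, _) => Expr::Count(Box::new(other)),
        },
        // Look through aliases so `COUNT(*) AS c` is rewritten too.
        Expr::Alias(inner, alias) => {
            Expr::Alias(Box::new(rewrite_count_star(*inner, schema)), alias)
        }
        other => other,
    }
}
```

For a table `phy` with a time index named `ts`, `COUNT(*)` becomes `COUNT(phy.ts)`; for a table without a time index the expression is returned unchanged.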

This patch also changes some logic in the range plan so that it can handle aggregate expressions with an alias.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia waynexia added the C-performance Category Performance label Apr 30, 2024
@waynexia waynexia requested review from evenyag and a team as code owners April 30, 2024 09:51
@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 30, 2024
@waynexia waynexia enabled auto-merge April 30, 2024 10:03

codecov bot commented Apr 30, 2024

Codecov Report

Attention: Patch coverage is 73.52941%, with 27 lines in your changes missing coverage. Please review.

Project coverage is 85.27%. Comparing base (81f3007) to head (db5b55e).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3845      +/-   ##
==========================================
- Coverage   85.60%   85.27%   -0.33%     
==========================================
  Files         954      955       +1     
  Lines      163325   163426     +101     
==========================================
- Hits       139808   139369     -439     
- Misses      23517    24057     +540     

Collaborator

@fengjiachun fengjiachun left a comment

LGTM

@waynexia waynexia added this pull request to the merge queue Apr 30, 2024
Contributor

@evenyag evenyag left a comment

LGTM

Merged via the queue into GreptimeTeam:main with commit e84b1ee Apr 30, 2024
30 checks passed
@waynexia waynexia deleted the opt-count-star branch April 30, 2024 10:23
Contributor

killme2008 commented Apr 30, 2024

This patch has some corner cases that it doesn't handle properly, for example:

Create two tables:

create table "HelloWorld" (a string, b timestamp time index);

create table test (a string, b timestamp time index);

Insert some rows:

insert into "HelloWorld" values ("a", 1) ,("b", 2);

insert into test values ("c", 1) ;
  1. First case: table names are not handled case-sensitively:
mysql> select count(*) from "HelloWorld";
ERROR 1815 (HY000): DataFusion error: No field named helloworld.b. Valid fields are "HelloWorld".a, "HelloWorld".b.
  2. Second case: a subquery containing count(*) is not handled:
mysql> select count(*) from (select count(*) from test where a = 'a');
ERROR 1815 (HY000): DataFusion error: No field named test.b. Valid fields are "COUNT(*)".

The above examples worked before this patch. @waynexia @fengjiachun @evenyag
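
Both errors above read like a rewritten column reference that no longer resolves against the fields the plan node actually exposes. The sketch below illustrates that reading with hypothetical helpers (`normalize_ident`, `can_resolve`); it is an assumption drawn from the error messages, not the fix that was applied.

```rust
/// Normalizes an identifier as written in SQL text:
/// quoted identifiers keep their case, unquoted ones are lowercased.
fn normalize_ident(raw: &str) -> String {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        // Strip the surrounding double quotes and preserve the case.
        raw[1..raw.len() - 1].to_string()
    } else {
        raw.to_lowercase()
    }
}

/// Returns true if `table.column` is among the (qualifier, name) fields
/// exposed by the input plan. Comparison is exact and case-sensitive.
fn can_resolve(fields: &[(String, String)], table: &str, column: &str) -> bool {
    fields
        .iter()
        .any(|(t, c)| t.as_str() == table && c.as_str() == column)
}
```

Under this reading, a rewrite would build the qualifier from the preserved table name (`HelloWorld`, not `helloworld`) and would skip the rewrite when the input is a subquery whose only field is `COUNT(*)`, since `test.b` is not visible there.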

Labels
C-performance Category Performance docs-not-required This change does not impact docs.
5 participants