This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Performance optimizations for meta tag queries #1517

Merged: replay merged 4 commits into master from performance_optimizations_for_meta_tag_queries on Nov 6, 2019


replay (Contributor) commented on Nov 1, 2019

This PR makes four changes, each in its own commit, so I recommend reviewing the commits separately:

  1. It adds benchmarks for executing meta tag queries in a relatively realistic scenario.
  2. It limits the number of sub-queries that can run concurrently to the value of TagQueryWorkers. When the initial result set is built and the query expression includes a meta tag, we instantiate a sub-query in a goroutine for each of the meta records involved. If a meta tag has a very large number of meta records associated with it (e.g. dc=dc1 resulting in host=host1, host=host2, host=host3, etc.), we don't want to start goroutines for all of those sub-queries at once, so this commit introduces a gate that limits their concurrency.
  3. After building the initial result set from one query expression, we filter it down by the remaining query expressions, using separate goroutines that do the filtering. If a query has only one expression, there is nothing to filter and the initial result set can be returned directly. This commit therefore skips creating the filter workers when there are no expressions to filter by, which also saves copying the values from the filter workers' input channel into the output channel. This required a small change to the tag selector's interface: the result channel is now passed into .getIds() instead of being returned by it, and .getIds() is now a blocking method.
  4. It removes old code that is no longer used in the current master.
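The gate from item 2 can be sketched with a buffered channel used as a counting semaphore. This is a minimal illustration under stated assumptions, not the code from this PR: `runSubQueries` and `subQuery` are hypothetical names, and the `workers` parameter stands in for TagQueryWorkers.

```go
// Hypothetical sketch of a concurrency gate; not metrictank's implementation.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// subQuery is a hypothetical stand-in for the work done for one meta record.
type subQuery func() []string

// runSubQueries starts one goroutine per sub-query, but a buffered channel used
// as a semaphore ensures that at most `workers` of them run at the same time.
// It returns the combined results and the peak concurrency that was observed.
func runSubQueries(queries []subQuery, workers int) ([]string, int64) {
	gate := make(chan struct{}, workers)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []string
		running int64
		peak    int64
	)
	for _, q := range queries {
		wg.Add(1)
		gate <- struct{}{} // blocks while `workers` sub-queries are already running
		go func(q subQuery) {
			defer wg.Done()
			defer func() { <-gate }() // release the slot when this sub-query is done

			// Track the peak number of concurrently running sub-queries.
			n := atomic.AddInt64(&running, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}

			ids := q()
			atomic.AddInt64(&running, -1)

			mu.Lock()
			results = append(results, ids...)
			mu.Unlock()
		}(q)
	}
	wg.Wait()
	return results, atomic.LoadInt64(&peak)
}

func main() {
	// e.g. a meta tag dc=dc1 backed by 20 meta records host=host1..host20
	var queries []subQuery
	for i := 1; i <= 20; i++ {
		host := fmt.Sprintf("host=host%d", i)
		queries = append(queries, func() []string { return []string{host} })
	}
	ids, peak := runSubQueries(queries, 5)
	fmt.Println(len(ids), peak <= 5) // 20 true
}
```

Acquiring the slot before the `go` statement means the launching loop itself blocks, so goroutines for the remaining meta records are not even created until a running sub-query finishes.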

This also changes the default value for the tag query workers from 50 to 5. 50 has always been relatively high, but now that we also create sub-queries from the expressions associated with meta tags, it is far too high.

This also changes the id selector so that it doesn't deduplicate results unnecessarily when it is called by a sub-query: sub-queries don't evaluate meta tags, and duplicates are only possible when meta tags get evaluated.

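A minimal sketch of making deduplication conditional, assuming a hypothetical `emitIds` helper with a `dedup` flag that the top-level query sets and sub-queries leave unset. The saving comes from skipping the per-id map insert and lookup when duplicates cannot occur.

```go
// Hypothetical sketch; emitIds and collect are illustrative names only.
package main

import "fmt"

// emitIds sends each id to out and then closes out. When dedup is true, ids
// that were already sent are skipped; this costs a map insert and lookup per
// id, so callers that cannot produce duplicates (such as sub-queries, which
// don't evaluate meta tags) pass dedup=false to avoid that overhead.
func emitIds(ids []string, dedup bool, out chan<- string) {
	var seen map[string]struct{}
	if dedup {
		seen = make(map[string]struct{}, len(ids))
	}
	for _, id := range ids {
		if dedup {
			if _, ok := seen[id]; ok {
				continue
			}
			seen[id] = struct{}{}
		}
		out <- id
	}
	close(out)
}

// collect runs emitIds in a goroutine and gathers everything it sends.
func collect(ids []string, dedup bool) []string {
	out := make(chan string)
	go emitIds(ids, dedup, out)
	var res []string
	for id := range out {
		res = append(res, id)
	}
	return res
}

func main() {
	ids := []string{"a", "b", "a"}
	fmt.Println(collect(ids, true))  // [a b]
	fmt.Println(collect(ids, false)) // [a b a]
}
```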
Since we still always need to filter by the from timestamp, this is now done in the id selector by calling a new method on the tag query context called newerThanFrom.

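The from-timestamp check can be sketched as a method on a simplified query context. `tagQueryContext`, `series`, `selectIds` and the `>=` boundary are assumptions of this sketch; only the method name newerThanFrom comes from the description above.

```go
// Hypothetical sketch; only the name newerThanFrom is taken from the PR text.
package main

import "fmt"

// series is a hypothetical, simplified index entry.
type series struct {
	id         string
	lastUpdate int64 // unix timestamp of the last received point
}

// tagQueryContext is a hypothetical stand-in for the tag query context;
// only the field needed for this example is modeled.
type tagQueryContext struct {
	from int64
}

// newerThanFrom reports whether a series was updated at or after the query's
// from timestamp. The ">=" boundary is an assumption of this sketch.
func (ctx *tagQueryContext) newerThanFrom(s series) bool {
	return s.lastUpdate >= ctx.from
}

// selectIds stands in for the id selector: it applies the from filter itself
// instead of leaving it to a separate filtering stage.
func selectIds(ctx *tagQueryContext, candidates []series) []string {
	var ids []string
	for _, s := range candidates {
		if ctx.newerThanFrom(s) {
			ids = append(ids, s.id)
		}
	}
	return ids
}

func main() {
	ctx := &tagQueryContext{from: 1000}
	all := []series{{"a", 900}, {"b", 1000}, {"c", 1500}}
	fmt.Println(selectIds(ctx, all)) // [b c]
}
```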
To make this change work, the result channel has to be passed into getIds() instead of being returned from it. This required some rewiring of the channels, especially of where they get closed. To make it possible to close the result channel when the id selector is finished, getIds() is now a blocking method.
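The effect of passing the result channel into a blocking getIds() can be sketched as follows. Everything here is hypothetical and simplified (string ids, predicate functions as expressions, a single filter goroutine instead of TagQueryWorkers filter workers): getIds closes the channel it was given, so the caller can either hand it the final result channel directly, when there is only one expression, or an intermediate channel that feeds a filter stage.

```go
// Hypothetical sketch; getIds and run are simplified illustrations only.
package main

import (
	"fmt"
	"strings"
)

// getIds writes matching ids into the channel passed in by the caller, blocks
// until it is done, and then closes that channel.
func getIds(candidates []string, match func(string) bool, resCh chan<- string) {
	defer close(resCh)
	for _, id := range candidates {
		if match(id) {
			resCh <- id
		}
	}
}

// run evaluates the first expression with getIds and adds a filter stage only
// if more expressions remain. With a single expression, getIds writes straight
// into the final result channel: no filter goroutine, no copying between channels.
func run(candidates []string, exprs []func(string) bool) []string {
	resCh := make(chan string, 16)
	if len(exprs) == 1 {
		go getIds(candidates, exprs[0], resCh) // getIds blocks, so give it its own goroutine
	} else {
		initial := make(chan string, 16)
		go getIds(candidates, exprs[0], initial)
		go func() {
			defer close(resCh) // the filter stage closes the channel it produces
			for id := range initial {
				ok := true
				for _, e := range exprs[1:] {
					if !e(id) {
						ok = false
						break
					}
				}
				if ok {
					resCh <- id
				}
			}
		}()
	}
	var out []string
	for id := range resCh {
		out = append(out, id)
	}
	return out
}

func main() {
	ids := []string{"dc1.host1", "dc1.host2", "dc2.host1"}
	inDc1 := func(s string) bool { return strings.HasPrefix(s, "dc1.") }
	isHost1 := func(s string) bool { return strings.HasSuffix(s, ".host1") }
	fmt.Println(run(ids, []func(string) bool{inDc1}))          // [dc1.host1 dc1.host2]
	fmt.Println(run(ids, []func(string) bool{inDc1, isHost1})) // [dc1.host1]
}
```

Because each stage closes exactly the channel it produces, skipping the filter stage becomes a purely local decision in `run`.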
replay force-pushed the performance_optimizations_for_meta_tag_queries branch from 61eeb13 to 2ec3d6e on November 5, 2019 12:40
replay (Contributor, Author) commented on Nov 5, 2019

Rebased onto the latest master

robert-milan (Contributor) left a comment


LGTM

replay merged commit 16c7bf7 into master on Nov 6, 2019
replay deleted the performance_optimizations_for_meta_tag_queries branch on November 6, 2019 22:59