This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

fix Server.findSeries() limit application #1934

Merged
merged 3 commits into from
Oct 30, 2020

Conversation

Dieterbe
Contributor

  • The limit was incorrectly applied to the number of api.Series.
    These structures represent a full find result (with possibly many
    timeseries) for each given combination of pattern and peer.
    Instead, we have to look inside them at the actual number
    of idx.Node structures they contain, and only count those
    that are leaf nodes (as they represent timeseries), not
    branch nodes (see the sketch after this list).
  • Make it "streaming" based: as soon as we detect a per-req limit
    breach, cancel all pending requests (similar to the mechanism
    we already had, where if a peer reports a limit breach on its
    proportional request, we also cancel the others).
  • While we're at it, we may as well filter out unneeded branches
    centrally/earlier too.
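
For clarity, here is a minimal sketch of the counting and cancellation idea described above. It is not the actual metrictank code: FindResult, Node, applyLimit and the channel-based plumbing are simplified stand-ins for api.Series, idx.Node and the real findSeries/peer-query machinery; only counting leaf nodes toward the limit, dropping branches early, and cancelling in-flight requests on a breach are the point.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Node is a simplified stand-in for idx.Node: either a leaf
// (an actual timeseries) or a branch.
type Node struct {
	Path string
	Leaf bool
}

// FindResult is a simplified stand-in for api.Series: one full find
// result (possibly many nodes) for a given pattern/peer combination.
type FindResult struct {
	Pattern string
	Nodes   []Node
}

var errLimitExceeded = errors.New("request exceeds max-series-per-req limit")

// applyLimit counts only leaf nodes toward the per-request limit, drops
// branch nodes early, and cancels all still-pending peer requests as
// soon as a breach is detected ("streaming" enforcement).
func applyLimit(results <-chan FindResult, limit int, cancel context.CancelFunc) ([]FindResult, error) {
	var out []FindResult
	leaves := 0
	for r := range results {
		kept := r.Nodes[:0]
		for _, n := range r.Nodes {
			if !n.Leaf {
				continue // branches don't count and are filtered out centrally
			}
			leaves++
			if limit > 0 && leaves > limit {
				cancel() // abort the other in-flight find requests
				return nil, errLimitExceeded
			}
			kept = append(kept, n)
		}
		r.Nodes = kept
		out = append(out, r)
	}
	return out, nil
}

func main() {
	_, cancel := context.WithCancel(context.Background())
	defer cancel()

	results := make(chan FindResult, 2)
	results <- FindResult{Pattern: "some.id.*", Nodes: []Node{
		{Path: "some.id.of", Leaf: false},           // branch: ignored
		{Path: "some.id.of.a.metric.x", Leaf: true}, // leaf: counted
	}}
	results <- FindResult{Pattern: "other.*", Nodes: []Node{
		{Path: "other.y", Leaf: true},
	}}
	close(results)

	out, err := applyLimit(results, 10, cancel)
	fmt.Println(out, err)
}
```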

Contributor

@robert-milan left a comment


LGTM. I like that the LeavesOnly filtering is now done in findSeries

Do the chaos-stack tests need to be updated or is there an actual problem?

@Dieterbe force-pushed the fix-query-find-limit-counting branch from f909b6b to f1d241f on October 30, 2020 07:00
@Dieterbe
Contributor Author

Do the chaos-stack tests need to be updated or is there an actual problem?

Fantastic! The test found a legitimate problem! Fix is above :)

@Dieterbe
Contributor Author

Dieterbe commented Oct 30, 2020

Just for fun I also ran the test with this patch:

diff --git a/docker/docker-chaos/metrictank.ini b/docker/docker-chaos/metrictank.ini
index 49420377..8ff46178 100644
--- a/docker/docker-chaos/metrictank.ini
+++ b/docker/docker-chaos/metrictank.ini
@@ -241,7 +241,7 @@ max-points-per-req-hard = 20000000
 # limit of number of series a request can operate on. Requests that exceed this limit will be rejected. (0 disables limit)
 # note here we look at all lowlevel series (even if they will be merged or are equivalent), and can't accurately account for
 # requests with duplicate or overlapping targets. See PR #1926 and #1929 for details
-max-series-per-req = 250000
+max-series-per-req = 10
 # require x-org-id authentication to auth as a specific org. otherwise orgId 1 is assumed
 multi-tenant = true
 # in case our /render endpoint does not support the requested processing, proxy the request to this graphite

...to confirm correct behavior when the limit is hit. Lo and behold:

=== RUN   TestClusterStartup
Matcher matched: zookeeper entered RUNNING state
Matcher matched: kafka entered RUNNING state
Matcher matched: grafana.*HTTP Server Listen.*3000
Matcher matched: metrictank0_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank3_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank5_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank4_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank1_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank2_1.*metricIndex initialized.*starting data consumption$
2020-10-30 14:36:07.276 [INFO] stack now running.
2020-10-30 14:36:07.276 [INFO] Go to http://localhost:3000 (and login as admin:admin) to see what's going on
--- PASS: TestClusterStartup (27.43s)
=== RUN   TestClusterBaseIngestWorkload
    chaos_cluster_test.go:156: could not query correct result set. sum of 12 series, each valued 1, should result in 12.  last response was: (graphite.Response) {
         HTTPErr: (error) <nil>,
         DecodeErr: (*json.SyntaxError)(0xc0003be3e0)(invalid character 'R' looking for beginning of value),
         Code: (int) 413,
         TraceID: (string) (len=16) "37b8a2c546d11964",
         Decoded: (graphite.Data) <nil>,
         BodyErr: (string) (len=116) "Request exceeds max-series-per-req limit (10). Reduce the number of targets or ask your admin to increase the limit."
        }
--- FAIL: TestClusterBaseIngestWorkload (28.58s)
=== RUN   TestQueryWorkload
    chaos_cluster_test.go:174: expected only correct results. (-want +got):
          graphite.CheckResults{
          	Validators: []graphite.Validator{{Name: "ValidateCorrect(12.000000)", Fn: ⟪0x8a5ac0⟫}},
          	Valid: []int{
        - 		600,
        + 		0,
          	},
          	Empty:      0,
          	Timeout:    0,
        - 	Other:      0,
        + 	Other:      600,
        - 	FirstOther: nil,
        + 	FirstOther: &graphite.Response{
        + 		DecodeErr: e"invalid character 'R' looking for beginning of value",
        + 		Code:      413,
        + 		TraceID:   "2753a29beac91ef",
        + 		BodyErr:   "Request exceeds max-series-per-req limit (10). Reduce the number of targets or ask your admin to increase the limit.",
        + 	},
          }
--- FAIL: TestQueryWorkload (60.00s)

and in the metrictank logs:

metrictank0_1  | 2020-10-30 12:37:34.762 [INFO] ts=2020-10-30T12:37:34.762109457Z traceID=69601e29b98bfa7d, sampled=true msg="GET /render?format=json&from=-14s&target=sum%28some.id.of.a.metric.%2A%29 (413) 1.057855ms" orgID=1 sourceIP="192.168.80.1" error="Request%20exceeds%20max-series-per-req%20limit%20%2810%29.%20Reduce%20the%20number%20of%20targets%20or%20ask%20your%20admin%20to%20increase%20the%20limit."
metrictank0_1  | 2020-10-30 12:37:35.362 [INFO] ts=2020-10-30T12:37:35.362137237Z traceID=59ba98b785904cb, sampled=true msg="GET /render?format=json&from=-14s&target=sum%28some.id.of.a.metric.%2A%29 (413) 1.087937ms" orgID=1 sourceIP="192.168.80.1" error="Request%20exceeds%20max-series-per-req%20limit%20%2810%29.%20Reduce%20the%20number%20of%20targets%20or%20ask%20your%20admin%20to%20increase%20the%20limit."

@Dieterbe Dieterbe merged commit cd8e922 into master Oct 30, 2020
@Dieterbe Dieterbe deleted the fix-query-find-limit-counting branch October 30, 2020 12:45