This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

fix Server.findSeries() limit application #1934

Merged
merged 3 commits into from
Oct 30, 2020

Conversation

Dieterbe
Contributor

  • The limit was incorrectly applied to the number of api.Series.
    These structures represent a full find result (with possibly many
    timeseries) for each given combination of pattern and peer.
    Instead, we have to look inside them at the actual number
    of idx.Node structures they contain, and only count those
    that are leaf nodes (as they represent timeseries), not
    branch nodes (see the sketch after this list).
  • Make it "streaming" based: as soon as we detect a per-req limit
    breach, cancel all pending requests (similar to the mechanism
    we already had, where if a peer reports a limit breach on its
    proportional request, we also cancel the others).
  • While we're at it, we may as well filter out unneeded branches
    centrally/earlier too.
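
For clarity, here is a minimal sketch of the counting and cancellation idea described above. It is not the actual metrictank code: FindResult, Node, applyLimit and the channel-based plumbing are simplified stand-ins for api.Series, idx.Node and the real findSeries/peer-query machinery; only counting leaf nodes toward the limit, dropping branches early, and cancelling in-flight requests on a breach are the point.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Node is a simplified stand-in for idx.Node: either a leaf
// (an actual timeseries) or a branch.
type Node struct {
	Path string
	Leaf bool
}

// FindResult is a simplified stand-in for api.Series: one full find
// result (possibly many nodes) for a given pattern/peer combination.
type FindResult struct {
	Pattern string
	Nodes   []Node
}

var errLimitExceeded = errors.New("request exceeds max-series-per-req limit")

// applyLimit counts only leaf nodes toward the per-request limit, drops
// branch nodes early, and cancels all still-pending peer requests as
// soon as a breach is detected ("streaming" enforcement).
func applyLimit(results <-chan FindResult, limit int, cancel context.CancelFunc) ([]FindResult, error) {
	var out []FindResult
	leaves := 0
	for r := range results {
		kept := r.Nodes[:0]
		for _, n := range r.Nodes {
			if !n.Leaf {
				continue // branches don't count and are filtered out centrally
			}
			leaves++
			if limit > 0 && leaves > limit {
				cancel() // abort the other in-flight find requests
				return nil, errLimitExceeded
			}
			kept = append(kept, n)
		}
		r.Nodes = kept
		out = append(out, r)
	}
	return out, nil
}

func main() {
	_, cancel := context.WithCancel(context.Background())
	defer cancel()

	results := make(chan FindResult, 2)
	results <- FindResult{Pattern: "some.id.*", Nodes: []Node{
		{Path: "some.id.of", Leaf: false},           // branch: ignored
		{Path: "some.id.of.a.metric.x", Leaf: true}, // leaf: counted
	}}
	results <- FindResult{Pattern: "other.*", Nodes: []Node{
		{Path: "other.y", Leaf: true},
	}}
	close(results)

	out, err := applyLimit(results, 10, cancel)
	fmt.Println(out, err)
}
```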

Contributor

@robert-milan left a comment


LGTM. I like that the LeavesOnly filtering is now done in findSeries

Do the chaos-stack tests need to be updated or is there an actual problem?

@Dieterbe force-pushed the fix-query-find-limit-counting branch from f909b6b to f1d241f on October 30, 2020 07:00
@Dieterbe
Contributor Author

Do the chaos-stack tests need to be updated or is there an actual problem?

Fantastic! The test found a legitimate problem! Fix is above :)

@Dieterbe
Contributor Author

Dieterbe commented Oct 30, 2020

Just for fun I also ran the test with this patch:

diff --git a/docker/docker-chaos/metrictank.ini b/docker/docker-chaos/metrictank.ini
index 49420377..8ff46178 100644
--- a/docker/docker-chaos/metrictank.ini
+++ b/docker/docker-chaos/metrictank.ini
@@ -241,7 +241,7 @@ max-points-per-req-hard = 20000000
 # limit of number of series a request can operate on. Requests that exceed this limit will be rejected. (0 disables limit)
 # note here we look at all lowlevel series (even if they will be merged or are equivalent), and can't accurately account for
 # requests with duplicate or overlapping targets. See PR #1926 and #1929 for details
-max-series-per-req = 250000
+max-series-per-req = 10
 # require x-org-id authentication to auth as a specific org. otherwise orgId 1 is assumed
 multi-tenant = true
 # in case our /render endpoint does not support the requested processing, proxy the request to this graphite

...to confirm correct behavior when the limit is hit. Lo and behold:

=== RUN   TestClusterStartup
Matcher matched: zookeeper entered RUNNING state
Matcher matched: kafka entered RUNNING state
Matcher matched: grafana.*HTTP Server Listen.*3000
Matcher matched: metrictank0_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank3_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank5_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank4_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank1_1.*metricIndex initialized.*starting data consumption$
Matcher matched: metrictank2_1.*metricIndex initialized.*starting data consumption$
2020-10-30 14:36:07.276 [INFO] stack now running.
2020-10-30 14:36:07.276 [INFO] Go to http://localhost:3000 (and login as admin:admin) to see what's going on
--- PASS: TestClusterStartup (27.43s)
=== RUN   TestClusterBaseIngestWorkload
    chaos_cluster_test.go:156: could not query correct result set. sum of 12 series, each valued 1, should result in 12.  last response was: (graphite.Response) {
         HTTPErr: (error) <nil>,
         DecodeErr: (*json.SyntaxError)(0xc0003be3e0)(invalid character 'R' looking for beginning of value),
         Code: (int) 413,
         TraceID: (string) (len=16) "37b8a2c546d11964",
         Decoded: (graphite.Data) <nil>,
         BodyErr: (string) (len=116) "Request exceeds max-series-per-req limit (10). Reduce the number of targets or ask your admin to increase the limit."
        }
--- FAIL: TestClusterBaseIngestWorkload (28.58s)
=== RUN   TestQueryWorkload
    chaos_cluster_test.go:174: expected only correct results. (-want +got):
          graphite.CheckResults{
          	Validators: []graphite.Validator{{Name: "ValidateCorrect(12.000000)", Fn: ⟪0x8a5ac0⟫}},
          	Valid: []int{
        - 		600,
        + 		0,
          	},
          	Empty:      0,
          	Timeout:    0,
        - 	Other:      0,
        + 	Other:      600,
        - 	FirstOther: nil,
        + 	FirstOther: &graphite.Response{
        + 		DecodeErr: e"invalid character 'R' looking for beginning of value",
        + 		Code:      413,
        + 		TraceID:   "2753a29beac91ef",
        + 		BodyErr:   "Request exceeds max-series-per-req limit (10). Reduce the number of targets or ask your admin to increase the limit.",
        + 	},
          }
--- FAIL: TestQueryWorkload (60.00s)

and in the metrictank logs:

metrictank0_1  | 2020-10-30 12:37:34.762 [INFO] ts=2020-10-30T12:37:34.762109457Z traceID=69601e29b98bfa7d, sampled=true msg="GET /render?format=json&from=-14s&target=sum%28some.id.of.a.metric.%2A%29 (413) 1.057855ms" orgID=1 sourceIP="192.168.80.1" error="Request%20exceeds%20max-series-per-req%20limit%20%2810%29.%20Reduce%20the%20number%20of%20targets%20or%20ask%20your%20admin%20to%20increase%20the%20limit."
metrictank0_1  | 2020-10-30 12:37:35.362 [INFO] ts=2020-10-30T12:37:35.362137237Z traceID=59ba98b785904cb, sampled=true msg="GET /render?format=json&from=-14s&target=sum%28some.id.of.a.metric.%2A%29 (413) 1.087937ms" orgID=1 sourceIP="192.168.80.1" error="Request%20exceeds%20max-series-per-req%20limit%20%2810%29.%20Reduce%20the%20number%20of%20targets%20or%20ask%20your%20admin%20to%20increase%20the%20limit."

@Dieterbe Dieterbe merged commit cd8e922 into master Oct 30, 2020
@Dieterbe Dieterbe deleted the fix-query-find-limit-counting branch October 30, 2020 12:45