
atomicInt64Limiter WithoutSlack doesn't block #90

Closed
twelsh-aw opened this issue Jun 9, 2022 · 18 comments

@twelsh-aw

First off, thanks for the library :)

Saw that a new rate limiter was introduced that benchmarked a lot better and pulled it down to try it out.

Noticed that when running with WithoutSlack, it just allows everything through instead of waiting, because all subsequent Take() calls fall into the case `now-timeOfNextPermissionIssue > int64(t.maxSlack)`.
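
To make the failure mode concrete, here is a minimal sketch of that decision logic — a paraphrase with hypothetical names (buggyNextPermission, timeOfNext), not the library's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// buggyNextPermission paraphrases the decision described above with
// hypothetical names; it is not the library's actual code. timeOfNext is
// the stored "time of next permission issue" in nanoseconds.
func buggyNextPermission(now, timeOfNext, perRequest, maxSlack int64) (newTimeOfNext, sleep int64) {
	switch {
	case timeOfNext == 0:
		// First ever call: allow immediately.
		return now, 0
	case now-timeOfNext > maxSlack:
		// The problematic case: with WithoutSlack, maxSlack == 0, so any
		// real clock advance between calls lands here and resets the
		// schedule to "now" with zero sleep.
		return now - maxSlack, 0
	default:
		// Steady state: space permissions perRequest apart.
		newTimeOfNext = timeOfNext + perRequest
		return newTimeOfNext, newTimeOfNext - now
	}
}

func main() {
	perRequest := int64(10 * time.Millisecond) // 100 rps
	var state int64
	for i := 1; i <= 3; i++ {
		now := time.Now().UnixNano()
		next, sleep := buggyNextPermission(now, state, perRequest, 0 /* WithoutSlack */)
		state = next
		// A real limiter would sleep here; with the bug, sleep is always ~0.
		fmt.Printf("take %d: sleep %v\n", i, time.Duration(sleep))
	}
}
```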

Easiest way to repro is using your example_test.go:

  • Using rl := ratelimit.New(100) (slack=10):

```
$ go test -run Example -count=1
=== RUN   Example
--- PASS: Example (0.09s)
PASS
ok      command-line-arguments  0.207s
```

  • Using rl := ratelimit.New(100, ratelimit.WithoutSlack):

```
$ go test -run Example -count=1
--- FAIL: Example (0.01s)
got:
1 10ms
2 775µs
3 3µs
4 2µs
5 10µs
6 2µs
7 2µs
8 2µs
9 2µs
want:
1 10ms
2 10ms
3 10ms
4 10ms
5 10ms
6 10ms
7 10ms
8 10ms
9 10ms
FAIL
FAIL    command-line-arguments  0.126s
FAIL
```

  • Using the 0.2.0 rl := newAtomicBased(100, WithoutSlack):

```
$ go test -run Example -count=1
PASS
ok      go.uber.org/ratelimit   0.323s
```
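
For reference, the Example in example_test.go is presumably shaped like the sketch below (reconstructed from the got/want output above; details are assumptions, not the repo's exact code):

```go
package ratelimit_test

import (
	"fmt"
	"time"

	"go.uber.org/ratelimit"
)

// Reconstructed sketch of the Example, inferred from the got/want output
// above. At 100 rps, each Take() after the first should block for ~10ms.
func Example() {
	rl := ratelimit.New(100) // 100 requests per second

	prev := time.Now()
	for i := 0; i < 10; i++ {
		now := rl.Take()
		if i > 0 {
			fmt.Println(i, now.Sub(prev))
		}
		prev = now
	}

	// Output:
	// 1 10ms
	// 2 10ms
	// 3 10ms
	// 4 10ms
	// 5 10ms
	// 6 10ms
	// 7 10ms
	// 8 10ms
	// 9 10ms
}
```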

I am not 100% sure why the other unit tests with mocked clocks are passing, but your example test and my application tests fail consistently with this new limiter. On darwin, if that helps.

@storozhukBM
Contributor

@twelsh-aw
Yes, the issue is definitely there. I have a quick fix for it, but I also want to spend a bit more time figuring out why our tests don't catch it.

@rabbbit
Contributor

rabbbit commented Jun 9, 2022

Reverting - @storozhukBM, we can redo the whole PR (possibly adding some tests earlier?)

@storozhukBM
Contributor

@rabbbit
I figured out that our tests continue to pass even if I break the currently existing implementation.
For example, if you change ratelimit_test.go:108 to `t.clock.Sleep(interval - interval)`, the tests will still pass.

@storozhukBM
Contributor

@rabbbit
Further findings. The tests are not able to detect issues because, with the mock clock, time literally freezes between Take() calls, so when we check `now.Sub(oldState.last)` in atomicLimiter or `now - timeOfNextPermissionIssue` in atomicInt64Limiter, it always returns 0.
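
A minimal illustration of the frozen mock clock (using github.com/benbjohnson/clock directly; this is not the repo's test code):

```go
package main

import (
	"fmt"

	"github.com/benbjohnson/clock"
)

func main() {
	mock := clock.NewMock()

	// The mock clock never advances on its own: without an explicit
	// Add(), two consecutive reads return the same instant.
	t1 := mock.Now()
	t2 := mock.Now()
	fmt.Println(t2.Sub(t1)) // 0s

	// So between Take() calls in the tests, now - timeOfNextPermissionIssue
	// is always 0, and the faulty "> maxSlack" branch is never exercised.
}
```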

@storozhukBM
Contributor

@rabbbit
If we add `r.clock.Add(time.Microsecond)` between takes in `(r *runnerImpl) startTaking`, the tests start to detect the issue.
The problem is that we call startTaking from different goroutines, and "github.com/benbjohnson/clock" is not concurrency-safe, so the race detector fails when it finds races inside the mock clock state.

So we either need to find a concurrency-safe mock clock implementation, fix "github.com/benbjohnson/clock", or write our own.
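
As a sketch of the first option (purely hypothetical — lockedMock is a name invented here, not something the repo adopted), the mock could be wrapped so that concurrent advancement is serialized:

```go
package main

import (
	"sync"
	"time"

	"github.com/benbjohnson/clock"
)

// lockedMock is a hypothetical wrapper that serializes access to the
// underlying mock clock, so tests can advance it from multiple
// goroutines without tripping the race detector.
// (A real test clock would also need Sleep/Timer methods; omitted here.)
type lockedMock struct {
	mu   sync.Mutex
	mock *clock.Mock
}

func (l *lockedMock) Now() time.Time {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.mock.Now()
}

func (l *lockedMock) Add(d time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.mock.Add(d)
}

func main() {
	lm := &lockedMock{mock: clock.NewMock()}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lm.Add(time.Microsecond) // safe to call from many goroutines
			_ = lm.Now()
		}()
	}
	wg.Wait()
}
```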

@rabbbit
Contributor

rabbbit commented Jun 9, 2022

I'm sorry, I won't be able to look at this in detail for probably the next ~10 days (limited laptop access).

In the meantime, I can revert the clock implementation change if you think that's better/more stable - the tests were (mostly?) stable there. I'm opposed to implementing a clock as part of this package :)

Looks like we'll be stuck with the old implementation for a while - a "code freeze" until we fix the tests.

@storozhukBM
Contributor

@rabbbit agree, I'll look at possible options for time mocking and will fix our tests in a follow-up PR

@storozhukBM
Contributor

@rabbbit I created a new PR as discussed: #93

rabbbit pushed a commit that referenced this issue Jul 2, 2022
This limiter was introduced and merged in PR #85.
Later, @twelsh-aw found an issue with this implementation (#90),
so @rabbbit reverted the change in #91.

Our tests did not detect this issue, so we have a separate PR, #93, that enhances our testing approach to detect potential errors better.
With this PR, we want to restore the int64-based atomic rate limiter implementation as a non-default rate limiter and then check that #93 will detect the bug.
Right after that, we'll open a subsequent PR to fix this bug.
storozhukBM added a commit to storozhukBM/ratelimit that referenced this issue Jul 2, 2022
@storozhukBM
Contributor

@rabbbit

PR with the fix for this issue: #95

Also, you can see on the testing-approach PR (#93) that it is able to detect this bug.

@rabbbit
Contributor

rabbbit commented Jul 5, 2022

Ack. This is pretty high on my priority list, will try to get to this soon. Again, sorry for the delay.

rabbbit added a commit that referenced this issue Jul 6, 2022
In #90, @twelsh-aw found a bug in the new implementation. This turned out
to be caused by us mocking time.

Since time mocking will always carry some risk, this diff proposes
to expand the `examples` we're using - since they use "real" time,
they should be good enough to cover most basic cases like #90.

In particular:
- we add a "withoutSlack" test so that the exact case reported in #90
  can't recur. The no-slack option seems common enough to warrant it.
- we update the "withSlack" example to actually show how slack operates.
  Due to non-even execution times, I'm forced to round the times a bit.

Possible issues:
- test stability: I re-ran the test 1000 times without issues - the
  timing seems to be stable.
- test duration: in total we're extending the examples by 5ms, which
  shouldn't be human-noticeable.
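
A hypothetical sketch of what such a "withoutSlack" example could look like (the rounding mirrors the note above; this is not the exact code from the PR):

```go
package ratelimit_test

import (
	"fmt"
	"time"

	"go.uber.org/ratelimit"
)

// Example_withoutSlack is a hypothetical sketch of a real-clock example:
// at 100 rps with WithoutSlack, every Take() after the first must block
// for a full 10ms, so a limiter that fails to block breaks the Output.
func Example_withoutSlack() {
	rl := ratelimit.New(100, ratelimit.WithoutSlack)

	prev := rl.Take()
	for i := 1; i <= 4; i++ {
		now := rl.Take()
		// Round to the nearest millisecond so scheduling jitter doesn't
		// make the example flaky.
		fmt.Println(i, now.Sub(prev).Round(time.Millisecond))
		prev = now
	}

	// Output:
	// 1 10ms
	// 2 10ms
	// 3 10ms
	// 4 10ms
}
```
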
rabbbit added a commit that referenced this issue Jul 7, 2022
rabbbit pushed a commit that referenced this issue Jul 13, 2022
This PR fixes the issue found by @twelsh-aw with the int64-based implementation (#90).

Our tests did not detect this issue, so we have a separate PR, #93, that enhances our testing approach to detect potential errors better.
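
The shape of the fix can be sketched against the buggy branch shown earlier. This is an assumption-laden paraphrase (see #95 for the real patch): the slack branch is gated on maxSlack > 0, and the no-slack idle case resets the schedule explicitly:

```go
package main

import (
	"fmt"
	"time"
)

// fixedNextPermission mirrors the earlier buggyNextPermission sketch with
// the assumed shape of the fix; hypothetical names, not the verbatim patch.
func fixedNextPermission(now, timeOfNext, perRequest, maxSlack int64) (newTimeOfNext, sleep int64) {
	switch {
	case timeOfNext == 0:
		// First ever call: allow immediately.
		return now, 0
	case maxSlack == 0 && now-timeOfNext > perRequest:
		// WithoutSlack after an idle period: reset the schedule to now
		// instead of permitting a catch-up burst.
		return now, 0
	case maxSlack > 0 && now-timeOfNext > maxSlack:
		// With slack: cap the accumulated credit at maxSlack.
		return now - maxSlack, 0
	default:
		// Steady state: space permissions perRequest apart and sleep
		// until the issued timestamp.
		newTimeOfNext = timeOfNext + perRequest
		if d := newTimeOfNext - now; d > 0 {
			return newTimeOfNext, d
		}
		return newTimeOfNext, 0
	}
}

func main() {
	perRequest := int64(10 * time.Millisecond) // 100 rps
	var state int64
	for i := 1; i <= 3; i++ {
		now := time.Now().UnixNano()
		next, sleep := fixedNextPermission(now, state, perRequest, 0 /* WithoutSlack */)
		state = next
		time.Sleep(time.Duration(sleep)) // now actually blocks ~10ms after the first take
		fmt.Printf("take %d: slept %v\n", i, time.Duration(sleep))
	}
}
```
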
@storozhukBM
Contributor

@rabbbit We already have a fix in place. What do you think about making atomicInt64Limiter the default implementation again and letting people try it before cutting a new release tag? Or we can make this implementation public so that users can instantiate it themselves. I'd like the new testing approach to be merged first, but if you're unsure about it, we can skip it for now.

@rabbbit
Contributor

rabbbit commented Oct 30, 2022

The new atomic version seems to perform so much better that it feels like we should enable it.

However, after that, we might just have to declare a code freeze until someone else has more cycles for reviews. Sadly, this includes the new time-testing approach; I'd need to look at it in much more detail before merging it in.

@storozhukBM
Contributor

@rabbbit OK, so should I make a PR now?

@rabbbit
Contributor

rabbbit commented Oct 31, 2022

SGTM; also, thanks for pushing this through :)

@storozhukBM
Contributor

@rabbbit
Please take a look: #101

rabbbit pushed a commit that referenced this issue Oct 31, 2022
* Fix return-timestamp discrepancy between the regular atomic limiter and the int64-based one
* Make the int64-based atomic limiter the default

Long story: this was added in #85, but reverted in #91 due to #90. #95 fixed the issue, so we're moving forward with the new implementation.
@rabbbit
Contributor

rabbbit commented Oct 31, 2022

Closing as fixed - we reverted the bad change in #91, @storozhukBM fixed it in #95, but we were defaulting to the old implementation. #101 makes the new (faster) implementation the default.

rabbbit closed this as completed Oct 31, 2022
@storozhukBM
Contributor

storozhukBM commented Oct 31, 2022

@twelsh-aw this change is now merged; can you please try it again and give us your feedback?

@storozhukBM
Contributor

thank you @rabbbit

rabbbit added a commit that referenced this issue Jul 9, 2023
There were quite a few fixes in that repo, so let's pull them in.

We've had a few mock-clock bugs before (#90); I'm hoping that with the
new versions, #93 won't be necessary. I'll try reverting #95 on a branch
temporarily to see if it would have been caught.
rabbbit added a commit that referenced this issue Mar 4, 2024
rabbbit added a commit that referenced this issue May 1, 2024