Fix MemoryCache test failures due to race #72821
Tagging subscribers to this area: @dotnet/area-extensions-caching
Issue Details
Fix #45868
MemoryCache entries may have arbitrary size (weight); if the cache has an overall size limit set on it, the cache maintains a sum of the sizes of its entries ("Size") in order to trigger potential compaction and to reject oversize additions. As an implementation decision, presumably for performance, it does not update the Size atomically with updates to the entries in the collection; the Size is made eventually consistent, potentially on another thread. Size is exposed for unit tests only. These tests were assuming Size was immediately consistent, so they were occasionally failing.
Note that at least one test disabled against #33993 did not actually read the size. It would have failed due to some unrelated reason - back in 2020. I'm going to assume it was fixed by another change such as possibly #42355. Thanks @vonzshik for help.
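For illustration only, here is a minimal sketch of how a test can wait for an eventually consistent value instead of asserting on it immediately. EventuallyConsistentCounter and SizeAssert are hypothetical stand-ins invented for this example, not types from MemoryCache, and this is not necessarily the approach the PR takes. The only real APIs used are SpinWait.SpinUntil, Interlocked, and Task.Run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a cache whose running size total is updated off-thread.
public sealed class EventuallyConsistentCounter
{
    private long _size;

    public long Size => Interlocked.Read(ref _size);

    // Applies the size change on a thread-pool thread, so a caller that reads
    // Size right after calling this may briefly observe the old value.
    public Task AddAsync(long entrySize) =>
        Task.Run(() => { Interlocked.Add(ref _size, entrySize); });
}

public static class SizeAssert
{
    // Polls until the observed size equals the expected value.
    // Returns false if the timeout elapses first.
    public static bool WaitForSize(Func<long> readSize, long expected, TimeSpan timeout) =>
        SpinWait.SpinUntil(() => readSize() == expected, timeout);
}
```

A test would then call SizeAssert.WaitForSize(() => counter.Size, 5, TimeSpan.FromSeconds(10)) and assert on the returned bool, rather than comparing Size directly after the write.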
src/libraries/Microsoft.Extensions.Caching.Memory/tests/CapacityTests.cs
@@ -543,7 +543,6 @@ public void SetGetAndRemoveWorksWithObjectKeysWhenDifferentReferences()
}

[Fact]
[ActiveIssue("https://github.com/dotnet/runtime/issues/33993")]
What changed in this test to allow it to be re-enabled?
That was the test I was referring to above:
Note that at least one test disabled against #33993 did not actually read the size. It would have failed due to some unrelated reason - back in 2020. I'm going to assume it was fixed by another change such as possibly #42355.
It passes for me, so I'm not sure what else I can do.
Have you seen that #33993 has been reactivated?
Assert.False() Failure
Expected: False
Actual: True
at Microsoft.Extensions.Caching.Memory.MemoryCacheSetAndRemoveTests.GetAndSet_AreThreadSafe_AndUpdatesNeverLeavesNullValues() in /_/src/libraries/Microsoft.Extensions.Caching.Memory/tests/MemoryCacheSetAndRemoveTests.cs:line 588
Thanks for the pointer. I'll re-disable but against a new issue, for clarity.
Now, this is pure speculation, but from my understanding, here's what might be happening:
- Thread 1 sets a new entry. It happened to find a previous entry under the same key, so it attempts to expire it.
runtime/src/libraries/Microsoft.Extensions.Caching.Memory/src/MemoryCache.cs
Lines 129 to 132 in cea17b2
if (coherentState._entries.TryGetValue(entry.Key, out CacheEntry? priorEntry))
{
    priorEntry.SetExpired(EvictionReason.Replaced);
}
- While expiring the previous entry, the JIT is (I think) allowed to rewrite it from
runtime/src/libraries/Microsoft.Extensions.Caching.Memory/src/CacheEntry.cs
Lines 237 to 241 in cea17b2
if (EvictionReason == EvictionReason.None)
{
    EvictionReason = reason;
}
_isExpired = true;
to
var currentEvictionReason = EvictionReason;
_isExpired = true;
if (currentEvictionReason == EvictionReason.None)
{
    EvictionReason = reason;
}
- Thread 2 attempts to get an entry from the cache.
runtime/src/libraries/Microsoft.Extensions.Caching.Memory/src/MemoryCache.cs
Lines 213 to 218 in cea17b2
if (coherentState._entries.TryGetValue(key, out CacheEntry? tmp))
{
    CacheEntry entry = tmp;
    // Check if expired due to expiration tokens, timers, etc. and if so, remove it.
    // Allow a stale Replaced value to be returned due to concurrent calls to SetExpired during SetEntry.
    if (!entry.CheckExpired(utcNow) || entry.EvictionReason == EvictionReason.Replaced)
While doing so, it first checks whether the entry is expired and, if so, the reason for the expiration.
- Now we have all the parts. If thread 1 has set the _isExpired flag to true but has not yet set EvictionReason, then thread 2 might get true from CheckExpired and read EvictionReason.None from EvictionReason. This would lead to result being null.
If so, we can test this theory by adding a MemoryBarrier between EvictionReason and _isExpired, like this:
if (EvictionReason == EvictionReason.None)
{
    EvictionReason = reason;
}
Interlocked.MemoryBarrier();
_isExpired = true;
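As a variation on the same idea, the ordering can also be expressed with Volatile.Write and Volatile.Read instead of a full barrier. The sketch below uses a hypothetical, simplified EntrySketch type and Reason enum invented for this example. It is not the real CacheEntry and has not been tried against the actual failure; it only shows the publish-the-reason-before-the-flag pattern.

```csharp
using System.Threading;

public enum Reason { None, Replaced }

// Hypothetical, simplified stand-in for CacheEntry; not the real type.
public sealed class EntrySketch
{
    private int _reason;      // holds a Reason value
    private bool _isExpired;

    public void SetExpired(Reason reason)
    {
        if ((Reason)Volatile.Read(ref _reason) == Reason.None)
        {
            _reason = (int)reason;
        }
        // Release store: the write to _reason above cannot be moved after this write.
        Volatile.Write(ref _isExpired, true);
    }

    // Acquire-load the flag first, so observing true implies the reason write is visible.
    public bool TryGetExpiredReason(out Reason reason)
    {
        bool expired = Volatile.Read(ref _isExpired);
        reason = expired ? (Reason)_reason : Reason.None;
        return expired;
    }
}
```

With this ordering, a reader that sees the flag as true should not also see Reason.None, which is the combination the theory above relies on.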
That is plausible, @vonzshik, thank you. I'll link to this from the issue.
Fix #45868
Fix #33993
Related #50270