Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky TestQuerierWithBlocksStorageRunningInSingleBinaryMode #7972

Closed
56quarters opened this issue Apr 25, 2024 · 1 comment · Fixed by #7978
Closed

Flaky TestQuerierWithBlocksStorageRunningInSingleBinaryMode #7972

56quarters opened this issue Apr 25, 2024 · 1 comment · Fixed by #7978

Comments

@56quarters
Copy link
Contributor

2024-04-25T13:41:09.5021552Z 13:39:16 Killed mimir-2
2024-04-25T13:41:09.5022783Z === RUN   TestQuerierWithBlocksStorageRunningInSingleBinaryMode/inmemory_index_cache,_query_sharding_enabled
2024-04-25T13:41:09.5023553Z 13:39:16 Starting memcached
2024-04-25T13:41:09.5024442Z 13:39:16 Starting minio-9000
2024-04-25T13:41:09.5025026Z 13:39:16 Starting consul
2024-04-25T13:41:09.5025655Z 13:39:17 Ports for container: e2e-mimir-test-consul Mapping: map[8500:32832]
2024-04-25T13:41:09.5026340Z 13:39:17 Started consul
2024-04-25T13:41:09.5026971Z 13:39:17 Ports for container: e2e-mimir-test-memcached Mapping: map[11211:32833]
2024-04-25T13:41:09.5027574Z 13:39:17 Started memcached
2024-04-25T13:41:09.5028051Z 13:39:17 consul: ==> Starting Consul agent...
2024-04-25T13:41:09.5028561Z 13:39:17 consul: Version: '1.8.15'
2024-04-25T13:41:09.5029333Z 13:39:17 consul: Node ID: '09507d80-d50d-1d8f-9253-e55337543210'
2024-04-25T13:41:09.5030008Z 13:39:17 consul: Node name: 'consul'
2024-04-25T13:41:09.5030549Z 13:39:17 consul: Datacenter: 'dc1' (Segment: '<all>')
2024-04-25T13:41:09.5031082Z 13:39:17 consul: Server: true (Bootstrap: false)
2024-04-25T13:41:09.5031890Z 13:39:17 consul: Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
2024-04-25T13:41:09.5032602Z 13:39:17 consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
2024-04-25T13:41:09.5033506Z 13:39:17 consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
2024-04-25T13:41:09.5034374Z 13:39:17 consul: ==> Log data will now stream in as it occurs:
2024-04-25T13:41:09.5035164Z 13:39:17 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:32834]
2024-04-25T13:41:09.5035758Z 13:39:17 Started minio-9000
2024-04-25T13:41:09.5036230Z 13:39:19 Starting mimir-2
2024-04-25T13:41:09.5036636Z 13:39:19 Starting mimir-1
2024-04-25T13:41:09.5037328Z 13:39:20 Ports for container: e2e-mimir-test-mimir-1 Mapping: map[8080:32835]
2024-04-25T13:41:09.5037945Z 13:39:20 Started mimir-1
2024-04-25T13:41:09.5038540Z 13:39:20 Ports for container: e2e-mimir-test-mimir-2 Mapping: map[8080:32836]
2024-04-25T13:41:09.5039201Z 13:39:20 Started mimir-2
2024-04-25T13:41:09.5039595Z 13:39:29 Starting compactor
2024-04-25T13:41:09.5040224Z 13:39:30 Ports for container: e2e-mimir-test-compactor Mapping: map[8080:32837]
2024-04-25T13:41:09.5040878Z 13:39:30 Started compactor
2024-04-25T13:41:09.5041293Z     querier_test.go:483: 
2024-04-25T13:41:09.5042042Z         	Error Trace:	/home/runner/work/mimir/mimir/integration/querier_test.go:483
2024-04-25T13:41:09.5042844Z         	Error:      	Received unexpected error:
2024-04-25T13:41:09.5044228Z         	            	unable to find metrics [thanos_store_index_cache_requests_total] with expected values. Last error: <nil>. Last values: [25]
2024-04-25T13:41:09.5045558Z         	Test:       	TestQuerierWithBlocksStorageRunningInSingleBinaryMode/inmemory_index_cache,_query_sharding_enabled
2024-04-25T13:41:09.5046411Z 13:40:07 Killing compactor
2024-04-25T13:41:09.5046815Z 13:40:07 Killing minio-9000
2024-04-25T13:41:09.5047207Z 13:40:07 Killing consul
2024-04-25T13:41:09.5047673Z 13:40:07 Killing mimir-2
2024-04-25T13:41:09.5048055Z 13:40:07 Killing mimir-1
2024-04-25T13:41:09.5048385Z 13:40:07 Killing memcached
2024-04-25T13:41:09.5048859Z 13:40:07 Killed compactor
2024-04-25T13:41:09.5049226Z 13:40:07 Killed consul
2024-04-25T13:41:09.5049705Z 13:40:07 Killed minio-9000
2024-04-25T13:41:09.5050078Z 13:40:07 Killed memcached
2024-04-25T13:41:09.5050423Z 13:40:07 Killed mimir-1
2024-04-25T13:41:09.5050884Z 13:40:07 Killed mimir-2
2024-04-25T13:41:09.5051507Z --- FAIL: TestQuerierWithBlocksStorageRunningInSingleBinaryMode (88.76s)
@56quarters
Copy link
Contributor Author

I'm able to replicate this by adding a time.Sleep() before performing the queries in the test:

diff --git a/integration/querier_test.go b/integration/querier_test.go
index 76ff81ffb..f19feea08 100644
--- a/integration/querier_test.go
+++ b/integration/querier_test.go
@@ -451,6 +451,10 @@ func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) {
 
                        var expectedCacheRequests int
 
+
+                       time.Sleep(5 * time.Second)
+
+
                        // Query back the series (1 only in the storage, 1 only in the ingesters, 1 on both).
                        // For every series in a block we expect
                        // - 1 cache request for expanded postings
@@ -479,6 +483,8 @@ func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) {
 
                        expectedMemcachedOps := 2 * expectedCacheRequests // Same reasoning as for expectedCacheRequests, but this also includes a set for each get
 
+                       t.Log(fmt.Sprintf("expected cache requests: %d", expectedCacheRequests))
+
                        // Check the in-memory index cache metrics (in the store-gateway).
                        require.NoError(t, cluster.WaitSumMetrics(e2e.Equals(float64(expectedCacheRequests)), "thanos_store_index_cache_requests_total"))
                        require.NoError(t, cluster.WaitSumMetrics(e2e.Equals(0), "thanos_store_index_cache_hits_total")) // no cache hit cause the cache was empty

This makes me suspect that slow CI machines cause a block that's expected to be queried from ingesters gets shipped to "object storage" during this test and therefore is queried by store-gateways, leading to incorrect cache request counts.

56quarters added a commit that referenced this issue Apr 25, 2024
This compactor was added in #6779 but isn't actually needed since we're
testing monolithic mode which already includes the compactor component.

Coincidentally, this also seems to fix #7972

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue Apr 25, 2024
This compactor was added in #6779 but isn't actually needed since we're
testing monolithic mode which already includes the compactor component.

Coincidentally, this also seems to fix #7972

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant