Thanos Store is not responding #6597

Closed
frakev opened this issue Aug 9, 2023 · 3 comments

frakev commented Aug 9, 2023

Thanos, Prometheus and Golang version used:
Thanos v0.30.0

Object Storage Provider:
s3

What happened:
When I query metrics for the last 7 days, the Store does not seem to respond to the query.
[Screenshot: 2023-08-09 10:21:03]

What you expected to happen:
After a restart of the Thanos Store, it has the data again:
[Screenshot: 2023-08-09 11:05:24]

How to reproduce it (as minimally and precisely as possible):
Difficult to explain how to reproduce 😕
I just launch the components, and after several days it stops working.

Full logs to relevant components:
Thanos Store logs stopped on 08/01/2023...

level=debug ts=2023-08-01T11:22:45.648607034Z caller=bucket.go:1290 msg="stats query processed" request="min_time:1690802400000 max_time:1690888920000 matchers:<name:\"hostname\" value:\"toto\" > matchers:<name:\"__name__\" value:\"node_memory_MemTotal_bytes\" > aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:4 postingsTouched:8 PostingsTouchedSizeSum:52KiB381B postingsToFetch:0 postingsFetched:0 PostingsFetchedSizeSum:0B postingsFetchCount:0 PostingsFetchDurationSum:0s cachedPostingsCompressions:0 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:0B CachedPostingsCompressedSizeSum:0B CachedPostingsCompressionTimeSum:0s cachedPostingsDecompressions:8 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:235.041µs seriesTouched:6 SeriesTouchedSizeSum:366B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:12 ChunksTouchedSizeSum:576B chunksFetched:12 ChunksFetchedSizeSum:63KiB136B chunksFetchCount:4 ChunksFetchDurationSum:228.683µs GetAllDuration:3.266523ms mergedSeriesCount:2 mergedChunksCount:4 MergeDuration:75.781µs}" err=null
level=debug ts=2023-08-01T11:22:45.649459152Z caller=bucket.go:1290 msg="stats query processed" request="min_time:1690802400000 max_time:1690888920000 matchers:<name:\"hostname\" value:\"toto\" > matchers:<name:\"__name__\" value:\"node_memory_MemFree_bytes\" > aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:4 postingsTouched:8 PostingsTouchedSizeSum:52KiB381B postingsToFetch:0 postingsFetched:0 PostingsFetchedSizeSum:0B postingsFetchCount:0 PostingsFetchDurationSum:0s cachedPostingsCompressions:0 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:0B CachedPostingsCompressedSizeSum:0B CachedPostingsCompressionTimeSum:0s cachedPostingsDecompressions:8 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:232.088µs seriesTouched:6 SeriesTouchedSizeSum:384B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:12 ChunksTouchedSizeSum:4KiB35B chunksFetched:12 ChunksFetchedSizeSum:51KiB942B chunksFetchCount:3 ChunksFetchDurationSum:141.127µs GetAllDuration:3.44251ms mergedSeriesCount:2 mergedChunksCount:4 MergeDuration:31.369µs}" err=null
level=debug ts=2023-08-01T11:22:45.649823337Z caller=bucket.go:1290 msg="stats query processed" request="min_time:1690802400000 max_time:1690888920000 matchers:<name:\"hostname\" value:\"toto\" > matchers:<name:\"__name__\" value:\"node_memory_Cached_bytes\" > aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:4 postingsTouched:8 PostingsTouchedSizeSum:52KiB381B postingsToFetch:0 postingsFetched:0 PostingsFetchedSizeSum:0B postingsFetchCount:0 PostingsFetchDurationSum:0s cachedPostingsCompressions:0 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:0B CachedPostingsCompressedSizeSum:0B CachedPostingsCompressionTimeSum:0s cachedPostingsDecompressions:8 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:215.113µs seriesTouched:6 SeriesTouchedSizeSum:384B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:12 ChunksTouchedSizeSum:3KiB183B chunksFetched:12 ChunksFetchedSizeSum:50KiB871B chunksFetchCount:3 ChunksFetchDurationSum:161.011µs GetAllDuration:3.104147ms mergedSeriesCount:2 mergedChunksCount:4 MergeDuration:114.957µs}" err=null
level=debug ts=2023-08-01T11:22:45.700130775Z caller=bucket.go:1140 msg="Blocks source resolutions" blocks=4 MaximumResolution=0 mint=1690802400000 maxt=1690888920000 lset="{receive_cluster=\"tutu\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"}" spans="Range: 1690797600005-1690804800000 Resolution: 0"
level=debug ts=2023-08-01T11:22:45.700856544Z caller=bucket.go:1140 msg="Blocks source resolutions" blocks=4 MaximumResolution=0 mint=1690802400000 maxt=1690888920000 lset="{receive_cluster=\"tutu\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"}" spans="Range: 1690797600005-1690804800000 Resolution: 0"
level=debug ts=2023-08-01T11:22:45.701430272Z caller=bucket.go:1140 msg="Blocks source resolutions" blocks=4 MaximumResolution=0 mint=1690802400000 maxt=1690888920000 lset="{receive_cluster=\"tutu\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"}" spans="Range: 1690797600005-1690804800000 Resolution: 0"
level=debug ts=2023-08-01T11:22:45.7019739Z caller=bucket.go:1140 msg="Blocks source resolutions" blocks=4 MaximumResolution=0 mint=1690802400000 maxt=1690888920000 lset="{receive_cluster=\"tutu\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"}" spans="Range: 1690797600005-1690804800000 Resolution: 0"
level=debug ts=2023-08-01T11:22:45.706724131Z caller=bucket.go:1290 msg="stats query processed" request="min_time:1690802400000 max_time:1690888920000 matchers:<name:\"hostname\" value:\"titi\" > matchers:<name:\"__name__\" value:\"node_memory_Buffers_bytes\" > aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:4 postingsTouched:8 PostingsTouchedSizeSum:48KiB671B postingsToFetch:0 postingsFetched:0 PostingsFetchedSizeSum:0B postingsFetchCount:0 PostingsFetchDurationSum:0s cachedPostingsCompressions:0 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:0B CachedPostingsCompressedSizeSum:0B CachedPostingsCompressionTimeSum:0s cachedPostingsDecompressions:8 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:181.181µs seriesTouched:6 SeriesTouchedSizeSum:366B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:12 ChunksTouchedSizeSum:576B chunksFetched:12 ChunksFetchedSizeSum:47KiB682B chunksFetchCount:3 ChunksFetchDurationSum:160.29µs GetAllDuration:3.30276ms mergedSeriesCount:2 mergedChunksCount:4 MergeDuration:3.142513ms}" err=null
level=debug ts=2023-08-01T11:22:45.706957289Z caller=bucket.go:1290 msg="stats query processed" request="min_time:1690802400000 max_time:1690888920000 matchers:<name:\"hostname\" value:\"titi\" > matchers:<name:\"__name__\" value:\"node_memory_MemTotal_bytes\" > aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:4 postingsTouched:8 PostingsTouchedSizeSum:48KiB671B postingsToFetch:0 postingsFetched:0 PostingsFetchedSizeSum:0B postingsFetchCount:0 PostingsFetchDurationSum:0s cachedPostingsCompressions:0 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:0B CachedPostingsCompressedSizeSum:0B CachedPostingsCompressionTimeSum:0s cachedPostingsDecompressions:8 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:168.349µs seriesTouched:6 SeriesTouchedSizeSum:366B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:12 ChunksTouchedSizeSum:576B chunksFetched:12 ChunksFetchedSizeSum:47KiB682B chunksFetchCount:3 ChunksFetchDurationSum:123.085µs GetAllDuration:3.60103ms mergedSeriesCount:2 mergedChunksCount:4 MergeDuration:2.421045ms}" err=null
level=debug ts=2023-08-01T11:22:45.707124489Z caller=bucket.go:1290 msg="stats query processed" request="min_time:1690802400000 max_time:1690888920000 matchers:<name:\"hostname\" value:\"titi\" > matchers:<name:\"__name__\" value:\"node_memory_MemFree_bytes\" > aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:4 postingsTouched:8 PostingsTouchedSizeSum:48KiB671B postingsToFetch:0 postingsFetched:0 PostingsFetchedSizeSum:0B postingsFetchCount:0 PostingsFetchDurationSum:0s cachedPostingsCompressions:0 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:0B CachedPostingsCompressedSizeSum:0B CachedPostingsCompressionTimeSum:0s cachedPostingsDecompressions:8 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:167.984µs seriesTouched:6 SeriesTouchedSizeSum:384B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:12 ChunksTouchedSizeSum:3KiB180B chunksFetched:12 ChunksFetchedSizeSum:65KiB742B chunksFetchCount:4 ChunksFetchDurationSum:161.943µs GetAllDuration:4.133538ms mergedSeriesCount:2 mergedChunksCount:4 MergeDuration:1.493819ms}" err=null
level=debug ts=2023-08-01T11:22:45.707270652Z caller=bucket.go:1290 msg="stats query processed" request="min_time:1690802400000 max_time:1690888920000 matchers:<name:\"hostname\" value:\"titi\" > matchers:<name:\"__name__\" value:\"node_memory_Cached_bytes\" > aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:4 postingsTouched:8 PostingsTouchedSizeSum:48KiB671B postingsToFetch:0 postingsFetched:0 PostingsFetchedSizeSum:0B postingsFetchCount:0 PostingsFetchDurationSum:0s cachedPostingsCompressions:0 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:0B CachedPostingsCompressedSizeSum:0B CachedPostingsCompressionTimeSum:0s cachedPostingsDecompressions:8 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:173.24µs seriesTouched:6 SeriesTouchedSizeSum:366B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:12 ChunksTouchedSizeSum:621B chunksFetched:12 ChunksFetchedSizeSum:47KiB745B chunksFetchCount:3 ChunksFetchDurationSum:114.336µs GetAllDuration:4.597961ms mergedSeriesCount:2 mergedChunksCount:4 MergeDuration:632.553µs}" err=null

Thanos Query has the targets:
[Screenshot: 2023-08-09 11:22:22]

Thanos Store is responding on its gRPC port:

nc -v X.X.X.X 20786
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to X.X.X.X:20786.
@

Thanos Store is healthy and ready:

curl http://X.X.X.X:31425/-/healthy
OK
curl http://X.X.X.X:31425/-/ready
OK
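
As an additional check, the time range the Store advertises can be inspected directly over gRPC. A minimal sketch, assuming grpcurl is available and the Store's gRPC server has reflection enabled (the address is the same placeholder as above):

  grpcurl -plaintext X.X.X.X:20786 thanos.Store/Info

The min_time/max_time in the response should correspond to the "Store time ranges" reported in the Thanos Query logs below.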

Thanos Query logs:

level=debug ts=2023-08-09T08:53:11.586237681Z caller=proxy.go:282 component=proxy request="min_time:1691193300000 max_time:1691278800000 matchers:<name:\"instance\" value:\"toto:9100\" > matchers:<name:\"job\" value:\"node_exporter\" > matchers:<name:\"platform_name\" value:\"titi\" > matchers:<name:\"__name__\" value:\"node_memory_HardwareCorrupted_bytes\" > aggregates:COUNT aggregates:SUM step:1200000 " err="No StoreAPIs matched for this query" stores="store Addr: X.X.X.X:30374 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1691539200005 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1691193300000,1691278800000]. Store time ranges: [1691539200005,9223372036854775807];store Addr: X.X.X.X:26727 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1691539200005 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1691193300000,1691278800000]. Store time ranges: [1691539200005,9223372036854775807];store Addr: X.X.X.X:20786 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1685522400451 Maxt: 1690804800000 filtered out: does not have data within this time period: [1691193300000,1691278800000]. Store time ranges: [1685522400451,1690804800000];store Addr: X.X.X.X:30945 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1691539200005 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1691193300000,1691278800000]. Store time ranges: [1691539200005,9223372036854775807]" 
level=debug ts=2023-08-09T08:53:11.586278282Z caller=proxy.go:282 component=proxy request="min_time:1691452500000 max_time:1691538000000 matchers:<name:\"instance\" value:\"toto:9100\" > matchers:<name:\"job\" value:\"node_exporter\" > matchers:<name:\"platform_name\" value:\"titi\" > matchers:<name:\"__name__\" value:\"node_memory_HardwareCorrupted_bytes\" > aggregates:COUNT aggregates:SUM step:1200000 " err="No StoreAPIs matched for this query" stores="store Addr: X.X.X.X:30374 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1691539200005 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1691452500000,1691538000000]. Store time ranges: [1691539200005,9223372036854775807];store Addr: X.X.X.X:26727 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1691539200005 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1691452500000,1691538000000]. Store time ranges: [1691539200005,9223372036854775807];store Addr: X.X.X.X:20786 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1685522400451 Maxt: 1690804800000 filtered out: does not have data within this time period: [1691452500000,1691538000000]. Store time ranges: [1685522400451,1690804800000];store Addr: X.X.X.X:30945 LabelSets: {receive_cluster=\"titi\", replica=\"thanos-receive\", tenant_id=\"default-tenant\"} Mint: 1691539200005 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1691452500000,1691538000000]. Store time ranges: [1691539200005,9223372036854775807]"

Anything else we need to know:
I use Nomad (1.6.1) and Consul (1.16.1) to launch the Thanos components on different instances, and I use bridge mode in the Nomad jobs for the ports.
Thanos Receive has a TSDB retention of 4h.
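
For reference, the port mapping in the Nomad job looks roughly like the sketch below; this is an illustrative example, not the actual job file, and the group and port labels are assumptions:

  group "thanos-store" {
    network {
      mode = "bridge"

      # Map host ports to the container's gRPC and HTTP ports.
      port "grpc" { to = 10901 }
      port "http" { to = 10902 }
    }
  }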

Thanos Store configuration:

  "store",
  "--log.level=debug",
  "--data-dir=/data",
  "--grpc-address=0.0.0.0:10901",
  "--http-address=0.0.0.0:10902",
  "--index-cache-size=5GB",
  "--block-meta-fetch-concurrency=96",
  "--block-sync-concurrency=60",
  "--store.grpc.series-max-concurrency=60",
  "--sync-block-duration=30m",
  "--chunk-pool-size=5GB",
  "--objstore.config-file=/etc/thanos/bucket.yml",
  "--index-cache.config-file=/etc/thanos/cache.conf",
  "--store.caching-bucket.config-file=/etc/thanos/cache-bucket.conf"

Thanos Query configuration:

  "query",
  "--grpc-address=0.0.0.0:10901",
  "--http-address=0.0.0.0:10902",
  "--query.replica-label=platform_number",
  "--store.sd-files=/etc/targets.yml",
  "--web.external-prefix=/thanos/",
  "--web.route-prefix=/thanos/",
  "--query.timeout=7m",
  "--query.promql-engine=prometheus",
  "--store.response-timeout=5m",
  "--query.partial-response",
  "--query.max-concurrent=40",
  "--query.max-concurrent-select=10"

Environment:

  • OS (e.g. from /etc/os-release): CentOS Linux 7
  • Kernel (e.g. uname -a): 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Others: Thanos is running in Docker containers
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.40
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:41 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.9
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       9d988398e7
  Built:            Fri May 15 00:24:05 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

yeya24 (Contributor) commented Aug 9, 2023

Maybe it is the same issue as #6086; try upgrading to v0.30.2.


frakev (Author) commented Aug 10, 2023

Hello @yeya24,
I will update to v0.30.2 or newer and test it.
I'll keep you posted.
Thank you for your help.


frakev (Author) commented Aug 14, 2023

Hello,
Seems OK after the update 👍
Thank you!

frakev closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 14, 2023