store: disk usage continually increasing over time #7473
Comments
Possibly related to #7029
Same here. We have 9 different Thanos stacks running, each with a storegateway. Most of them have limited disk usage, as you would expect from the documentation: "It acts primarily as an API gateway and therefore does not need significant amounts of local disk space" and "It keeps a small amount of information about all remote blocks on local disk". However, one of them has constantly increasing disk usage; even after cleaning up and restarting, it starts filling up again. We have increased that disk to 150Gi now, whereas all other storegateway disks are 20Gi in size and only 5%-30% filled. Recently one of the other storegateways also started to fill up. Why do some stores have so much local data and others not?
Each store-gateway downloads the index-header for the blocks it's responsible for. IIRC, Thanos doesn't have sharding enabled by default. Maybe you could try to shard the store-gateways so that not every store-gateway downloads all the block index-headers. I think there are two ways to shard store-gateways.
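For reference, a sketch of the two sharding approaches this presumably refers to, based on the flags the Thanos Store documentation describes: time-based partitioning via `--min-time`/`--max-time`, and label-based sharding via `--selector.relabel-config`. The label name and values below are illustrative, not from this thread:

```shell
# Option 1: time-based partitioning -- each store-gateway serves a time window,
# so each only downloads index-headers for blocks in its window.
thanos store \
  --objstore.config-file=bucket.yml \
  --min-time=-8w    # serve only blocks newer than 8 weeks

thanos store \
  --objstore.config-file=bucket.yml \
  --max-time=-8w    # serve only blocks older than 8 weeks

# Option 2: label-based sharding -- each store-gateway keeps only blocks whose
# external labels match a relabel config (label/value here are hypothetical).
thanos store \
  --objstore.config-file=bucket.yml \
  --selector.relabel-config='
    - source_labels: ["cluster"]
      regex: "prod-1"
      action: keep'
```

With either approach, no single store-gateway has to hold index-headers for every block in the bucket, which is what drives the local disk usage discussed here.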
@harry671003 I would think that would cause disk usage at startup, or when the overall number of metrics increases, but I'm seeing a gradual and consistent increase in disk usage, maybe 1-2 GB per week.
I see the same behaviour: continuous increase, with different values for the different stacks we use. One thing I realised: since our installation is relatively new (about 1-2 months), the amount of data in S3 is still increasing, which might be reason enough for the increase in the storegateway's disk use. I also see a pattern every two days that might relate to processes in the compactor, mainly compaction, I think. @pgier is your installation also fairly new, so that the amount of data in S3 is still increasing? Or do you perhaps have no retention set in your compactor?
@mdraijer Our installation is about 1.5 years old, although we store up to 2 years of data, so the overall data size is still growing. I recently cleared out the data, and it seems that it does immediately download large amounts of data upon startup, so it's possible that there isn't really a bug here, and this is just how much local data Thanos store gateway needs. I've also now configured time-based partitioning as @harry671003 suggested, and it does seem to be splitting up the data correctly. |
Thanos, Prometheus and Golang version used:
Thanos version: quay.io/thanos/thanos:v0.35.1
Golang: whatever version was used to build the quay image
Prometheus: quay.io/prometheus/prometheus:v2.47.0
Object Storage Provider: AWS S3
What happened: Disk usage (k8s PV) continually increases, it's up to about 210 GB currently.
What you expected to happen: Disk usage would stabilize at a certain point
How to reproduce it (as minimally and precisely as possible): I'm not sure how to reproduce it other than install thanos with a store gateway and some metrics sources, and then periodically run queries.
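When reproducing or diagnosing this, it may help to check which part of the store-gateway's local cache is actually growing. A minimal sketch, assuming the store's `--data-dir` is `/var/thanos/store` (the path is illustrative; the on-disk layout may vary by version):

```shell
# List the largest subdirectories of the store-gateway data dir, biggest first.
# Growth concentrated in the index-header cache would point at block count
# (and thus sharding/retention) rather than a leak elsewhere.
du -h --max-depth=2 /var/thanos/store | sort -hr | head -20
```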
Full logs to relevant components:
See attached log file.
Anything else we need to know:
thanos-store.log