Compactor: Does not exit on error #3966
Comments
Could you please get the goroutine dump via the
That's not possible, as this endpoint is shut down :(
Looks like a bug, but I would suggest upgrading Thanos so we can be sure we know what's happening. To reproduce it, one could try running the compactor with AccessDenied (:
We had a KIAM outage, so I think the way to reproduce it is to allow access and then remove it.
Another thing is that we should only turn off the HTTP server at the very end, to permit debugging via pprof in cases such as this.
Sounds like the same behaviour I observed in #3868.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Closing for now as promised, let us know if you need this to be reopened! 🤗
Still relevant
I have observed this a few times, and AFAICT this happens because there aren't many cancellation points in the Prometheus compaction code or in our downsampling code. So, two improvements can be made here:
Help welcome.
We've hit this with
After this, the process stopped doing anything (as described above) and did not respond to SIGTERM either.
By default, the compactor does not crash on halt errors; there is a hidden flag that changes this: https://thanos.io/tip/components/compact.md/#halting I also hit the same issue and am still investigating.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Closing for now as promised, let us know if you need this to be reopened! 🤗
@rudo-thomas, @ianwoolf Hi guys, have you found any solution for this issue (meta.json)?
@Venture200 we added a Kubernetes liveness probe.
@rudo-thomas Thanks for replying. So when the compactor crashes, Kubernetes restarts it?
@Venture200 there is no crash to speak of, see the comments above. The compactor stops responding to liveness probes and gets restarted by Kubernetes.
level=error ts=2022-11-27T11:08:57.563444234Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="BaseFetcher: iter bucket: context canceled"
In my case the compactor kept running for more than 24 hours, but after this error the process was killed. So I believe the restarts in your case also happen after this error? @rudo-thomas
Thanos, Prometheus and Golang version used:
17.2
Object Storage Provider: s3
What happened: The compactor hit an error but neither exited nor continued working.
What you expected to happen: The compactor either exits, so it can be restarted, or continues regardless.
How to reproduce it (as minimally and precisely as possible): n/a
Full logs to relevant components: