Thanos sidecar: internal server shutdown after changes within a config file (via a configmap) #2496
Comments
I can confirm we're experiencing this issue with thanos |
Hey @hawran, it looks like the sidecar crashes when a configuration change happens while it is in a reload cycle. How frequently have you been observing this? Have you observed it with any of the previous versions? In the meantime, I will investigate more. |
Yes.
I can trace back similar problems with version
Thank you. |
Hello, we are experiencing the same errors from time to time.
After a quick investigation, I think we had occurrences before the 0.12.x release (to be confirmed). More info (OpenShift client and server versions) on where we managed to reproduce it; the configmap, which contains Prometheus alert rules, is changed as a JSON file (oc apply -f configmap.json).
|
Hello 👋 Looks like there was no activity on this issue for last 30 days. |
Guys, any progress on this matter? |
@hawran I don't think anyone has had a chance to look into it yet. I'll try my best to check it out, but no promises, so help wanted. (I've also added the label.) |
OK, thank you for the update. |
Seeing this quite often too. I have a test at https://github.com/ebabani/thanos/blob/eb/lstat-error/pkg/reloader/reloader_test.go#L261 to reproduce this issue. Seems timing dependent but I get an error ~50% of the time. Definitely related to how k8s updates the volume on configmap changes. |
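The timing dependence mentioned above can be illustrated locally, without a cluster. The sketch below is not the linked repro test. It assumes that a configmap volume update works by writing a new revision directory, atomically repointing a `..data` symlink to it, and then deleting the old revision directory. The names `staleAfterSwap`, `rev1`, `rev2`, and `rules.yaml` are hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// writeRevision creates a revision directory holding a single rule file.
func writeRevision(root, name string) error {
	dir := filepath.Join(root, name)
	if err := os.Mkdir(dir, 0o755); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "rules.yaml"), []byte("groups: []\n"), 0o644)
}

// staleAfterSwap mimics the assumed symlink swap in a temporary directory and
// reports whether a path resolved before the swap is gone after it.
func staleAfterSwap() (bool, error) {
	root, err := os.MkdirTemp("", "cm-volume")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	// Initial state: ..data -> rev1.
	if err := writeRevision(root, "rev1"); err != nil {
		return false, err
	}
	data := filepath.Join(root, "..data")
	if err := os.Symlink("rev1", data); err != nil {
		return false, err
	}

	// A directory walker resolves the symlink and remembers the concrete path.
	resolved, err := filepath.EvalSymlinks(filepath.Join(data, "rules.yaml"))
	if err != nil {
		return false, err
	}

	// Update: write rev2, swap the symlink via rename, drop the old revision.
	if err := writeRevision(root, "rev2"); err != nil {
		return false, err
	}
	tmp := filepath.Join(root, "..data_tmp")
	if err := os.Symlink("rev2", tmp); err != nil {
		return false, err
	}
	if err := os.Rename(tmp, data); err != nil {
		return false, err
	}
	if err := os.RemoveAll(filepath.Join(root, "rev1")); err != nil {
		return false, err
	}

	// The walker now stats the path it resolved earlier.
	_, err = os.Lstat(resolved)
	return errors.Is(err, os.ErrNotExist), nil
}

func main() {
	stale, err := staleAfterSwap()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("previously resolved path vanished:", stale)
}
```

In a real pod the swap happens concurrently with the walk, which would explain why the failure only shows up some of the time; here the steps are serialized so the outcome is deterministic.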
Repro test is super helpful thanks! Feel free to propose a fix, otherwise we will take a look soon (: |
@bwplotka What do you think of logging the error and incrementing the watchErrors metric if the error is caused by https://github.com/thanos-io/thanos/blob/master/pkg/reloader/reloader.go#L221-L223 |
@ebabani That could be a good strategy, especially since this part is not the main functionality of the sidecar. @bwplotka already mentioned it in the comments.
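The idea proposed above (treat a vanished file as a countable, non-fatal event rather than a fatal error) could look roughly like the following. This is a minimal sketch and not the Thanos reloader code or the change that was eventually merged: `hashFiles` is a hypothetical helper, and a plain integer counter stands in for the watchErrors metric.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// hashFiles returns a SHA-256 digest over the contents of the given files.
// Paths that no longer exist are counted and skipped instead of aborting;
// any other error is still returned to the caller.
func hashFiles(paths []string) ([]byte, int, error) {
	h := sha256.New()
	skipped := 0
	for _, p := range paths {
		f, err := os.Open(p)
		if errors.Is(err, os.ErrNotExist) {
			// The file vanished between listing and reading, for example
			// during a volume update. A real implementation would log this
			// and increment a metric here.
			skipped++
			continue
		}
		if err != nil {
			return nil, skipped, err
		}
		_, err = io.Copy(h, f)
		f.Close()
		if err != nil {
			return nil, skipped, err
		}
	}
	return h.Sum(nil), skipped, nil
}

func main() {
	dir, err := os.MkdirTemp("", "rules")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	present := filepath.Join(dir, "rules.yaml")
	if err := os.WriteFile(present, []byte("groups: []\n"), 0o644); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	missing := filepath.Join(dir, "gone.yaml") // never created

	sum, skipped, err := hashFiles([]string{present, missing})
	fmt.Printf("hash=%x skipped=%d err=%v\n", sum, skipped, err)
}
```

With this shape, a transient missing file changes at most the computed hash for one cycle, and the next cycle picks up the settled state, so the process hosting the watcher does not need to exit.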
@ebabani Are you actively working on it? It'd be awesome to have this fix in the upcoming release in two weeks. |
@kakkoyun I'll send a PR today. |
Fixed by #2996 |
Thanos, Prometheus and Golang version used:
thanos: 0.12.0
prometheus: 2.17.1
go: 1.13
NB: I'm aware of the https://github.com/thanos-io/thanos/releases/tag/v0.12.1 release; however, judging by the fixes in that release, I presume this issue is still present.
Object Storage Provider:
ceph s3
What happened:
A thanos sidecar gets restarted from time to time when a config file has been changed (as a reloaded configmap).
What you expected to happen:
No restarts after such changes.
How to reproduce it (as minimally and precisely as possible):
Run a pod with both prometheus and thanos sidecar containers and update a config file (a configmap).
Full logs to relevant components:
RESTARTS
OK
Anything else we need to know:
sidecar's config snippets:
Mounts:
Volumes:
flags:
Environment:
Kernel Version: 5.3.0-24-generic
OS Image: Ubuntu 18.04.2 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.9.3
Kubelet Version: v1.14.8
Kube-Proxy Version: v1.14.8