sevagh changed the title from "Thanos Rule (0.22.0-rc0) fails to load all rules files after reload - numFiles is dropping" to "Thanos Rule fails to load all rules files after reload - numFiles is dropping" on Jul 14, 2021.
As suggested in #4432, triggering the reload over HTTP with curl http://sv3-thanos1:10908/-/reload -X POST works better than the Linux signal for now: Thanos successfully reloads all of the files.
Thanks for testing and for the detailed report! This bug appeared somewhere around version 0.19, so you hit it after upgrading. Let's try to fix it with #4442.
Thanos version: 0.22.0-rc.0 downloaded from the GitHub releases: https://github.com/thanos-io/thanos/releases/tag/v0.22.0-rc.0
OS: Debian 10 Buster, AMD64 architecture
The problem: I upgraded Thanos from v0.16.0 to v0.22.0. Since the upgrade, I noticed the Thanos Rule instances are failing to load all of the rules files.
If you do a systemctl restart, all the rules files are loaded. Then, after the next systemctl reload signal, the number of loaded rules files drops:
The total number of correct rules is the higher value, a little over 3000 rules. It only got that high immediately after a fresh daemon restart. After the next reload (every hour, we sync new rules files from the self-service git repo onto the Thanos Rule machines and reload the daemon), the number of rules files dropped.
Initially I assumed it was a file descriptor problem and raised the open file limit through systemd, but I don't think that's related. The logs don't show any errors:
From the logs, we can see numFiles has been dropping ever since upgrading to 0.22.0-rc0 and sending a reload. The correct numFiles is 527:
Those are historical logs from the old version. Since the upgrade to 0.22.0, numFiles drops below 527 after reloads: