Describe the bug
There's an occasional issue where an ArgoCD sync can run for more than a day waiting for a rollout to become healthy while the new-version pods are crash looping. The rollout is stuck at the first canary step and neither progresses to the next step nor rolls back. The number of ready replicas keeps going up and down by a few, which may be misleading the rollouts controller into treating the readiness flapping as progress (note how the Progressing condition's lastUpdateTime keeps advancing in the logs below while its lastTransitionTime does not).
Note: this is a different issue from the previously reported rollouts stuck due to "object modified" errors; there's no indication in the logs that that is happening here.
To Reproduce
1. Create a deployment with pods that start and run successfully.
2. Create a rollout with 100 replicas and a first canary step that updates 10% of the pods, with no analysis run (see the sketch after this list).
3. Add an analysis run for the second step and beyond (unclear whether this is necessary to reproduce).
4. Release a new pod version that crashes on startup.
5. Repeat a few times if needed; the issue is intermittent.
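For concreteness, a minimal sketch of a rollout manifest matching the steps above; all names, the image tag, and the analysis template are placeholders, not taken from the affected app:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout        # placeholder name
  namespace: default
spec:
  replicas: 100
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: example/app:broken   # new version that crashes on startup
  strategy:
    canary:
      steps:
        - setWeight: 10        # first canary step: 10% of pods, no analysis
        - pause: {}
        # analysis runs only from the second step onward
        - analysis:
            templates:
              - templateName: example-analysis   # placeholder
        - setWeight: 50
        - pause: {}
```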
Expected behavior
The rollout should be automatically aborted, with pods rolling back to the previous version; the app sync should fail and the app should enter a degraded state.
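For reference, the abort-on-no-progress behavior expected here is roughly what the rollout spec's progress deadline fields are meant to enforce; the fluctuating ready-replica count appears to keep resetting that deadline. A sketch, with illustrative values:

```yaml
spec:
  # Abort the update (not just mark it degraded) when the rollout
  # makes no progress within the deadline. progressDeadlineAbort
  # defaults to false; 600 is the default deadline, shown here
  # only for illustration.
  progressDeadlineSeconds: 600
  progressDeadlineAbort: true
```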
Version
v1.7.1
Logs
From oldest to newest (partial logs)
Enqueueing parent of default/<new-replica-set-name>: Rollout default/<rollout-name>
Patched: {"status":{"availableReplicas":92,"conditions":[{"lastTransitionTime":"2024-08-02T00:25:25Z","lastUpdateTime":"2024-08-02T00:25:25Z","message":"Rollout is paused","reason":"RolloutPaused","status":"False","type":"Paused"},{"lastTransitionTime":"2024-08-02T23:50:30Z","lastUpdateTime":"2024-08-02T23:50:30Z","message":"Rollout is not healthy","reason":"RolloutHealthy","status":"False","type":"Healthy"},{"lastTransitionTime":"2024-08-02T23:50:30Z","lastUpdateTime":"2024-08-02T23:50:30Z","message":"Rollout does not have minimum availability","reason":"AvailableReason","status":"False","type":"Available"},{"lastTransitionTime":"2024-08-02T23:53:21Z","lastUpdateTime":"2024-08-02T23:53:21Z","message":"RolloutCompleted","reason":"RolloutCompleted","status":"False","type":"Completed"},{"lastTransitionTime":"2024-08-02T17:21:27Z","lastUpdateTime":"2024-08-03T01:30:07Z","message":"ReplicaSet \"<new-replica-set-name>\" is progressing.","reason":"ReplicaSetUpdated","status":"True","type":"Progressing"}],"readyReplicas":92}}
Enqueueing parent of default/<new-replica-set-name>: Rollout default/<rollout-name>
Enqueueing parent of default/<new-replica-set-name>: Rollout default/<rollout-name>
Enqueueing parent of default/<new-replica-set-name>: Rollout default/<rollout-name>
Patched: {"status":{"availableReplicas":91,"conditions":[{"lastTransitionTime":"2024-08-02T00:25:25Z","lastUpdateTime":"2024-08-02T00:25:25Z","message":"Rollout is paused","reason":"RolloutPaused","status":"False","type":"Paused"},{"lastTransitionTime":"2024-08-02T23:50:30Z","lastUpdateTime":"2024-08-02T23:50:30Z","message":"Rollout is not healthy","reason":"RolloutHealthy","status":"False","type":"Healthy"},{"lastTransitionTime":"2024-08-02T23:50:30Z","lastUpdateTime":"2024-08-02T23:50:30Z","message":"Rollout does not have minimum availability","reason":"AvailableReason","status":"False","type":"Available"},{"lastTransitionTime":"2024-08-02T23:53:21Z","lastUpdateTime":"2024-08-02T23:53:21Z","message":"RolloutCompleted","reason":"RolloutCompleted","status":"False","type":"Completed"},{"lastTransitionTime":"2024-08-02T17:21:27Z","lastUpdateTime":"2024-08-03T01:31:21Z","message":"ReplicaSet \"<new-replica-set-name>\" is progressing.","reason":"ReplicaSetUpdated","status":"True","type":"Progressing"}],"readyReplicas":91}}
Enqueueing parent of default/<new-replica-set-name>: Rollout default/<rollout-name>
Started syncing Analysis at (2024-08-03 01:32:10.919158096 +0000 UTC m=+220587.756264716)
No status changes. Skipping patch
Started syncing Analysis at (2024-08-03 01:32:10.919769572 +0000 UTC m=+220587.756876202)
No status changes. Skipping patch
Reconciliation completed
Reconciliation completed