Configuration for 0 downtime deployments #660

benwilson512 · 2018-10-04T14:15:05Z

Hey folks, I'm having a fair bit of trouble getting 0 downtime deployments to work. The issue:

After a new pod passes its readiness check, the ALB target group places the pod's IP in an "initial" state, which can last for a couple of seconds.

However, since the new pod is ready as far as K8s is concerned, K8s begins to terminate an old pod, which immediately enters a "draining" state on the target group. At this point there are no pods available to answer requests.

To some extent this can be handled by simply increasing the number of pods. This isn't really a solution though; it just lowers the probability that the rolling deployment outpaces the ALB's ability to keep up. If the AWS API were to undergo any kind of delay or outage, the deployment could complete without any live pods actually registered in the target group.
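To make the race concrete, here is a minimal simulation sketch. It is a simplified model, not how K8s or the ALB actually schedule anything: it assumes one new pod is started at a time, that an old pod leaves service the instant a new pod passes readiness, and that a new pod only serves traffic once it leaves "initial". The function name and parameters are made up for illustration.

```python
def min_healthy_targets(replicas, pod_start_seconds, alb_initial_seconds):
    """Return the lowest number of in-service targets seen during a rollout.

    Simplified model (assumption): pods are replaced one at a time, and each
    old pod starts draining the moment its replacement passes readiness.
    """
    events = []  # (time, change in in-service target count)
    for i in range(replicas):
        ready_at = (i + 1) * pod_start_seconds
        # K8s sees the new pod as ready and terminates an old pod ("draining").
        events.append((ready_at, -1))
        # The new pod only becomes healthy after its "initial" period.
        events.append((ready_at + alb_initial_seconds, +1))

    healthy = replicas
    lowest = healthy
    # Ties sort the -1 first, which is the pessimistic ordering.
    for _, delta in sorted(events):
        healthy += delta
        lowest = min(lowest, healthy)
    return lowest


if __name__ == "__main__":
    for replicas, start, initial in [(1, 10, 3), (10, 1, 5), (10, 1, 60)]:
        lowest = min_healthy_targets(replicas, start, initial)
        print(f"replicas={replicas} start={start}s initial={initial}s -> lowest={lowest}")
```

In this model, more replicas raise the floor only while the "initial" period stays short relative to the rollout pace. A long enough registration delay brings the floor back to zero regardless of replica count.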

Is there any known way to require that a pod show up as "healthy" in the target group before K8s considers it alive?

M00nF1sh · 2018-10-04T17:50:15Z

Hi Ben,

As far as I know, there is no good solution for this under "mode ip" other than increasing the number of pods. The root cause is that the ingress resource is only aware of the service; it is not aware of pods.

However, to achieve 0 downtime deployment, you can use "mode instance" 😸
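As a rough sketch of what switching modes looks like: this assumes the mode is selected through the `alb.ingress.kubernetes.io/target-type` annotation on the Ingress, with values `instance` and `ip`. The `with_target_type` helper and the example Ingress name are hypothetical and exist only for illustration. Instance mode routes to node ports, so the backing Service would also need to be of type NodePort.

```python
import copy

# Annotation assumed to select the ALB target mode.
TARGET_TYPE_ANNOTATION = "alb.ingress.kubernetes.io/target-type"


def with_target_type(ingress, mode):
    """Return a copy of an Ingress manifest (as a dict) with the target mode set."""
    if mode not in ("instance", "ip"):
        raise ValueError(f"unsupported target type: {mode!r}")
    updated = copy.deepcopy(ingress)  # leave the caller's manifest untouched
    annotations = updated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[TARGET_TYPE_ANNOTATION] = mode
    return updated


if __name__ == "__main__":
    # Hypothetical minimal manifest; a real Ingress also needs a spec.
    ingress = {
        "kind": "Ingress",
        "metadata": {
            "name": "example",
            "annotations": {"kubernetes.io/ingress.class": "alb"},
        },
    }
    print(with_target_type(ingress, "instance")["metadata"]["annotations"])
```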

benwilson512 · 2018-10-04T18:22:40Z

Hey @M00nF1sh thanks for the pointer! I can confirm that on my end it does indeed look like using the instance mode helps. Would a PR be welcomed to elaborate on the significance of the mode in the docs?

M00nF1sh · 2018-10-04T18:40:23Z

Sure, PRs are welcome 😸.

Talked with @bigkraig: mode ip might also support 0 downtime in the future, but we don't know how to achieve that yet. It may need a k8s core change.

benwilson512 · 2018-10-04T18:59:24Z

Right, it would seem that to do 0 downtime with the ip mode you'd need some sort of "availableProbe" that worked separately from readinessProbe and checked that the pod was healthy and reachable from the target group.

Anyway, I'll look into making a PR to note that instance is required for 0 downtime, thank you!

BrianChristie · 2019-03-29T17:02:18Z

Could Pod Readiness Gates help here?

It introduces ReadinessGate on Pods, which looks for a matching item in status.conditions.

At first glance, it looks like a custom PodCondition like amazonaws.com/alb-pod-ip-healthy could be introduced, set to Ready by the ingress controller once the PodIP has gone Healthy in the ALB TargetGroup.

PodDisruptionBudget could then be used to pause a Deployment rolling update until a sufficient number of new Pods are Healthy.

It seems this would solve the issue of Pods being terminated before they are Healthy in the ALB.

Edit: Actually I think that a normal Deployment rolling update should suffice, with .spec.strategy.rollingUpdate.maxUnavailable. No PodDisruptionBudget required.
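A minimal sketch of the idea, for discussion. The condition type `amazonaws.com/alb-pod-ip-healthy` is the hypothetical name suggested above and is not something the controller sets today. `pod_is_ready` is an illustrative stand-in for the readiness rule, under which a Pod counts as Ready only when its containers are ready and every condition listed in `readinessGates` has status "True". The manifest fragments are written as Python dicts.

```python
# Hypothetical condition type; the ingress controller would have to set it.
GATE = "amazonaws.com/alb-pod-ip-healthy"

# Pod template fragment: readiness now also depends on the custom condition.
pod_spec_fragment = {"readinessGates": [{"conditionType": GATE}]}

# Deployment fragment: never drop below the desired count during a rollout.
deployment_strategy = {
    "type": "RollingUpdate",
    "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
}


def pod_is_ready(containers_ready, readiness_gates, conditions):
    """Mirror the readiness-gate rule: containers ready AND all gates "True".

    A gate whose condition is missing from status.conditions counts as not met.
    """
    status_by_type = {c["type"]: c["status"] for c in conditions}
    gates_met = all(
        status_by_type.get(gate["conditionType"]) == "True"
        for gate in readiness_gates
    )
    return containers_ready and gates_met


if __name__ == "__main__":
    gates = pod_spec_fragment["readinessGates"]
    # Containers are up, but the target is still "initial": not Ready yet.
    print(pod_is_ready(True, gates, []))
    # The controller has marked the target healthy: now Ready.
    print(pod_is_ready(True, gates, [{"type": GATE, "status": "True"}]))
```

With `maxUnavailable: 0`, the rollout would not terminate an old Pod until a new one is Ready, and with the gate in place "Ready" would include being healthy in the target group.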

/CC: @bigkraig @M00nF1sh for discussion

benwilson512 closed this as completed Oct 4, 2018

BrianChristie mentioned this issue Apr 1, 2019

Support Custom Pod Status for PodReadinessGate to Block Premature Pod Termination #905

Closed
