Commit 926d85d: Fix layout of pod-failure-policy.md

windsonsea committed Sep 5, 2022
1 parent 567eabf commit 926d85d
Showing 1 changed file with 26 additions and 18 deletions.

content/en/docs/tasks/job/pod-failure-policy.md (26 additions & 18 deletions)
@@ -53,6 +53,7 @@ kubectl create -f job-pod-failure-policy-failjob.yaml
```

After around 30s the entire Job should be terminated. Inspect the status of the Job by running:

```sh
kubectl get jobs -l job-name=job-pod-failure-policy-failjob -o yaml
```
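
To pull out just the Job's terminal condition instead of scanning the full
output, a `jsonpath` query along these lines can help (a sketch using
standard kubectl jsonpath support; the `Failed` condition is what marks the
terminated Job):

```sh
kubectl get jobs -l job-name=job-pod-failure-policy-failjob \
  -o jsonpath='{.items[0].status.conditions}'
```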
@@ -68,9 +69,11 @@ of the Pod, taking at least 2 minutes.
### Clean up

Delete the Job you created:

```sh
kubectl delete jobs/job-pod-failure-policy-failjob
```

The cluster automatically cleans up the Pods.
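
To confirm the cleanup, you can list the Pods by the Job's label; once
garbage collection finishes, this should return nothing (an optional check,
not part of the task itself):

```sh
kubectl get pods -l job-name=job-pod-failure-policy-failjob
```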

## Using Pod failure policy to ignore Pod disruptions
@@ -87,34 +90,37 @@ node while the Pod is running on it (within 90s since the Pod is scheduled).

1. Create a Job based on the config:

   {{< codenew file="/controllers/job-pod-failure-policy-ignore.yaml" >}}

   by running:

   ```sh
   kubectl create -f job-pod-failure-policy-ignore.yaml
   ```
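
   (A sketch of the `podFailurePolicy` section this manifest relies on
   appears after these steps.)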

2. Run this command to check the `nodeName` the Pod is scheduled to:

   ```sh
   nodeName=$(kubectl get pods -l job-name=job-pod-failure-policy-ignore -o jsonpath='{.items[0].spec.nodeName}')
   ```

3. Drain the node to evict the Pod before it completes (within 90s):

   ```sh
   kubectl drain nodes/$nodeName --ignore-daemonsets --grace-period=0
   ```

4. Inspect the `.status.failed` field to check that the Job's failure counter was not incremented:

   ```sh
   kubectl get jobs -l job-name=job-pod-failure-policy-ignore -o yaml
   ```

5. Uncordon the node:

   ```sh
   kubectl uncordon nodes/$nodeName
   ```
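
For reference, here is a minimal sketch of the `podFailurePolicy` section at
the heart of the manifest from step 1 (the field names are the Job API's;
the exact contents of the published file may differ):

```yaml
# Sketch only; the published job-pod-failure-policy-ignore.yaml may differ in detail.
spec:
  backoffLimit: 0              # any counted Pod failure terminates the Job
  podFailurePolicy:
    rules:
    - action: Ignore           # matching failures do not count against backoffLimit
      onPodConditions:
      - type: DisruptionTarget # set on Pods terminated by a disruption, such as a drain
```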

Once the node is uncordoned, the Job resumes and succeeds.
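
To verify, the Job's `succeeded` counter should eventually reach its
`completions` count (a sketch; the expected number depends on the
manifest's `completions` field):

```sh
kubectl get jobs -l job-name=job-pod-failure-policy-ignore \
  -o jsonpath='{.items[0].status.succeeded}'
```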

@@ -124,16 +130,18 @@ result in terminating the entire Job (as the `.spec.backoffLimit` is set to 0).
### Cleaning up

Delete the Job you created:

```sh
kubectl delete jobs/job-pod-failure-policy-ignore
```

The cluster automatically cleans up the Pods.

## Alternatives

You could rely solely on the
[Pod backoff failure policy](/docs/concepts/workloads/controllers/job#pod-backoff-failure-policy),
by specifying the Job's `.spec.backoffLimit` field. However, in many situations
it is problematic to find a balance between setting `.spec.backoffLimit` low enough
to avoid unnecessary Pod retries, yet high enough to make sure the Job would
not be terminated by Pod disruptions.
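
For comparison, a minimal sketch of the backoff-limit-only approach (the
values are illustrative and the Job name is hypothetical): without a
`podFailurePolicy`, every Pod failure, including one caused by a node drain,
increments `.status.failed` and counts against the limit.

```yaml
# Sketch: relying on backoffLimit alone (illustrative values).
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-only       # hypothetical name
spec:
  backoffLimit: 6              # caps retries, but disruptions also "spend" retries
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "exit 1"]   # placeholder workload
```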
