Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dev overlay to correctly set imagePullPolicy #195

Merged

Conversation

ossareh
Copy link
Contributor

@ossareh ossareh commented Jun 23, 2020

Is this a bug fix or adding new feature?

bug fix, see #192, related to #193

What is this PR about? / Why do we need it?

what is this about?

As per #193 this implements the same change w.r.t setting imagePullPolicy for the dev overlay.

why do we need it?

Kubernetes defaulting to an imagePullPolicy is sane, but for users expecting :latest to deliver the actual :latest container it can be confusing. For users installing development they'll now get the correct image and have an updated imagePullPolicy.

What testing is done?

There are a handful of ways this can be run:

  1. kustomize build deploy/kubernetes/overlay/dev | kubectl apply -f
  2. kubectl kustomize deploy/kubernetes/overlay/dev | kubectl apply -f (variant of 1)
  3. kubectl apply -k deploy/kubernetes/overlay/dev

I have tested that all three produce the same output without actually applying them to my cluster; as per #193 I use helm. I'm doing this to help out the situation reported in #192 in which @nmtulloch27 has been very helpful in testing and reporting errors.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 23, 2020
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 23, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @ossareh. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 23, 2020
@ossareh
Copy link
Contributor Author

ossareh commented Jun 23, 2020

Maintainers; please see this: #192 (comment)

The gist that is linked clearly shows that the chengpan/aws-efs-csi-driver:latest image is not available; a look at dockerhub shows that image hasn't been updated in quite some time: https://hub.docker.com/r/chengpan/aws-efs-csi-driver

Where does this image come from? Why is it set as the development image when you use the dev overlay? I'm updating the PR to pull amazon/aws-efs-csi-driver:latest for now. LMK if I should change it back.

@ossareh ossareh force-pushed the fix_kustomize_dev_overlay branch 3 times, most recently from ea904b7 to 62d661f Compare June 24, 2020 00:16
Utilize the patchesStrategicMerge strategy to apply the image:tag update and
set imagePullPolicy: Always
@wongma7
Copy link
Contributor

wongma7 commented Jun 24, 2020

The old tag was a maintainer's personal dev image. The new (relatively) amazon/aws-efs-csi-driver:latest tag is pushed on every commit to master.

Obligatory warning not to rely on dev overlay/latest tag.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ossareh, wongma7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2020
@wongma7
Copy link
Contributor

wongma7 commented Jun 24, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 24, 2020
@ossareh
Copy link
Contributor Author

ossareh commented Jun 24, 2020

@wongma7, thanks for the test go ahead.

Obligatory warning not to rely on dev overlay/latest tag.

strong agree. As someone that currently relies on the latest tag in production, because there hasn't been a release in a while and we require EFS access points, I'd be interested in knowing whether there are ways that we (the community; including me :)) can help in making regular tagged releases occur.

Perhaps here isn't the correct venue to discuss, just point me in the right direction :)

@wongma7
Copy link
Contributor

wongma7 commented Jun 24, 2020

@ossareh there's a bit of discussion here #152. I am aware the frequency of releases is a pain point (we have never made point releases and we've only ever done one pre-release only do one pre-release per release*) and while I can warn against latest I understand we don't give you an alternative. We are close enough to 0.4.0 that I'm not sure if we should release a 0.3.1 or 0.4.0-rc or what.

@wongma7
Copy link
Contributor

wongma7 commented Jun 24, 2020

/test pull-aws-efs-csi-driver-e2e

@k8s-ci-robot k8s-ci-robot merged commit 0f64aaf into kubernetes-sigs:master Jun 24, 2020
@jqmichael
Copy link
Contributor

jqmichael commented Jun 26, 2020

Isn't having imagePullPolicy:Always the same thing as omitting the imagePullPolicy and use :latest tag?

If you would like to always force a pull, you can do one of the following:

  • set the imagePullPolicy of the container to Always.
  • omit the imagePullPolicy and use :latest as the tag for the image to use.

https://kubernetes.io/docs/concepts/containers/images/#updating-images

@ossareh

@ossareh
Copy link
Contributor Author

ossareh commented Jun 26, 2020

@jqmichael; it should be as per my comment over here - but in dealing with #192 I found I was getting a completely different docker SHA until I updated the imagePullPolicy.

When the problem in #192 hit us on Sunday the Image ID was:

Image ID:      docker-pullable://amazon/aws-efs-csi-driver@sha256:43fb05d4544230010ecb8d7619a4d3b6c2c317a73a79925638501cc82754843c

Once I changed imagePullPolicy to "Always" the Image ID changed to:

Image ID:      docker-pullable://amazon/aws-efs-csi-driver@sha256:962619a5deb34e1c4257f2120dd941ab026fc96adde003e27f70b65023af5a07

With the newer SHA all of our issues went away.

As I mention in the comment I linked above, I'm quite surprised by this fact.

[edit]: just to be clear; my first thought when the problem happened was to fetch down the latest image. So I did the following things:

  • reinstalled the helm chart; it's possible this was actually a noop in our infra (long story)
  • killed the pod and let it restart

I figured the second step would definitely fetch the image, but perhaps it doesn't? Perhaps in that case because the daemonset hasn't been updated it just restarts the existing image? So perhaps it was actually the DaemonSet being updated that changed this?

@jqmichael
Copy link
Contributor

jqmichael commented Jul 6, 2020

killed the pod and let it restart
I figured the second step would definitely fetch the image, but perhaps it doesn't? Perhaps in that case because the daemonset hasn't been updated it just restarts the existing image? So perhaps it was actually the DaemonSet being updated that changed this?

I just ran a simple test with the :latest tag without any imagePullPolicy and was able to confirm that deleting the driver pod did fetch a new sha.

What I tested specifically:

  1. Ensure I didn't set any imagePullPolicy.
$ kubectl describe pod $(kubectl get po -l app=efs-csi-node -n kube-system -o jsonpath='{.items[0].metadata.name}') -n kube-system |grep imagePullPolicy
  1. Check the current image tag and SHA.
$ kubectl describe pod $(kubectl get po -l app=efs-csi-node -n kube-system -o jsonpath='{.items[0].metadata.name}') -n kube-system |grep aws-efs-csi-driver
    Image:         705114051414.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-efs-csi-driver:latest
    Image ID:      docker-pullable://705114051414.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-efs-csi-driver@sha256:8ec4ab80e2e42103a0208d03404681b33993b2e16215d7a4e346328498b63a70
  1. Push a new image to the :latest tag in my ECR repo.
The push refers to repository [705114051414.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-efs-csi-driver]
...
latest: digest: sha256:172b4e7f12a8e03af8e0af893dd294f8c6cca1b7951f427aa8bb8968c19a6f5c size: 1369
  1. Deleted the driver pod.

  2. Verify the image SHA has been updated in the new pod.

$ kubectl describe pod $(kubectl get po -l app=efs-csi-node -n kube-system -o jsonpath='{.items[0].metadata.name}') -n kube-system |grep aws-efs-csi-driver
    Image:         705114051414.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-efs-csi-driver:latest
    Image ID:      docker-pullable://705114051414.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-efs-csi-driver@sha256:172b4e7f12a8e03af8e0af893dd294f8c6cca1b7951f427aa8bb8968c19a6f5c

@ossareh

@ossareh
Copy link
Contributor Author

ossareh commented Jul 8, 2020

@jqmichael Thanks for this. This works as I think it should, and as the docs point out. I clearly misdiagnosed the issue when it bit me! I'm going to wait until the conversation in #207 settles and a path is established then I'm happy to contribute changes to the helm chart to revert the pullPolicy change I made + implement whatever else is identified as helpful/necessary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants