Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support runtimeClassName in pod templates #1363

Merged

Conversation

impl
Copy link
Contributor

@impl impl commented Sep 29, 2019

Changes

This change adds support for the Kubernetes 1.12+ runtime class feature by adding the runtimeClassName field to pod templates and propagating that to the underlying pod spec.

At Puppet, we're using Tekton in a fully untrusted environment (i.e., letting arbitrary users run containers à la GitHub Actions). We'd like to use gVisor (GKE Sandbox) as an additional layer of security as we approach a GA release of our product.

As a quick demo, here's the output of dmesg | tail -n20 using the legacy runtime and the gVisor runtime:

$ kubectl create -f dmesg.yaml 
taskrun.tekton.dev/dmesg-l8mzl created
$ tkn taskrun logs dmesg-l8mzl
[test] [16123.188072] audit: type=1327 audit(1569736337.381:1603): proctitle=69707461626C65732D726573746F7265002D2D6E6F666C757368002D2D636F756E74657273
[test] [16138.666823] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[test] [16138.672706] IPv6: ADDRCONF(NETDEV_UP): vethf7b1349a: link is not ready
[test] [16138.672713] IPv6: ADDRCONF(NETDEV_CHANGE): vethf7b1349a: link becomes ready
[test] [16138.672750] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[test] [16138.673023] mybridge: port 11(vethf7b1349a) entered blocking state
[test] [16138.673024] mybridge: port 11(vethf7b1349a) entered disabled state
[test] [16138.673212] device vethf7b1349a entered promiscuous mode
[test] [16138.673232] mybridge: port 11(vethf7b1349a) entered blocking state
[test] [16138.673233] mybridge: port 11(vethf7b1349a) entered forwarding state
[test] [16138.673293] audit: type=1700 audit(1569736352.866:1604): dev=vethf7b1349a prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
[test] [16138.681523] audit: type=1300 audit(1569736352.866:1604): arch=c000003e syscall=44 success=yes exit=40 a0=4 a1=c4200a81e0 a2=28 a3=0 items=0 ppid=5173 pid=11202 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="bridge" exe="/opt/cni/bin/bridge" subj=kernel key=(null)
[test] [16138.681697] audit: type=1327 audit(1569736352.866:1604): proctitle="/opt/cni/bin/bridge"
[test] [16140.578679] audit: type=1325 audit(1569736354.772:1605): table=nat family=2 entries=192
[test] [16140.578702] audit: type=1300 audit(1569736354.772:1605): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=16ed3a0 items=0 ppid=11202 pid=11303 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[test] [16140.578714] audit: type=1327 audit(1569736354.772:1605): proctitle=2F7573722F7362696E2F69707461626C6573002D74006E6174002D4E00434E492D316239346435666165393737663166303835626262646535002D2D77616974
[test] [16140.590522] audit: type=1325 audit(1569736354.783:1606): table=nat family=2 entries=194
[test] [16140.590893] audit: type=1300 audit(1569736354.783:1606): arch=c000003e syscall=54 success=yes exit=0 a0=4 a1=0 a2=40 a3=1a3cb50 items=0 ppid=11202 pid=11305 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables" exe="/usr/sbin/xtables-multi" subj=kernel key=(null)
[test] [16140.591663] audit: type=1327 audit(1569736354.783:1606): proctitle=2F7573722F7362696E2F69707461626C6573002D74006E6174002D4100434E492D316239346435666165393737663166303835626262646535002D640031302E312E302E302F3136002D6A00414343455054002D6D00636F6D6D656E74002D2D636F6D6D656E74006E616D653A2022726B742E6B756265726E657465732E696F
[test] [16140.598676] audit: type=1325 audit(1569736354.792:1607): table=nat family=2 entries=195

$ kubectl create -f dmesg-gvisor.yaml
taskrun.tekton.dev/dmesg-trgrx created
$ tkn taskrun logs dmesg-trgrx
[test] [   0.000000] Starting gVisor...
[test] [   0.106358] Rewriting operating system in Javascript...
[test] [   0.297406] Checking naughty and nice process list...
[test] [   0.532869] Searching for socket adapter...
[test] [   0.904946] Letting the watchdogs out...
[test] [   1.275319] Granting licence to kill(2)...
[test] [   1.497669] Synthesizing system calls...
[test] [   1.566194] Consulting tar man page...
[test] [   1.591064] Waiting for children...
[test] [   1.982031] Forking spaghetti code...
[test] [   2.442056] Searching for needles in stacks...
[test] [   2.626892] Ready!

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

  • Support the runtimeClassName field in pod templates

@googlebot googlebot added the cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit label Sep 29, 2019
@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 29, 2019
@tekton-robot
Copy link
Collaborator

Hi @impl. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

@tekton-robot tekton-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 30, 2019
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold
/cc @afrittoli @bobcatfish

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 30, 2019
@tekton-robot tekton-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 30, 2019
Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting feature to have. It is still alpha in k8s - but I suppose it won't hurt.
We cannot really test it E2E - but I expect any error handling will be done by K8s.
One question, what happens if the class is not defined or not available? Is the pod not scheduled until the class becomes available?

@afrittoli
Copy link
Member

/lgtm

@impl
Copy link
Contributor Author

impl commented Sep 30, 2019

Hi! Thanks for the review!

That's an interesting feature to have. It is still alpha in k8s - but I suppose it won't hurt.

It has been promoted to beta as of v1.14, so we should have some reasonable assurance that the API won't change, at least.

We cannot really test it E2E - but I expect any error handling will be done by K8s.

This indeed seems to be the case.

One question, what happens if the class is not defined or not available? Is the pod not scheduled until the class becomes available?

If it isn't defined, the task will simply fail:

status:
  conditions:
  - lastTransitionTime: "2019-09-30T14:53:43Z"
    message: 'Invalid TaskSpec: pods "dmesg-rt2zd-pod-4889a0" is forbidden: pod rejected:
      RuntimeClass "foo" not found'
    reason: CouldntGetTask
    status: "False"
    type: Succeeded
  podName: ""
  startTime: "2019-09-30T14:53:43Z"

In the case where it isn't available (e.g., the runtime handler exists but can't process the spec), I have seen the pods wait for the runtime. This is usually indicative of a serious system issue though; in my case, the GKE Sandbox nodes we were running somehow lost the process administering the sandbox. This caused basically every pod on the cluster to become effectively unschedulable.

@ghost
Copy link

ghost commented Oct 8, 2019

Looks like this has had enough eyes on it to confirm it can go in!

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 8, 2019
This change adds support for the Kubernetes 1.12+ runtime class
(https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/runtime-class.md)
feature by adding the runtimeClassName field to pod templates and
propagating that to the underlying pod spec.
@impl impl force-pushed the support-pod-runtimeclassname branch from b5f4615 to 422053c Compare October 9, 2019 00:05
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 9, 2019
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@impl
Copy link
Contributor Author

impl commented Oct 9, 2019

Hi friends!

@sbwsg and I tried to merge this earlier today but I needed to update my new test against latest master (which I have done above -- just a one-line change). Please let me know if there's anything else needed to get this in!

@ghost
Copy link

ghost commented Oct 9, 2019

/lgtm

@tekton-robot tekton-robot assigned ghost Oct 9, 2019
@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 9, 2019
@tekton-robot tekton-robot merged commit 770fcec into tektoncd:master Oct 9, 2019
@impl impl deleted the support-pod-runtimeclassname branch October 10, 2019 16:59
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants