Support "debug" mode for TaskRuns #2069

Closed
jbarrick-mesosphere opened this issue Feb 18, 2020 · 21 comments
Labels
area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@jbarrick-mesosphere
Contributor

Sometimes it would be useful to be able to exec into a running task environment to debug it. Currently, I accomplish this by modifying the step's command to run a long sleep (or similar) first.

It would be useful to add a TaskRunSpec.DebugMode boolean to the TaskRunSpec (similar to the existing TaskRunSpec.Status) that, when set, would make the entrypoint pause rather than exit.

Similarly, TaskRunSpec.DebugOnFailure could be introduced which would make the entrypoint pause rather than exit only when the task has failed.
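A minimal Go sketch of the pause semantics proposed above. It assumes a stand-in struct carrying the two hypothetical fields `DebugMode` and `DebugOnFailure` (this is not Tekton's actual `TaskRunSpec` type) and a hypothetical `shouldPause` helper that a modified entrypoint might consult after each step exits:

```go
package main

// TaskRunDebugSpec is a hypothetical stand-in for the two flags proposed
// in this issue; it is not part of Tekton's API.
type TaskRunDebugSpec struct {
	// DebugMode: pause after every step instead of exiting.
	DebugMode bool
	// DebugOnFailure: pause only when a step exits with a non-zero code.
	DebugOnFailure bool
}

// shouldPause (hypothetical) reports whether the entrypoint should block,
// keeping the step container alive, after a step exits with exitCode.
func shouldPause(spec TaskRunDebugSpec, exitCode int) bool {
	if spec.DebugMode {
		// Debug mode pauses unconditionally.
		return true
	}
	// Otherwise pause only on failure, and only if asked to.
	return spec.DebugOnFailure && exitCode != 0
}
```

Blocking instead of exiting keeps the container running, which is what would let a user `kubectl exec` into it (or a CLI attach to it) before the pod terminates.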

Eventually this would allow useful features to be added to the tkn CLI, such as making tkn pipelinerun logs --follow --attach-on-failure automatically attach to the failed tasks to let the user debug them.

@vdemeester
Member

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 19, 2020
@csgitharness

One use case that requires debug mode:
If functional tests are running as part of a task and hit an issue that is not consistently reproducible, it would be really helpful if the container were retained when a test fails, so that the issue could be debugged by logging into the pod.

@waveywaves
Member

/assign

@waveywaves
Member

Working on this with @nikhil-thomas 💯

@gabemontero
Contributor

We have something somewhat analogous in OpenShift today, at least for a subset of the functionality being discussed here.

See https://docs.openshift.com/container-platform/3.11/cli_reference/basic_cli_operations.html#debug

and

https://github.com/openshift/oc/tree/master/pkg/cli/debug

Perhaps some useful reference.

@waveywaves
Member

@gabemontero This is something I had my eyes on earlier; I actually use it a lot when debugging. But how it translates to TaskRun debugging would be interesting to see.

@gabemontero
Contributor

> @gabemontero This is something I had my eyes on earlier, I use this a lot in my debugging actually.

cool deal @waveywaves

> But how it translates to taskrun debugging would be interesting to see.

Yeah, especially in the context of some of the live, step debugging discussed on the work group call.

But certainly for, say, post-completion failure scenarios, being able to replicate the pod that Tekton creates for a Task/TaskRun and then /bin/bash into it instead of actually running it has value: you can inspect the file system, etc.

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten


@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close


@tekton-robot tekton-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 13, 2020
@tekton-robot
Collaborator

@tekton-robot: Closing this issue.


@vdemeester
Member

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot tekton-robot reopened this Aug 17, 2020
@tekton-robot
Collaborator

@vdemeester: Reopened this issue.


@tekton-robot tekton-robot removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 17, 2020
@bobcatfish
Collaborator

This is on our roadmap: https://github.com/tektoncd/pipeline/blob/master/roadmap.md

/lifecycle frozen

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 17, 2020
@bobcatfish bobcatfish added the area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) label Aug 24, 2020
@bobcatfish
Collaborator

Updating this with the design doc that @waveywaves and @nikhil-thomas brought to the API working group in Apr 2020:

Design Document: Debugging in Tekton

@waveywaves
Member

waveywaves commented Jan 7, 2021

Hello @bobcatfish, thank you for updating this and for the gentle reminder. The Nov 2020 Design Document: Debugging in Tekton v2 was created to simplify the implementation ideas, given the complexity of the previous document. After some brainstorming with @vdemeester in Nov 2020, I created a POC that puts a breakpoint after a failed step in a TaskRun, after which the user can debug the TaskRun. The POC also contains scripts the user can run to control the steps. You can check out the POC here. I will bring it up in the next pipeline meeting.

@waveywaves
Member

Here is an asciicast demo showcasing a TaskRun breakpoint on failure, just in case.

@jstrachan

It might be less confusing not to talk about debugging (which could imply actual language/runtime debuggers being bound into the pod) and instead talk about breakpoints, which would be super useful.

Also, rather like the comment from @imjasonh in #2331 (comment), it might be better to have a way to define which steps have a breakpoint and of what kind (always / on failure / on condition, etc.). So rather than a global debug boolean, a debug struct that lets you define an optional breakpoint for all steps, or 0..N breakpoints for named steps of type Always / OnError / OnCondition, would be really nice to have.

I often just want to set a breakpoint on one step only, and don't want to have to manually continue to it, etc.
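The breakpoint struct suggested above might look something like the following Go sketch. All type, field, and method names (`DebugSpec`, `Breakpoint`, `AllSteps`, `Steps`, `Condition`, `ShouldPause`) are hypothetical illustrations rather than Tekton API, and modelling `OnCondition` as a predicate on the step's exit code is an assumption made to keep the example small:

```go
package main

// BreakpointType enumerates the breakpoint kinds suggested in this thread.
// These names are hypothetical and not part of Tekton's API.
type BreakpointType string

const (
	Always      BreakpointType = "Always"
	OnError     BreakpointType = "OnError"
	OnCondition BreakpointType = "OnCondition"
)

// Breakpoint pauses a step according to Type. Condition is only
// consulted for OnCondition breakpoints (an assumed representation).
type Breakpoint struct {
	Type      BreakpointType
	Condition func(exitCode int) bool
}

// DebugSpec holds an optional breakpoint applied to every step, plus
// zero or more breakpoints keyed by step name.
type DebugSpec struct {
	AllSteps *Breakpoint
	Steps    map[string]Breakpoint
}

// hit reports whether this breakpoint fires for a step's exit code.
func (b Breakpoint) hit(exitCode int) bool {
	switch b.Type {
	case Always:
		return true
	case OnError:
		return exitCode != 0
	case OnCondition:
		return b.Condition != nil && b.Condition(exitCode)
	default:
		return false
	}
}

// ShouldPause reports whether the named step should pause after exiting
// with exitCode. A step-specific breakpoint takes precedence over AllSteps.
func (d DebugSpec) ShouldPause(step string, exitCode int) bool {
	// Looking up a key in a nil map is safe in Go and returns ok == false.
	if bp, ok := d.Steps[step]; ok {
		return bp.hit(exitCode)
	}
	if d.AllSteps != nil {
		return d.AllSteps.hit(exitCode)
	}
	return false
}
```

With a single entry such as `Steps: map[string]Breakpoint{"unit-tests": {Type: OnError}}`, only that one step can pause, which covers the "breakpoint on one step only" case without manually continuing through earlier steps. Letting a step-specific breakpoint override the all-steps one is just one possible precedence rule; the thread does not settle this.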

@flokain

flokain commented Nov 7, 2023

Any update on this one? I know it works for individual TaskRuns, but when I try to use it in a PipelineRun I get:

Error from server (BadRequest): error when creating "examples/php-tests-pipelinerun.yml": admission webhook "webhook.pipeline.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: json: unknown field "debug"

@vdemeester
Member

See #5184 for PipelineRuns.

@vdemeester
Member

Created #7352 to track feedback on this feature, as it is implemented but gated behind a feature flag.

I'll close this issue for now; please refer to the issue linked just above.

Projects
Status: Done
Development

No branches or pull requests

9 participants