Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEP-Trim Trailing Whitespace of Tekton Result. #223

Closed
wants to merge 1 commit into from
Closed

TEP-Trim Trailing Whitespace of Tekton Result. #223

wants to merge 1 commit into from

Conversation

XinruZhang
Copy link
Member

This TEP is for issue #3146 originated from a bug of kubeflow/kfp-tekton, aims to strip the EOF new line and all the unwanted trailing whitespace.

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign pratap0007
You can assign the PR to them by writing /assign @pratap0007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 2, 2020
teps/tep-trim-tekton-results.md Outdated Show resolved Hide resolved
teps/tep-trim-tekton-results.md Outdated Show resolved Hide resolved
@popcor255
Copy link
Member

This looks good to me.
Just a quick dumb question: Does this remove a trailing linebreaks as well?
Also, I think you need to squash your commits.

… a [bug](kubeflow/kfp-tekton#273) of kubeflow/kfp-tekton, aims to strip the EOF new line and all the unwanted trailing whitespace.
@XinruZhang
Copy link
Member Author

This looks good to me.
Just a quick dumb question: Does this remove a trailing linebreaks as well?
Also, I think you need to squash your commits.

Hi @popcor255 ,
Good question! The answer is yes, this will also remove Tab as well.
Thanks for the comment, the commits are merged.

@vdemeester vdemeester added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Oct 6, 2020
@bobcatfish
Copy link
Contributor

Thanks for writing this up @XinruZhang !

I am actually against adding this feature, because I feel we should try to keep our API as simple as possible (e.g. see design principles on simplicity)

One alternative that I'd like to see represented in the alternatives section is not implementing this - i.e. what does it look like to have to trim the whitespace yourself?

To go from the example in the TEP:

        steps:
        - name: main
          image: alpine:3.12
          script: |
            echo "$(params.project_name)" > "$(results.project.path)"

To a version that will not contain the trailing whitespace is a matter of using a different command, e.g. printf:

        steps:
        - name: main
          image: alpine:3.12
          script: |
            printf "$(params.project_name)" > "$(results.project.path)"

To see this in action on the command line:

# echo will add a newline
~ echo "foo" > bar
~ cat bar
foo

# printf will not
printf "foo" > bar
~ cat bar
foo% 

Basically I think we should only add Tekton level features when it is impossible or extremely cumbersome to have the functionality without Tekton supporting it, and that doesn't seem to be the case here. Plus this opens the floodgates (as mentioned in the TEP) of folks asking for other trimming support, or other features (capitalization, replacement).

I totally understand that this gotcha is frustrating and have run into it myself a few times - however adding a bool that does the trimming will not save you from this gotcha as you will still need to be aware of when to set it, so I'm not convinced that this feature will is necessary.

@bobcatfish
Copy link
Contributor

But you can probably change my mind pretty easily here! I just need to understand what it's important for this to be a feature tekton pipelines provides and it's not enough to ask folks to do the trimming when they need it within their steps (or in the echo/printf case, not so much "trim" as "not output a newline at all when not needed")

@XinruZhang
Copy link
Member Author

Basically I think we should only add Tekton level features when it is impossible or extremely cumbersome to have the functionality without Tekton supporting it, and that doesn't seem to be the case here. Plus this opens the floodgates (as mentioned in the TEP) of folks asking for other trimming support, or other features (capitalization, replacement).

@bobcatfish Thanks for your question! Your concern makes a lot sense. Here is the reason for adding a new field.

Assigning a value to a result is usually by redirecting the output stream of a command to the result file. There are lots of commands whose default output stream is stdout. In this case, to give users a more friendly reading experience, the commands usually add a line break at the end of its output, for example: the issue #182 and retrieving the version of a binary -- python -V, minikube version etc. Those commands have no "no-line-break" versions. Therefore, users have to trim the new line by themselves, which could be tedious compared to setting a boolean field.

As for opening the floodgates of asking other related features support, I guess there would be few users requring replacement and capitalization features, and If they ask for, we can appraise the necessity then.

@bobcatfish
Copy link
Contributor

Thanks for the extra info @XinruZhang !

Before going ahead, can we gather some examples of scenarios where we want to capture the stdout of a command and provide it straight to a result and an extra newline causes a problem?

the issue #182 and retrieving the version of a binary -- python -V, minikube version etc.

Thanks for these examples! In these cases, I feel like it would be reasonable to add an additional step that does whatever formatting is needed.

I'll show you a few examples.

  1. I suggested capturing stdout in Make it possible to extract results from a container's stdout pipeline#2925 to capture output from boskosctl.
    a. boskosctl was actually a bad example b/c the image already contains a shell - I opened Make it possible to extract results from a container's stdout pipeline#2925 to mostly address a case where an image doesnt contain a shell and so it is impossible to capture the output and use it in a subsequent step. This means that if tekton does not provide a way to capture stdout, a Task author needs to create a new image to work around this. <-- this is NOT the case for this proposal, because it IS possible for a Task author to add as many steps to a Task as they need to format results properly
    b. let's look at the boskos example anyway:

     - name: boskosctl
       image: gcr.io/k8s-staging-boskos/boskosctl
       args:...
     - name: grab-project-id
       image: ubuntu
       script: |
         echo $(steps.boskosctl.stdout) | jq -r ".name" > /tekton/results/project-id

    Notice that the jq command needs to be invoked to parse out the project-id b/c the output is actually json and contains more than just the project ID that we want for the result. <-- what I'm trying to say is that when you capture stdout and want to use it in a result, i think the norm will be that some kind of extra formatting will be required, and if adding an extra step allows Task authors the flexibility to format this however needed, I say we keep the Tekton part as simple as possible and don't try to reinvent this

  2. python -V, minikube version <-- I'd suggest handling these examples with an extra step as well, e.g. (imagining you can capture stdout as per TEP-0011: Redirecting Step Output Streams #182)

     - name: python-version
       image: python
       args:
       - python
       - "-V"
      stderr: # surprisingly this output goes to stderr
         path: x/y/z
     - name: format-result
       image: bash # This image is 13 MB, maybe you could find an even smaller image with the shell you need (and this pales in comparison to the size of that python image...)
       script: |
         # tr isn't the only way to do this but I think it'll work
         cat x/y/z >  tr -d '\n' > /tekton/results/python-version

    (apparently python -V prints to stderr?? whodathunk! i guess capturing stderr for TEP-0011: Redirecting Step Output Streams #182 is the way to go!)

So basically TL;DR I think it's reasonable to have Task authors add extra steps to do whatever formatting they need vs. doing this for them.

@XinruZhang
Copy link
Member Author

Hi @bobcatfish ,

Thanks so much for your comment, It’s very important to make this clear. In fact, I'm quite shifty about this problem. After doing some more research, I'm currently inclined to add a new boolean field to tackle the unwanted line break other than leaving it to authors.

I believe keeping Tekton simple is very important. My concern is the current version is somehow not very user-friendly. Here are some detailed reasons:

  • Introducing a line break in tail is very common: By checking the usages in tektoncd/pipeline examples and kfp-tekton, I found lots of commands introduce a line break in the tail, including ls,echo(most common one), python -c script using print() function which is an interesting one that brings a lot of possibilities, and also the issue TEP-0011: Redirecting Step Output Streams #182. Besides, Tekton is an open framework that can potentially support many other pipeline-like applications as long as they contain quite a number of reusable components. Therefore, the possible commands pool is large.
  • It's inconvenient for authors to tackle the line break problem
    • Not sure about whether the command would bring a line break: Sometimes you can't make sure whether the command produce a line break in tail. To avoid making mistakes, you need firstly figure this out and then write shell scripts or another step to handle it.
    • Trimming a line break is not handy: you need to write relatively complex shell scripts, while setting a field as true requires much less prerequisite knowledge.
    • Repeatedly trim line breaks, since the extra new line is common, authors need to trim the new line many times by themselves, which violates the DRY principle.
  • The effect to the "Simple" principle is limited: the field is set as optional, the beginners mainly focus on required fields. When they encounter the line break problem, it would be a surprise for them to learn that there exists such an easy way to solve it.

I'm inclined to believe that adding a new field is a better way to tackle this problem based on the analysis I did. In the meantime, I'm not very firmly because I'm new to Tekton, so I guess whether to carry forward this TEP is up to you~

@bobcatfish
Copy link
Contributor

Thanks for the detailed explanation @XinruZhang and for finding those examples!

I think the points you make make sense, however I think you could also make a similar statement about other mutations you might want to do on a result, e.g. filtering, extracting a value from a json object, etc. In all of these cases a user would need to understand what the output is of the command they are running vs. what value they want in the result.

It's interesting that you highlighted "user-friendliness" as the main motivation here - two thoughts:

  1. what "user friendly" means depends a lot on who the users are and what friendly means
  2. we actually did not list "user friendly" as a design principle for tekton

To me (2) means not that we want to make something that is intentionally difficult to use, but that adding features which make some specific action a bit easier (specifically: you don't need to add a line to a script or add a step to remove the newline) is not a priority for the project. I think this is especially true given that we are gearing up toward a v1 release (would be interested in @vdemeester's thoughts on this) - I could see us revisiting this post v1, i.e.: now that we are confident in our v1 release, do we want to expand into additional functionality such as mutating results? Or does it perhaps make sense to introduce some new component / feature to the API which can express these sorts of desired mutations?

The effect to the "Simple" principle is limited: the field is set as optional, the beginners mainly focus on required fields. When they encounter the line break problem, it would be a surprise for them to learn that there exists such an easy way to solve it.

To me what the simple principle means, and how I've been applying it so far to the project is: if we can avoid adding a feature, let's avoid it, until we absolutely have to.

@XinruZhang XinruZhang closed this Oct 10, 2020
@XinruZhang
Copy link
Member Author

I think this is especially true given that we are gearing up toward a v1 release (would be interested in @vdemeester's thoughts on this) - I could see us revisiting this post v1, i.e.: now that we are confident in our v1 release, do we want to expand into additional functionality such as mutating results? Or does it perhaps make sense to introduce some new component / feature to the API which can express these sorts of desired mutations?

@bobcatfish That's excellent! Thanks for your explanations~ I totally understand now. The function this TEP aims to provide can be tackled by users themselves, there's no need to change Tekton just for refining Task results in the current stage. I guess this PR can be closed then.

BTW, I closed this directly by mistake, what do you think @imjasonh @sbwsg @popcor255

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants