---
status: implemented
title: Task-level Resource Requirements
creation-date: '2022-04-08'
last-updated: '2022-08-16'
authors:
  - '@lbernick'
  - '@vdemeester'
  - '@jerop'
---

TEP-0104: Task-level Resource Requirements

Summary

Tekton currently provides Step-level configuration for Kubernetes resource requirements via the Task and TaskRun specs. This document proposes allowing users to configure the overall resource requests of Tekton Tasks and TaskRuns.

Motivation

Kubernetes runs containers within a pod in parallel, so a pod’s effective resource requests and limits are determined by summing the resource requirements of containers. Since Tekton Steps run sequentially, it can be confusing for users to find that the resource requirements of each container are summed (for example, in #4347). This can lead to users requesting pods with more resources than they intended.

Background

Resource requirements in Kubernetes

Resource requirements may only be specified on containers, not pods. A pod's resource requirements are determined by summing the requests/limits of its app containers (including sidecars) and taking the maximum of that sum and the highest value of any init container. For example, a pod with an init container requesting 2 CPUs and two app containers requesting 0.5 CPU each has an effective request of max(2, 0.5 + 0.5) = 2 CPUs. If a container specifies no limit for a resource (CPU, memory, etc.), it is treated as having the highest possible limit for that resource.

Pod resource requirements are used for scheduling, eviction, and quality of service. Kubernetes will only schedule a pod to a node that has enough resources to accommodate its requests, and will reserve enough system resources to meet the pod's requests. In addition, if a pod exceeds its memory requests, it may be evicted from the node. Limits are enforced by both the kubelet and the container runtime (via cgroups): if a container uses more memory than its limit, it is OOMKilled, and if it exceeds its CPU limit, it is throttled. For more information, see “Resource Management for Pods and Containers”. Resource requirements are also used to determine a pod's quality of service class, which affects how likely it is to be scheduled or evicted.

Resource requirements can't be updated after pods are created.

Resource requirements in Tekton

Tekton Steps correspond to containers, and resource requirements can be specified on a per-Step basis. Step resource requirements can be specified via Task.StepTemplate, Task.Steps, or TaskRun.StepOverrides (increasing order of precedence).

Tekton applies the resource requirements specified by users directly to the containers in the resulting pod, unless there is a LimitRange present in the namespace. Tekton will select pod resource requirements as close to the user’s configuration as possible, subject to the minimum/maximum requirements of any LimitRanges present. TaskRuns are rejected if there is no configuration that meets these constraints.

Goals

  • Task-level resource requirements are configurable at runtime (i.e. on TaskRun).
    • The reasons for runtime configuration are discussed in more detail in TEP-0094.

Non-Goals

  • Configuration for the amount of resources consumed by an entire PipelineRun, as requested in #4271.
    • We could still choose in the future to provide configuration on Pipeline for Task-level resource requirements (e.g. via params).
  • Parameterizing resource requirements, as requested in #4080. This would be a valuable addition to Tekton but is out of scope for this proposal.

Existing Strategies for Controlling Resource Consumption

  • Use a Compute Resource Quota to restrict the compute resources available for a namespace. This is a poor workaround, as it’s much easier to determine the amount of resources a single TaskRun will use than the sum of any TaskRuns that can run in a namespace.
  • Use a LimitRange to restrict compute resources of any pods in a namespace. This doesn’t address the problem, as the same TaskRun might use very different amounts of resources depending on its inputs. In addition, LimitRanges don’t distinguish between Tekton pods and other pods.

Proposal

API Changes

Augment the TaskRun API with a "computeResources" field that allows the user to configure the resource requirements of a Task. An example TaskRun is as follows.

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: image-build-taskrun
spec:
  computeResources:
    requests:
      memory: 1Gi
    limits:
      memory: 2Gi

This field should also be added to PipelineRun.TaskRunSpecs.
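For example, a PipelineRun might configure the compute resources of one of its Tasks as follows (the Pipeline and task names here are illustrative):

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: image-build-pipelinerun
spec:
  pipelineRef:
    name: image-build-pipeline
  taskRunSpecs:
    - pipelineTaskName: image-build-task
      computeResources:
        requests:
          memory: 1Gi
        limits:
          memory: 2Gi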

Applying Task-level Resources to Containers

Requests

As mentioned in Resource Requirements in Kubernetes, the effective resource requests of a pod are the sum of the resource requests of its containers, and this value is used to determine the resources reserved for the pod during scheduling. Therefore, when a user configures a resource request for a TaskRun, any assignment of container requests that sums to the desired total is valid. To simplify interaction with LimitRanges, the desired compute requests should be split among the pod's containers. (This is in contrast to Tekton's handling of resource requests before v0.28.0, where the maximum resource request across all containers was applied to only one container, and the rest were left without resource requests.)
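As a minimal sketch of this even-split behavior, the following Go snippet divides Task-level requests among Step containers using the Kubernetes resource.Quantity type. It is illustrative only: it assumes division in milli-units and ignores the LimitRange adjustments the real implementation must also perform.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// splitRequests divides each Task-level request evenly among numSteps Step
// containers, so the per-container requests sum to the Task-level request.
func splitRequests(taskRequests corev1.ResourceList, numSteps int) []corev1.ResourceList {
	perStep := make([]corev1.ResourceList, numSteps)
	for i := range perStep {
		perStep[i] = corev1.ResourceList{}
	}
	for name, total := range taskRequests {
		share := total.MilliValue() / int64(numSteps)
		remainder := total.MilliValue() % int64(numSteps)
		for i := range perStep {
			v := share
			if i == 0 {
				v += remainder // give any remainder to the first Step so the sum is preserved
			}
			perStep[i][name] = *resource.NewMilliQuantity(v, total.Format)
		}
	}
	return perStep
}

func main() {
	reqs := corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("1.5")}
	for i, r := range splitRequests(reqs, 3) {
		cpu := r[corev1.ResourceCPU]
		fmt.Printf("step-%d: cpu=%s\n", i+1, cpu.String()) // 500m per Step
	}
}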

Limits

Because Kubernetes treats containers without resource limits as having higher limits than containers with limits configured, limits must be handled differently from requests. There are several options for how Task-level resource limits could be implemented:

  • If the Task-level resource limit is applied to only one container, the pod will not have an effective limit, because of the other containers without limits. This defeats the purpose of the feature.
  • If the Task-level limit is spread out among containers, a Task where one Step is more resource intensive than the others could be OOMKilled or throttled.
  • If the Task-level limit is applied to each container, the pod has a much higher effective limit than desired.

However, the effective resource limits of a pod are not used for scheduling (see How Pods with resource requests are scheduled and How Kubernetes applies resource requests and limits). Instead, container limits are enforced by the container runtime.

This means that applying the Task-level resource limit to each container in the pod results in a pod with a higher effective limit than desired, but prevents any individual Step from exceeding the configured limit, which is likely what users want.

Containers that specify limits but not requests automatically have their requests set to their limits. This means that if a user specified Task-level limits but not Task-level requests and we left the container requests unset, each container's requests would default to its limits, resulting in much higher effective requests than desired. Instead, we will mirror Kubernetes' behavior at the Task level: if a user specifies Task-level limits but not Task-level requests, we will set the Task-level requests to the Task-level limits. (This TEP originally proposed applying the smallest possible resource requests to the containers in this case; however, that could make pods more likely to be evicted, since they would very likely exceed the requests reserved for them during scheduling.)

Sidecars

Sidecar containers run in parallel with Steps, meaning that their resource requests and limits should actually be summed with Steps’ resource requirements. In the case of Task-level limits, it is not clear how to distribute the limit between a Sidecar and Steps, since they run at the same time. Therefore, the Task-level resource limit should be interpreted as the limit only for Steps, and Sidecar limits should be set separately. For consistency, Task-level requests should also be interpreted as requests for Steps only. Users should be able to specify both Task-level resource requirements and Sidecar resource requirements.
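For instance, a TaskRun could set Task-level requests for its Steps while configuring a Sidecar's resources separately via SidecarOverrides (a sketch; the sidecar name is illustrative):

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
  sidecarOverrides:
    - name: sidecar-1
      resources:
        requests:
          cpu: 800m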

Authoring Time (Task) vs Runtime (TaskRun) configuration

There are clear reasons to allow compute resources to be configured at runtime, as detailed in TEP-0094. For example, an image build Task may use different amounts of compute resources depending on what image is being built.

The reasons for configuring compute resources at authoring time are less clear. Tasks that set compute resources are less reusable in different environments, and such configuration wouldn't be appropriate for Tasks in the Tekton catalog.

Tekton currently allows users to specify resource requirements at authoring time via Task.Step. This feature exists because Tekton used to embed the Kubernetes container definition in a Step. As part of the future work for this proposal, we may choose to explore deprecating this field. Adding resource requirements to Task purely for consistency with Step-level resource requirements therefore does not make sense.

In addition, adding resource requirements to Tasks implies that Tasks will always be run in a way where this field has meaning. This assumption is not true for situations where multiple Tasks may be run in a pod, such as in TEP-0044.

Interaction with Step resource requirements

Because Tekton will handle the logic for the combined resource requests of a TaskRun, users should not be able to specify resource requests for both the TaskRun and individual Steps. This means:

  • If a Task defines StepTemplate.Resources or Step.Resources, and the TaskRun defines ComputeResources, the value from the TaskRun will apply and the value from the Task will be ignored.
  • The admission webhook should reject TaskRuns that specify both ComputeResources and StepOverrides.Resources. (TaskRuns should be able to define both ComputeResources and SidecarOverrides.Resources, however.)

Users should not be able to mix and match Step resource requirements and TaskRun resource requirements, even for different types of compute resources (e.g. CPU, memory).
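A sketch of what this check could look like in the admission webhook (illustrative only; not the actual Tekton validation code):

package validation

import (
	"knative.dev/pkg/apis"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

// validateComputeResources rejects TaskRuns that combine Task-level
// computeResources with per-Step resource overrides. Sidecar overrides
// remain allowed.
func validateComputeResources(tr *v1beta1.TaskRun) *apis.FieldError {
	if tr.Spec.ComputeResources == nil {
		return nil
	}
	for _, so := range tr.Spec.StepOverrides {
		if len(so.Resources.Requests) > 0 || len(so.Resources.Limits) > 0 {
			return apis.ErrMultipleOneOf("spec.computeResources", "spec.stepOverrides.resources")
		}
	}
	return nil
}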

Interaction with LimitRanges

Users may have LimitRanges defined in a namespace where Tekton pods are run, which may define minimum or maximum resource requests per pod or container. Tekton already updates container resource requirements to comply with namespace LimitRanges, and much of this code should not need to change. If resource requests are “added” to some containers to comply with a minimum request, they should be “subtracted” from the overall total. In addition, if the total resource request would result in a container requesting more than the maximum permitted by the LimitRange, the requests may be spread out among containers. If there is no container configuration that satisfies the LimitRange, the TaskRun will be rejected.

We must ensure that the sum of the requests for each container is still the desired requests for the TaskRun, even after LimitRange defaults have applied. For example, if a user requests 1 CPU for a Task with 2 steps, and a pod is created with one container with 1 CPU and one container without a request, LimitRange default requests will apply to the container without a CPU request, causing the pod to have more CPU than desired. Splitting Task-level resource requests among the pod's containers will prevent this problem.
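To make the arithmetic concrete, here is a simplified sketch of distributing a Task-level request across containers under a per-container LimitRange minimum and maximum. It works in milli-units for a single resource type and omits remainder handling and LimitRange defaults.

package main

import "fmt"

// distribute splits a Task-level request of total milli-units across n
// containers, honoring a LimitRange's per-container min and max.
func distribute(total, min, max, n int64) ([]int64, error) {
	if total > n*max {
		// No per-container assignment within the LimitRange can sum to the
		// requested total, so the TaskRun is rejected.
		return nil, fmt.Errorf("request of %dm cannot be satisfied by %d containers with max %dm", total, n, max)
	}
	share := total / n
	if share < min {
		// The LimitRange minimum overrides the even split; the pod's total
		// requests will exceed the Task-level request.
		share = min
	}
	out := make([]int64, n)
	for i := range out {
		out[i] = share
	}
	return out, nil
}

func main() {
	// 1.5 CPUs (1500m) across 3 Steps with min 600m, max 750m: 600m each,
	// matching the "Example where LimitRange does apply" below.
	fmt.Println(distribute(1500, 600, 750, 3))
}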

Naming

"Resources" is an extremely overloaded term in Tekton. Both Task.Resources and TaskRun.Resources are currently used to refer to PipelineResources, while Step.Resources, StepTemplate.Resources, and Sidecar.Resources are used to refer to compute resources as defined by Kubernetes.

Reusing TaskRun.Resources will likely cause confusion if PipelineResources haven't yet been removed. Therefore, the new field will be called "ComputeResources", both to avoid the naming conflict with PipelineResources and to differentiate between other uses of this word in Tekton.

In an ideal world, we would choose a name that provides consistency with compute resources specified at the Step level. However, if we choose to pursue the future work of deprecating Step-level compute resource requirements, this will no longer be a concern.

Other Considerations

  • Tekton pods currently have a Burstable quality of service class; this will not change as a result of this implementation.
  • We should consider updating our catalog Task guidelines with guidance not to use Step resource requirements.

Future Work

We should consider deprecating Task.Step.Resources, Task.StepTemplate.Resources, and TaskRun.StepOverrides. Specifying resource requirements for individual Steps is confusing and likely too granular for many CI/CD workflows.

We could also consider support for both Task-level and Step-level resource requirements if the requirements are for different types of compute resources (for example, specifying CPU request at the Step level and memory request at the Task level). However, this functionality will not be supported by the initial implementation of this proposal; it can be added later if desired.

Lastly, we can consider adding a Resources field to Task if there is a clear use case for it.

Examples

Example with requests only

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
| Step name | CPU request | CPU limit |
|-----------|-------------|-----------|
| step-1    | 0.5         | N/A       |
| step-2    | 0.5         | N/A       |
| step-3    | 0.5         | N/A       |

Example with limits only

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    limits:
      cpu: 3
| Step name | CPU request | CPU limit |
|-----------|-------------|-----------|
| step-1    | 1           | 3         |
| step-2    | 1           | 3         |
| step-3    | 1           | 3         |

Example with both requests and limits

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
    limits:
      cpu: 2
| Step name | CPU request | CPU limit |
|-----------|-------------|-----------|
| step-1    | 0.5         | 2         |
| step-2    | 0.5         | 2         |
| step-3    | 0.5         | 2         |

Example with Sidecar

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
  sidecars:
    - name: sidecar-1
      resources:
        requests:
          cpu: 800m
        limits:
          cpu: 1
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5

The resulting pod would have the following containers:

| Step/Sidecar name | CPU request | CPU limit |
|-------------------|-------------|-----------|
| step-1            | 750m        | N/A       |
| step-2            | 750m        | N/A       |
| sidecar-1         | 800m        | 1         |

Example where LimitRange does not apply

apiVersion: v1
kind: LimitRange
metadata:
  name: my-limit-range
spec:
  limits:
    - max:
        cpu: 750m
      min:
        cpu: 250m
      type: Container
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5

The resulting pod would have the following containers:

| Step name | CPU request | CPU limit |
|-----------|-------------|-----------|
| step-1    | 500m        | 750m      |
| step-2    | 500m        | 750m      |
| step-3    | 500m        | 750m      |

(Note that there are a number of possible configurations of CPU requests that satisfy 250m ≤ request ≤ 750m for each container while summing to 1.5, and any of them would be acceptable here.)

Example where LimitRange does apply

apiVersion: v1
kind: LimitRange
metadata:
  name: my-limit-range
spec:
  limits:
    - max:
        cpu: 750m
      min:
        cpu: 600m
      type: Container
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5

The resulting pod would have the following containers:

| Step name | CPU request | CPU limit |
|-----------|-------------|-----------|
| step-1    | 600m        | 750m      |
| step-2    | 600m        | 750m      |
| step-3    | 600m        | 750m      |

Here, the LimitRange minimum overrides the specified requests.

Example with StepTemplate

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  stepTemplate:
    resources:
      requests:
        cpu: 500m
  steps:
    - name: step-1
    - name: step-2
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5

The resulting pod would have the following containers:

| Step name | CPU request | CPU limit |
|-----------|-------------|-----------|
| step-1    | 750m        | N/A       |
| step-2    | 750m        | N/A       |

Example with Step resource requests overridden by TaskRun

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
      resources:
        requests:
          cpu: 500m
    - name: step-2
      resources:
        requests:
          cpu: 1
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 2

The resulting pod would have the following containers:

| Step name | CPU request | CPU limit |
|-----------|-------------|-----------|
| step-1    | 1           | N/A       |
| step-2    | 1           | N/A       |

Example with StepOverrides

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  stepOverrides:
    - name: step-1
      resources:
        requests:
          cpu: 1
  computeResources:
    requests:
      cpu: 1.5

This TaskRun would be rejected.

Example with both CPU and memory

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
      memory: 500Mi
    limits:
      memory: 1Gi

The resulting pod would have the following containers:

| Step name | CPU request | CPU limit | Memory request | Memory limit |
|-----------|-------------|-----------|----------------|--------------|
| step-1    | 750m        | N/A       | 250Mi          | 1Gi          |
| step-2    | 750m        | N/A       | 250Mi          | 1Gi          |

Alternatives

  • Request or implement support upstream in Kubernetes for pod-level resource requirements.
    • Since Kubernetes runs a pod's containers in parallel, the Kubernetes community has little reason to implement this feature, and we would have less control over the implementation and timeline.
  • Support priority classes on Tekton pods to give users more control over the scheduling of Tekton pods.
    • This solution is a worse user experience, as it requires users to think about how their pods should be scheduled in relation to other pods that may be running on a cluster, rather than considering the pod in isolation and letting Kubernetes handle scheduling.
  • Run Tekton steps as init containers (which run sequentially).
    • We used to do this, but moved away from this because of poor support for logging and no way to support Task Sidecars (see #224).
  • Instruct users to apply their resource requests to only one Step.
    • This requires users to have a clear understanding of how Step resource requirements are applied, which should ideally be an implementation detail (though it is something we should document clearly regardless).
  • Apply only the maximum Step resource request and ignore all others, reverting to pre-0.28.0 behavior.
    • This would create confusion and break existing Pipelines.

Implementation Pull Requests

References