| status | title | creation-date | last-updated | authors |
|---|---|---|---|---|
| implemented | Task-level Resource Requirements | 2022-04-08 | 2022-08-16 | |
- Summary
- Motivation
- Background
- Goals
- Non-Goals
- Existing Strategies for Controlling Resource Consumption
- Proposal
- Examples
  - Example with requests only
  - Example with limits only
  - Example with both requests and limits
  - Example with Sidecar
  - Example where LimitRange does not apply
  - Example where LimitRange does apply
  - Example with StepTemplate
  - Example with Step resource requests overridden by TaskRun
  - Example with StepOverrides
  - Example with both CPU and memory
- Alternatives
- Implementation Pull Requests
- References
## Summary

Tekton currently provides Step-level configuration for Kubernetes resource requirements via the Task and TaskRun specs. This document proposes allowing users to configure the overall resource requests of Tekton Tasks and TaskRuns.
## Motivation

Kubernetes runs containers within a pod in parallel, so a pod's effective resource requests and limits are determined by summing the resource requirements of its containers. Since Tekton Steps run sequentially, it can be confusing for users to find that the resource requirements of each container are summed (for example, in #4347). This can lead to users requesting pods with more resources than they intended.
## Background

Resource requirements may only be specified on containers, not pods, and cannot be updated. A pod's resource requirements are determined by summing the requests/limits of its app containers (including sidecars) and taking the maximum of that value and the highest value of any init container. If any resource (CPU, memory, etc.) has no limit specified, this is considered the highest limit for that resource.
Pod resource requirements are used for scheduling, eviction, and quality of service. Kubernetes will only schedule a pod to a node that has enough resources to accommodate its requests, and will reserve enough system resources to meet the pod's requests. In addition, if a pod exceeds its memory requests, it may be evicted from the node. Limits are enforced by both the kubelet and the container runtime (via cgroups). If a container uses more memory than its limit, it is OOMKilled, and if it exceeds its CPU limit, it is throttled. For more information, see "Resource Management for Pods and Containers". Resource requirements are also used to determine a pod's quality of service class, which affects how likely it is to be scheduled or evicted.
Resource requirements can't be updated after pods are created.
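The effective-request and effective-limit rules above can be sketched as follows. This is a simplified model using integer millicores, not Kubernetes API code: `app_requests` includes sidecars, and `None` stands for an unset limit.

```python
def effective_request(app_requests, init_requests):
    """Effective pod request: max(sum over app containers, largest init container)."""
    return max(sum(app_requests), max(init_requests, default=0))

def effective_limit(app_limits, init_limits):
    """Effective pod limit; an unset limit (None) counts as the highest possible limit."""
    if any(l is None for l in list(app_limits) + list(init_limits)):
        return None  # one unlimited container makes the whole pod unbounded
    return max(sum(app_limits), max(init_limits, default=0))
```

For example, three app containers requesting 500m each yield an effective pod request of 1500m, which is why sequential Steps can inflate a pod's footprint.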
Tekton Steps correspond to containers, and resource requirements can be specified on a per-Step basis. Step resource requirements can be specified via Task.StepTemplate, Task.Steps, or TaskRun.StepOverrides (increasing order of precedence).
Tekton applies the resource requirements specified by users directly to the containers in the resulting pod, unless there is a LimitRange present in the namespace. Tekton will select pod resource requirements as close to the user’s configuration as possible, subject to the minimum/maximum requirements of any LimitRanges present. TaskRuns are rejected if there is no configuration that meets these constraints.
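The precedence order described above (Task.StepTemplate < Task.Steps < TaskRun.StepOverrides) can be illustrated with a small sketch. The helper below is hypothetical, not Tekton source; it treats each source as an opaque block and takes the highest-precedence one that is set.

```python
def resolve_step_resources(step_template, step, step_override):
    """Pick the effective Step resource requirements: TaskRun.StepOverrides
    beats Task.Steps, which beats Task.StepTemplate."""
    for source in (step_override, step, step_template):  # decreasing precedence
        if source is not None:
            return source
    return None
```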
## Goals

- Task-level resource requirements are configurable at runtime (i.e. on TaskRun).
- The reasons for runtime configuration are discussed in more detail in TEP-0094.
## Non-Goals

- Configuration for the amount of resources consumed by an entire PipelineRun, as requested in #4271.
- We could still choose in the future to provide configuration on Pipeline for Task-level resource requirements (e.g. via params).
- Parameterizing resource requirements, as requested in #4080. This would be a valuable addition to Tekton but is out of scope for this proposal.
## Existing Strategies for Controlling Resource Consumption

- Use a Compute Resource Quota to restrict the compute resources available for a namespace. This is a poor workaround, as it’s much easier to determine the amount of resources a single TaskRun will use than the sum of any TaskRuns that can run in a namespace.
- Use a LimitRange to restrict compute resources of any pods in a namespace. This doesn’t address the problem, as the same TaskRun might use very different amounts of resources depending on its inputs. In addition, LimitRanges don’t distinguish between Tekton pods and other pods.
## Proposal

Augment the TaskRun API with a "computeResources" field that allows the user to configure the resource requirements of a Task. An example TaskRun is as follows.
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: image-build-taskrun
spec:
  computeResources:
    requests:
      memory: 1Gi
    limits:
      memory: 2Gi
```
This field should also be added to PipelineRun.TaskRunSpecs.
As mentioned in Resource Requirements in Kubernetes, the effective resource requests of a pod are the sum of the resource requests of its containers, and this value is used to determine the resources reserved by the kubelet when scheduling a pod. Therefore, when a user configures a resource request for a TaskRun, any configuration of container requests that sum to the desired request is valid. To simplify interaction with LimitRanges, the desired compute requests should be split among the pod's containers. This is similar to Tekton’s handling of resource requests pre v0.28.0, where the maximum resource request of all containers was applied to only one container, and the rest were left without resource requests.
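The splitting described above can be sketched as follows, using integer millicores (a simplified illustration, not Tekton source):

```python
def split_request(total_millicores, num_steps):
    """Split a task-level request evenly across the step containers so that
    the container requests sum exactly to the task-level request."""
    base, remainder = divmod(total_millicores, num_steps)
    # Hand the remainder out one millicore at a time so nothing is lost to rounding.
    return [base + 1 if i < remainder else base for i in range(num_steps)]
```

For example, a 1.5 CPU (1500m) task-level request over three Steps becomes 500m per container, summing exactly to the requested total.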
Because Kubernetes considers containers without resource limits to have higher limits than those with limits configured, configuration for limits is different than configuration for requests. There are several options for how Task-level resource limits could be implemented:
- If the task-level resource limit is applied to only one container, the pod will not have an effective limit due to the other containers without limits. This defeats the purpose of the feature.
- If the task-level limit is spread out among containers, a task where one step is more resource intensive than all the others could get oomkilled or throttled.
- If the task-level limit is applied to each container, the pod has a much higher effective limit than desired.
However, the effective resource limit of a pod is not used for scheduling (see How Pods with resource requests are scheduled and How Kubernetes applies resource requests and limits). Instead, container limits are enforced by the container runtime.
This means that applying the task resource limits to each container in the pod will result in a pod with higher effective limits than desired, but which prevents any individual Step from exceeding configured limits, as is likely desired.
Containers with limits but not requests automatically have their requests set to their limits. This means that if a user specifies Task-level limits but not Task-level requests, leaving the container requests unset would give each container requests equal to its limits, producing much higher pod requests than desired. Instead, we will apply behavior similar to Kubernetes' behavior in this case: if a user specifies Task-level limits but not Task-level requests, we will set the Task-level requests to the Task-level limits. (This TEP originally proposed applying the smallest possible resource requests to the containers in this case; however, this could make pods more likely to be evicted, since they would very likely exceed the requests reserved for them during scheduling.)
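Putting the requests and limits rules together, a hypothetical sketch of the per-container translation (integer millicores, `None` for unset; not Tekton source) looks like this:

```python
def containers_for_task(task_requests, task_limits, num_steps):
    """Translate task-level compute resources (integer millicores; None = unset)
    into per-step-container requests and limits."""
    # Mirror Kubernetes defaulting: limits without requests imply requests = limits.
    if task_requests is None and task_limits is not None:
        task_requests = task_limits
    base, rem = divmod(task_requests, num_steps) if task_requests is not None else (None, 0)
    containers = []
    for i in range(num_steps):
        request = None if base is None else base + (1 if i < rem else 0)
        # The task-level limit is applied to EVERY container: the pod's effective
        # limit is higher than requested, but no single Step can exceed it.
        containers.append({"request": request, "limit": task_limits})
    return containers
```

With limits of 3 CPU and no requests over three Steps, each container gets a 1 CPU request and a 3 CPU limit, matching the limits-only example later in this document.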
Sidecar containers run in parallel with Steps, meaning that their resource requests and limits should actually be summed with Steps’ resource requirements. In the case of Task-level limits, it is not clear how to distribute the limit between a Sidecar and Steps, since they run at the same time. Therefore, the Task-level resource limit should be interpreted as the limit only for Steps, and Sidecar limits should be set separately. For consistency, Task-level requests should also be interpreted as requests for Steps only. Users should be able to specify both Task-level resource requirements and Sidecar resource requirements.
There are clear reasons to allow compute resources to be configured at runtime, as detailed in TEP-0094. For example, an image build Task may use different amounts of compute resources depending on what image is being built.
The reasons for configuring compute resources at authoring time are less clear. Tasks that set compute resources are less reusable in different environments, and such configuration wouldn't be appropriate for Tasks in the Tekton catalog.
Tekton currently allows users to specify resource requirements at authoring time via Task.Step. This feature exists because Tekton used to embed the Kubernetes container definition in a Step. As part of the future work for this proposal, we may choose to explore deprecating this field. Therefore, it does not make sense to add resource requirements to Task for consistency with resource requirements on Steps.
In addition, adding resource requirements to Tasks implies that Tasks will always be run in a way where this field has meaning. This assumption is not true for situations where multiple Tasks may be run in a pod, such as in TEP-0044.
Because Tekton will handle the logic for the combined resource requests of a TaskRun, users should not be able to specify resource requests for both the TaskRun and individual Steps. This means:
- If a Task defines StepTemplate.Resources or Step.Resources, and the TaskRun defines ComputeResources, the value from the TaskRun will apply and the value from the Task will be ignored.
- The admission webhook should reject TaskRuns that specify both ComputeResources and StepOverrides.Resources. (TaskRuns should be able to define both ComputeResources and SidecarOverrides.Resources, however.)
Users should not be able to mix and match Step resource requirements and TaskRun resource requirements, even for different types of compute resources (e.g. CPU, memory).
Users may have LimitRanges defined in a namespace where Tekton pods are run, which may define minimum or maximum resource requests per pod or container. We already update container resource requirements to comply with namespace LimitRanges, and much of this code should not need to change. If resource requests are “added” to some containers to comply with a minimum request, they should be “subtracted” from the overall total. In addition, if the total resource request would result in a container that has more than the maximum container requests permitted by the limit range, the requests may be spread out between containers. If there is no container configuration that satisfies the LimitRange, the TaskRun will be rejected.
We must ensure that the sum of the requests for each container is still the desired requests for the TaskRun, even after LimitRange defaults have applied. For example, if a user requests 1 CPU for a Task with 2 steps, and a pod is created with one container with 1 CPU and one container without a request, LimitRange default requests will apply to the container without a CPU request, causing the pod to have more CPU than desired. Splitting Task-level resource requests among the pod's containers will prevent this problem.
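A simplified sketch of a LimitRange-aware split (hypothetical helper; it raises containers to the per-container minimum without redistributing the excess, whereas the actual implementation may spread requests differently):

```python
def split_with_limitrange(total, num_steps, min_per_container, max_per_container):
    """Split a task-level request (millicores) across step containers subject to
    a LimitRange. Returns per-container requests, or None if no split is valid."""
    if total > num_steps * max_per_container:
        return None  # no arrangement keeps every container at or below the maximum
    base, rem = divmod(total, num_steps)
    split = [base + 1 if i < rem else base for i in range(num_steps)]
    # The LimitRange minimum overrides the task-level request where they conflict.
    return [max(r, min_per_container) for r in split]
```

This reproduces both LimitRange examples below: a 1500m request over three Steps with a 250m–750m range splits to 500m each, while a 600m minimum forces 600m each.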
"Resources" is an extremely overloaded term in Tekton.
Both Task.Resources
and TaskRun.Resources
are currently used to refer to PipelineResources,
while Step.Resources
, StepTemplate.Resources
, and Sidecar.Resources
are used to refer to
compute resources as defined by Kubernetes.
Reusing TaskRun.Resources
will likely cause confusion if PipelineResources haven't yet been
removed. Therefore, the new field will be called "ComputeResources", both to avoid the naming
conflict with PipelineResources and to differentiate between other uses of this word in Tekton.
In an ideal world, we would choose a name that provides consistency with compute resources specified at the Step level. However, if we choose to pursue the future work of deprecating Step-level compute resource requirements, this will no longer be a concern.
- Tekton pods currently have a burstable quality of service class, which will not change as a result of this implementation.
- We should consider updating our catalog Task guidelines with guidance not to use Step resource requirements.
We should consider deprecating Task.Step.Resources, Task.StepTemplate.Resources, and TaskRun.StepOverrides.
Specifying resource requirements for individual Steps is confusing and likely too granular for many CI/CD workflows.
We could also consider support for both Task-level and Step-level resource requirements if the requirements are for different types of compute resources (for example, specifying CPU request at the Step level and memory request at the Task level). However, this functionality will not be supported by the initial implementation of this proposal; it can be added later if desired.
Lastly, we can consider adding a Resources field to Task if there is a clear use case for it.
## Examples

### Example with requests only

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-task-run
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
```
Step name | CPU request | CPU limit |
---|---|---|
step-1 | 0.5 | N/A |
step-2 | 0.5 | N/A |
step-3 | 0.5 | N/A |
### Example with limits only

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    limits:
      cpu: 3
```
Step name | CPU request | CPU limit |
---|---|---|
step-1 | 1 | 3 |
step-2 | 1 | 3 |
step-3 | 1 | 3 |
### Example with both requests and limits

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
    limits:
      cpu: 2
```
Step name | CPU request | CPU limit |
---|---|---|
step-1 | 0.5 | 2 |
step-2 | 0.5 | 2 |
step-3 | 0.5 | 2 |
### Example with Sidecar

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
  sidecars:
    - name: sidecar-1
      resources:
        requests:
          cpu: 800m
        limits:
          cpu: 1
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
```
The resulting pod would have the following containers:
Step/Sidecar name | CPU request | CPU limit |
---|---|---|
step-1 | 750m | N/A |
step-2 | 750m | N/A |
sidecar-1 | 800m | 1 |
### Example where LimitRange does not apply

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-limit-range
spec:
  limits:
    - max:
        cpu: 750m
      min:
        cpu: 250m
      type: Container
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
```
The resulting pod would have the following containers:
Step name | CPU request | CPU limit |
---|---|---|
step-1 | 500m | 750m |
step-2 | 500m | 750m |
step-3 | 500m | 750m |
(Note that there are a number of possible configurations of CPU requests that satisfy 250m ≤ request ≤ 750m for each container, with a sum of 1.5, and any would be acceptable here.)
### Example where LimitRange does apply

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-limit-range
spec:
  limits:
    - max:
        cpu: 750m
      min:
        cpu: 600m
      type: Container
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
    - name: step-3
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
```
The resulting pod would have the following containers:
Step name | CPU request | CPU limit |
---|---|---|
step-1 | 600m | 750m |
step-2 | 600m | 750m |
step-3 | 600m | 750m |
Here, the LimitRange minimum overrides the specified requests.
### Example with StepTemplate

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  stepTemplate:
    resources:
      requests:
        cpu: 500m
  steps:
    - name: step-1
    - name: step-2
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
```
The resulting pod would have the following containers:
Step name | CPU request | CPU limit |
---|---|---|
step-1 | 750m | N/A |
step-2 | 750m | N/A |
### Example with Step resource requests overridden by TaskRun

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
      resources:
        requests:
          cpu: 500m
    - name: step-2
      resources:
        requests:
          cpu: 1
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 2
```
The resulting pod would have the following containers:
Step name | CPU request | CPU limit |
---|---|---|
step-1 | 1 | N/A |
step-2 | 1 | N/A |
### Example with StepOverrides

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  stepOverrides:
    - name: step-1
      resources:
        requests:
          cpu: 1
  computeResources:
    requests:
      cpu: 1.5
```
This TaskRun would be rejected.
### Example with both CPU and memory

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
    - name: step-1
    - name: step-2
```
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: my-taskrun
spec:
  taskRef:
    name: my-task
  computeResources:
    requests:
      cpu: 1.5
      memory: 500Mi
    limits:
      memory: 1Gi
```
The resulting pod would have the following containers:
Step name | CPU request | CPU limit | Memory request | Memory limit |
---|---|---|---|---|
step-1 | 750m | N/A | 250Mi | 1Gi |
step-2 | 750m | N/A | 250Mi | 1Gi |
## Alternatives

- Request or implement support upstream in Kubernetes for pod-level resource requirements.
- Since Kubernetes runs pod containers in parallel, they have no reason to implement this feature. We will also have less control over the implementation and timeline.
- Support priority classes on Tekton pods to give users more control over the scheduling of Tekton pods.
- This solution is a worse user experience, as it requires users to think about how their pods should be scheduled in relation to other pods that may be running on a cluster, rather than considering the pod in isolation and letting Kubernetes handle scheduling.
- Run Tekton steps as init containers (which run sequentially).
- Instruct users to apply their resource requests to only one Step.
- This requires users to have a clear understanding of how resource requirements from Steps are applied (which should ideally be an implementation detail), but this is something we should make very clear anyway.
- Apply only the maximum Step resource request and ignore all others, reverting to pre-0.28.0 behavior.
- This would create confusion and break existing Pipelines.
## Implementation Pull Requests

- [TEP-0104] Support Task-level Resource Requirements for TaskRun: Part #1 Fields Addition & Validation w/ Docs Updates
- [TEP-0104] Populate Task-level Resource Requirements from PipelineRun to TaskRun
- [TEP-0104] Update Pod with Task-level Resource Requirements
## References

- OpenShift guidelines for managing Pipeline resource usage
- Tekton Resource Requests (for how resource requests were handled prior to 0.28.0)
- Tekton LimitRange documentation (for how resource requests are currently handled)