
Topology management doc updates #17451

Merged


lmdaly
Contributor

@lmdaly lmdaly commented Nov 6, 2019

Current docs will be updated to provide clarifications on usage.
Documentation on how to extend a device plugin will be added, so that device plugin authors can leverage the Topology Manager.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Nov 6, 2019
@lmdaly
Contributor Author

lmdaly commented Nov 6, 2019

/milestone 1.17

@k8s-ci-robot
Contributor

@lmdaly: You must be a member of the kubernetes/website-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Website milestone maintainers and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone 1.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8sio-netlify-preview-bot
Collaborator

k8sio-netlify-preview-bot commented Nov 6, 2019

Deploy preview for kubernetes-io-vnext-staging processing.

Building with commit 23981fc

https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/5dd7ac7ca413f50009f6fab9

@makoscafee
Contributor

/milestone 1.17

@k8s-ci-robot k8s-ci-robot added this to the 1.17 milestone Nov 6, 2019
@sftim
Contributor

sftim commented Nov 8, 2019

@lmdaly is this related to a code change PR and / or KEP?

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 8, 2019
@lmdaly lmdaly force-pushed the topology-management-doc-updates branch from c0ef48b to 8e3095e Compare November 8, 2019 10:20
@lmdaly
Contributor Author

lmdaly commented Nov 8, 2019

@sftim these are updates for the Topology Manager, which is tracked in kubernetes/enhancements#693.
More specifically, the following issue covers the documentation: kubernetes/kubernetes#83482

This documentation mostly updates the existing content to give users more information on the feature.

@sftim
Contributor

sftim commented Nov 9, 2019

Could this PR target the master branch? SIG Docs / this repo works with a continuous release process and has PRs targeting master by default.

Topology Manager would consider this Pod. The Topology Manager consults the Device Manager to discover the topology of the available devices for example.com/deviceA and example.com/deviceB.

As above, the Topology Manager will use this information to store the best topology for this container. The Device Manager will then use this when assigning devices to the Pod.

{{% /capture %}}
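To make the hint-merging step above concrete, here is a rough, simplified sketch in Python. It is not kubelet code: `TopologyHint` and `merge_hints` are hypothetical names, NUMA affinity is assumed to be a bitmask (bit *i* set means NUMA node *i*), and the sketch assumes the best merged hint is the one that is preferred by every provider and then spans the fewest NUMA nodes.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TopologyHint:
    # NUMA affinity as a bitmask: bit i set means NUMA node i is acceptable.
    affinity: int
    # True if the hint provider considers this placement optimal.
    preferred: bool


def _better(a: TopologyHint, b: TopologyHint) -> bool:
    # Preferred beats non-preferred; otherwise the narrower affinity wins.
    if a.preferred != b.preferred:
        return a.preferred
    return bin(a.affinity).count("1") < bin(b.affinity).count("1")


def merge_hints(provider_hints: Dict[str, List[TopologyHint]], num_nodes: int) -> TopologyHint:
    """Hypothetical sketch: AND one hint from each provider, keep the best result."""
    all_nodes = (1 << num_nodes) - 1
    # Simplification: a provider with an empty list is treated as "no preference".
    hint_lists = [hints or [TopologyHint(all_nodes, True)] for hints in provider_hints.values()]

    best: Optional[TopologyHint] = None
    for combo in product(*hint_lists):
        affinity, preferred = all_nodes, True
        for hint in combo:
            affinity &= hint.affinity
            preferred = preferred and hint.preferred
        if affinity == 0:
            continue  # no single NUMA node satisfies every provider
        candidate = TopologyHint(affinity, preferred)
        if best is None or _better(candidate, best):
            best = candidate
    # Nothing overlaps: fall back to an unaligned, non-preferred hint.
    return best or TopologyHint(all_nodes, False)


hints = {
    "cpu": [TopologyHint(0b01, True), TopologyHint(0b10, True), TopologyHint(0b11, False)],
    "example.com/deviceA": [TopologyHint(0b10, True)],
}
print(merge_hints(hints, num_nodes=2))  # TopologyHint(affinity=2, preferred=True)
```

Here the CPU hints and the `example.com/deviceA` hint only agree on NUMA node 1, so that is the hint stored for the container and later used by the Device Manager when assigning devices.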
Contributor

@klueska klueska Nov 12, 2019

We should probably add a section about known issues / limitations.

Contributor Author

Added the ones I could remember offhand; let me know if I have missed any others.

@lmdaly lmdaly force-pushed the topology-management-doc-updates branch from 8e3095e to 57be489 Compare November 13, 2019 13:26
@@ -95,6 +99,8 @@ If it is, Topology Manager will store this and the *Hint Providers* can then use
resource allocation decision.
If, however, this is not possible, then the Topology Manager will reject the pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.

Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended to use a Deployment with replicas to trigger a redeployment of the pod.
An external control loop could also be implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
Contributor

Suggested change
An external control loop could be also implemented to trigger a redeployment of pods that have the `Topology Affiniy` error.
An external control loop could be also implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
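The external control loop mentioned above could, at its core, be a filter over pod status. The following is only a sketch under stated assumptions: `PodRecord` and `pods_to_redeploy` are hypothetical (not a Kubernetes client type or API), rejected pods are assumed to report phase `Failed`, and the reason string `TopologyAffinityError` is an assumption about how the kubelet records the rejection. The retry cap is a design choice, since the scheduler is not topology-aware and a re-created pod may fail in the same way.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: the reason recorded on a pod rejected by the Topology Manager.
# Verify against the actual pod status on your cluster.
TOPOLOGY_REASON = "TopologyAffinityError"


@dataclass
class PodRecord:
    # Hypothetical, trimmed-down view of a pod; not a real client library type.
    name: str
    phase: str
    reason: str


def pods_to_redeploy(pods: List[PodRecord], attempts: Dict[str, int], max_attempts: int = 3) -> List[str]:
    """Return pods rejected for topology reasons that are still under the retry cap."""
    selected = []
    for pod in pods:
        if pod.phase != "Failed" or pod.reason != TOPOLOGY_REASON:
            continue
        # Cap retries: the pod may be scheduled onto the same node and fail again.
        if attempts.get(pod.name, 0) >= max_attempts:
            continue
        attempts[pod.name] = attempts.get(pod.name, 0) + 1
        selected.append(pod.name)
    return selected


attempts: Dict[str, int] = {}
pods = [PodRecord("a", "Failed", TOPOLOGY_REASON), PodRecord("b", "Running", "")]
print(pods_to_redeploy(pods, attempts))  # ['a']
```

A real controller would then delete and re-create the selected pods (or let their owning controller do so) through the Kubernetes API; that part is deliberately left out here.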


### Known Limitations
1. As of Kubernetes 1.16, the Topology Manager is only guaranteed to work if a *single* container in the pod spec requires aligned resources. This is because hint generation is based on current resource allocations, and all containers in a pod generate hints before any resource allocation has been made. This results in unreliable hints for all but the first container in a pod.
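The limitation above can be illustrated with a toy model. This is not the kubelet's implementation: it assumes a single resource (CPUs), a hint that is simply the list of NUMA nodes able to fit the whole request, and an allocator that always takes the first hinted node. The hypothetical functions `hints_first` and `hint_then_allocate` contrast computing every container's hint from one snapshot with recomputing the hint after each allocation.

```python
from typing import Dict, List


def numa_hint(request: int, free: Dict[int, int]) -> List[int]:
    """NUMA nodes that could satisfy the whole CPU request on their own."""
    return [node for node, cpus in sorted(free.items()) if cpus >= request]


def hints_first(requests: List[int], free: Dict[int, int]) -> List[bool]:
    """All hints are computed up front, then resources are allocated."""
    free = dict(free)  # work on a copy
    hints = [numa_hint(r, free) for r in requests]  # same snapshot for every container
    aligned = []
    for request, hint in zip(requests, hints):
        node = hint[0] if hint else None
        if node is not None and free[node] >= request:
            free[node] -= request
            aligned.append(True)
        else:
            aligned.append(False)  # stale hint: its NUMA node is already used up
    return aligned


def hint_then_allocate(requests: List[int], free: Dict[int, int]) -> List[bool]:
    """Each container's hint sees the allocations made for earlier containers."""
    free = dict(free)
    aligned = []
    for request in requests:
        hint = numa_hint(request, free)
        if hint:
            free[hint[0]] -= request
            aligned.append(True)
        else:
            aligned.append(False)
    return aligned


print(hints_first([4, 4], {0: 4, 1: 4}))         # [True, False]
print(hint_then_allocate([4, 4], {0: 4, 1: 4}))  # [True, True]
```

With two NUMA nodes of 4 free CPUs each, both containers are hinted towards node 0 when hints come from the same snapshot, so only the first container ends up aligned.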

Contributor

The scheduler is not topology-aware, so a pod can be scheduled onto a node and then fail the Admit() check in the kubelet. If a higher-level controller is used (a ReplicaSet, for example), it can repeatedly re-create the pod, which is then scheduled and fails in the same way.

If the kubelet considers multiple pods or containers in close succession, the Topology Manager policy can effectively be ignored. See kubernetes/kubernetes#84749

@daminisatya
Contributor

@lmdaly Just a reminder about the last Docs deadline - 22nd Nov, by which this PR needs to be merged!

You have some review comments to address.

@lmdaly lmdaly force-pushed the topology-management-doc-updates branch from 4464606 to 3462262 Compare November 20, 2019 13:58
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2019
@lmdaly lmdaly force-pushed the topology-management-doc-updates branch from 3462262 to 8e30906 Compare November 20, 2019 16:04
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2019
@lmdaly
Contributor Author

lmdaly commented Nov 21, 2019

@daminisatya I have addressed the review comments. Do I need an lgtm from anyone in particular?

* Added information on how device plugins can take advantage
of Topology Manager
* Updated the Topology Manager documentation to include additional information and updated some out-of-date sections
@klueska
Contributor

klueska commented Nov 22, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 22, 2019
@klueska
Contributor

klueska commented Nov 22, 2019

/assign @kbarnard10

@klueska
Contributor

klueska commented Nov 22, 2019

/cc @kubernetes/sig-docs-en-owners

@k8s-ci-robot k8s-ci-robot requested a review from a team November 22, 2019 04:03
@@ -205,5 +231,7 @@ Here are some examples of device plugin implementations:
* Learn about [scheduling GPU resources](/docs/tasks/manage-gpus/scheduling-gpus/) using device plugins
* Learn about [advertising extended resources](/docs/tasks/administer-cluster/extended-resource-node/) on a node
* Read about using [hardware acceleration for TLS ingress](https://kubernetes.io/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/) with Kubernetes
* Learn about the [Topology Manager](/docs/tasks/administer-cluster/topology-manager/)
>>>>>>> Update Topology Manager docs
Contributor

It looks like there is a line from a merge conflict resolution here?

Contributor Author

Good spot, thanks!
Removed line.

@lmdaly lmdaly force-pushed the topology-management-doc-updates branch from 8e30906 to 23981fc Compare November 22, 2019 09:38
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 22, 2019
@klueska
Contributor

klueska commented Nov 22, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 22, 2019
@daminisatya
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: daminisatya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2019
@k8s-ci-robot k8s-ci-robot merged commit 55227ad into kubernetes:dev-1.17 Nov 22, 2019
@klueska
Contributor

klueska commented Nov 22, 2019

@lmdaly does a separate PR need to be opened to make sure these changes make it into master as well?

@sftim
Contributor

sftim commented Nov 22, 2019

These changes will land in master once v1.17 is released, a few weeks from now. As part of the release, SIG Docs's nominated lead merges the website dev-1.17 branch into master.

@klueska
Contributor

klueska commented Nov 22, 2019

Perfect. Thanks.

mrbobbytables pushed a commit that referenced this pull request Dec 6, 2019
* Added information on how device plugins can take advantage
of Topology Manager
* Updated the Topology Manager documentation to include additional information and updated some out-of-date sections
k8s-ci-robot pushed a commit that referenced this pull request Dec 10, 2019
* feat: graduate TaintNodesByCondition to GA (#17073)

* Promote StartupProbe to beta (enabled by default). (#17164)

* Watch bookmarks to GA (#17026)

* feat: graduate ScheduleDaemonSetPods to GA (#17350)

* Update Docker installation instructions (#17405)

* Use exact version numbers for installing Docker in Ubuntu (#17428)

* Move CSIMigration and CSIMigrationGCE to Beta in Kubernetes v1.17 (#17478)

* Promote NodeLease feature to GA (#17189)

* Update docs for csi topology ga (#17408)

* Update RunAsUsername to beta (#17460)

* doc:Update RunAsUsername to beta

* doc: update samples - kubernetes.io/os is no longer beta

* Updating based on review feedback

* Promote Node-specific volume limits to GA (#17432)

* Promote PodShareProcessNamespace to stable (#17192)

* Promote PodShareProcessNamespace to stable

* Add for_k8s_version to feature-state label

Co-Authored-By: Tim Bannister <tim@scalefactory.com>

* Re-add version-check to shareProcessNamespace task

* Update service load balancer finalizer doc for GA (#17438)

* Update Topology Manager docs (#17451)

* Added information on how device plugins can take advantage
of Topology Manager
* Updated the Topology Manager documentation to include additional information and updated some out-of-date sections

* Fix broken Topology Manager link (#17746)

Part of What's Next Device Plugin section

* Update CRD defaulting docs for GA (#17450)

* Add documentation for VolumeSnapshot Beta (#17233)

* Updating EndpointSlice documentation for beta release in 1.17 (#17411)

* (docs/dualstack): v1.17 updates (#17457)

* Add placeholder doc updates for dualstack in 1.17

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* Add Downward API and /etc/hosts Pod IP validation

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* remove addressed known issue via k/k pr 85246

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* Remove known issue and add flag as part of k/k 79993

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* remove follow up placeholders

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* Update verbiage

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* Make IP addressing consistent throughout the task

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* Update to status.podIPs

Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>

* Update content/en/docs/tasks/network/validate-dual-stack.md

Use set instead of env

Co-Authored-By: Khaled Henidak (Kal) <khnidk@outlook.com>

* add topology.kubernetes.io/zone, topology.kubernetes.io/region and node.kubernetes.io/instance-type labels to docs (#17498)

Signed-off-by: Andrew Sy Kim <kiman@vmware.com>

* Service topology alpha documentation (#17459)

* Update list of feature flags for in-tree plugins migrated to CSI (#17533)

Signed-off-by: Deep Debroy <ddebroy@docker.com>

* Update Node concept for TaintNodesByCondition going GA (#17577)

* feat: graduate ResourceQuotaScopeSelectors to GA in 1.17 (#17554)

* kubeadm: update the upgrade documentation for 1.17 (#17587)

* doc: Simplify Windows deployments with RuntimeClass (#16697)

* doc: Simplify Windows deployments with RuntimeClass

* Updating on review feedback

* doc: Adding windows-build label from enhancement 1301

* update doc for kubelet option --reserved-cpus (#17648)

* feat: update TaintNodesByCondition in feature gates table (#17377)

* Update docs for v1 resource quota configuration (#17547)

* AdmissionConfiguration v1 (#17548)

* Update WebhookAdmissionConfiguration examples (#17549)

* Update AWS EBS Migration Feature state (#16126)

* Add resource version section to api-concepts documentation (#16910)

* Add Resource Version semantics section to api concepts

* Clarify risks of going back in time, add details about compaction and watch cache sizes

* Apply suggestions from liggitt

Co-Authored-By: Jordan Liggitt <jordan@liggitt.net>

* remove pseudocode, apply feedback

* Fix typo

* Clarify equality rules

* Cleanup kubectl generators docs (#17609)

* Write ReplicationController without a space

* Drop mentioning unsupported cluster versions

* Fix capitalization for “API group”

* Tweak wording

* Avoid using deprecated generator in example

* add Antrea description in dev-1.17 (#17919)

* Promote VolumeSubpathEnvExpansion to GA

* Reference Documentation for the Kubernetes API for 1.17 (#18019)

* Update feature-gates.md (#18033)

* Reference Documentation for kubectl Commands for 1.17 (#18017)

* Update for v1.17 (#18034)

* Update config.toml(release-1.17) for 1.17 (#18031)
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

10 participants