
topology-aware: internal error from changing containers' NUMA nodes by adjusting AvailableResources #92

Open
askervin opened this issue Jul 10, 2023 · 0 comments

@askervin (Collaborator)

Assume that a container runs on CPUs of NUMA node 0.

An admin wants to reorganize server resources so that containers no longer use CPUs on NUMA node/die/socket 0, by removing those CPUs from AvailableResources.
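For reference, such a change might look roughly like the fragment below in the plugin's configuration. The key names follow the topology-aware policy's AvailableResources/ReservedResources settings, but the exact structure and the cpuset values (CPUs 0-3 on node 0, 4-7 elsewhere) are illustrative assumptions, not taken from an actual cluster:

  policy:
    AvailableResources:
      # CPUs 0-3 (NUMA node 0) are no longer offered to the policy;
      # only CPUs 4-7 remain available for container allocation.
      CPU: cpuset:4-7
    ReservedResources:
      CPU: 750m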

When this is done, restarting the topology-aware NRI plugin with the new configuration fails with an internal error:

E0710 07:30:57.289447       1 nri.go:784] <= Synchronize FAILED: failed to start policy topology-aware: topology-aware: failed to start:
topology-aware: failed to restore allocations from cache:
topology-aware: failed to allocate <CPU request pod0/pod0c0: exclusive: 3><Memory request: limit:95.37M, req:95.37M> from <NUMA node #1 allocatable: MemLimit: DRAM 1.85G>:
topology-aware: internal error: NUMA node #1: can't slice 3 exclusive CPUs from , 0m available

Let's discuss whether this is a bug, expected behavior, or whether we should provide a configuration option that forces new CPU/memory pinning, even if it would lead to costly memory accesses/moves.
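If such an option were added, it could be a simple boolean next to AvailableResources. The option name below is purely hypothetical and only meant to illustrate the idea:

  policy:
    AvailableResources:
      CPU: cpuset:4-7
    # Hypothetical option: if allocations restored from the cache no longer fit
    # the new AvailableResources, re-pin the affected containers instead of
    # failing, even at the cost of memory moves.
    ForceReallocation: true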

The current workaround for this error is to delete the cache, thereby forcing reassignment of resources from scratch. Both this workaround and draining the node before changing AvailableResources are heavier operations than forcing new pinning would be.
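For completeness, the workaround on an affected node amounts to something like the commands below. The cache location (/var/lib/nri-resource-policy/cache) and the pod label selector are assumptions based on a default deployment and should be checked against the actual installation:

  # Remove the policy's cached allocations so resources are reassigned from scratch.
  rm /var/lib/nri-resource-policy/cache
  # Restart the plugin pod so it resynchronizes with the runtime using the new configuration.
  kubectl -n kube-system delete pod -l app=nri-resource-policy-topology-aware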

askervin added a commit to askervin/nri-plugins that referenced this issue Jul 10, 2023
Test that a running container gets reassigned into new CPUs when the
CPUs where it used to run are not included in AvailableResources
anymore.

Tests issue containers#92.