Set affinity by device UUID. #5566
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
CI MESSAGE: [16649134]: BUILD STARTED
CI MESSAGE: [16649134]: BUILD FAILED
CI MESSAGE: [16669228]: BUILD STARTED
  return dev;
}

void GetNVMLAffinityMask(cpu_set_t *mask, size_t num_cpus) {
This function is moved here from the header.
  size_t cpu_set_size = (num_cpus + 63) / 64;
  std::vector<unsigned long> nvml_mask_container(cpu_set_size);  // NOLINT(runtime/int)
  auto * nvml_mask = nvml_mask_container.data();
  nvmlDevice_t device = nvmlGetDeviceHandleForCUDA(device_idx);
This is the important line.
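For readers following the mask handling earlier in this hunk, here is a minimal sketch of the sizing arithmetic, assuming the mask is stored as consecutive 64-bit words with one bit per CPU. `MaskWords` and `CpuInMask` are hypothetical helper names used only for illustration; they are not part of this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 64-bit words needed to hold one bit per CPU (rounds up),
// mirroring the (num_cpus + 63) / 64 expression in the hunk above.
inline std::size_t MaskWords(std::size_t num_cpus) {
  return (num_cpus + 63) / 64;
}

// Returns true if bit `cpu` is set in a mask stored as consecutive
// 64-bit words. CPUs beyond the end of the mask are reported as not set.
inline bool CpuInMask(const std::vector<std::uint64_t> &mask, std::size_t cpu) {
  const std::size_t word = cpu / 64;
  return word < mask.size() && ((mask[word] >> (cpu % 64)) & 1u) != 0;
}
```

The sketch uses `std::uint64_t` rather than `unsigned long` so the word width does not depend on the platform; the code under review keeps `unsigned long` for its mask container.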
  CPU_AND(mask, &nvml_set, &current_set);
}

void SetCPUAffinity(int core) {
Moved here from the header without any changes.
CI MESSAGE: [16669418]: BUILD STARTED
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [16671324]: BUILD STARTED
CI MESSAGE: [16671324]: BUILD PASSED
Category:
Bug fix (non-breaking change which fixes an issue)
Description:
NVML and the CUDA runtime use different device indices, so the same index may refer to different GPUs in the two APIs. The device UUID is a reliable way of establishing device identity.
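A rough sketch of the idea, not the code in this PR: the CUDA runtime exposes a 16-byte device UUID (`cudaDeviceProp::uuid`), and NVML can look up a device handle from a UUID string (`nvmlDeviceGetHandleByUUID`). The helper below, with the hypothetical name `FormatGpuUuid`, only shows the byte-to-string step and assumes the usual `GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` textual form. The actual `nvmlGetDeviceHandleForCUDA` implementation is not shown in this conversation and may differ.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Formats 16 raw UUID bytes as "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx".
// Assumption: this is the textual UUID form accepted for NVML lookups.
inline std::string FormatGpuUuid(const std::array<unsigned char, 16> &bytes) {
  static const char kHex[] = "0123456789abcdef";
  std::string out = "GPU-";
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    // Dashes separate the 8-4-4-4-12 hex digit groups.
    if (i == 4 || i == 6 || i == 8 || i == 10) out += '-';
    out += kHex[bytes[i] >> 4];
    out += kHex[bytes[i] & 0x0F];
  }
  return out;
}
```

In a real build, the bytes would come from the CUDA device properties for `device_idx`, and the resulting string would be passed to the NVML lookup, so the match does not depend on either API's enumeration order.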
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A