support HMM profiling event #96

Open: wants to merge 7 commits into base: master

PhilipYangA

Changes:

rocm smi lib: define HMM profiling events
rocm_smi.py: showevents supports HMM profiling events
unit test: TestEvtNotifReadWrite supports HMM profiling events

bill-shuzhou-liu and others added 7 commits January 26, 2022 09:36
Install LICENSE.txt to share/doc/smi-lib

Change-Id: Idcbb70db8808111203e8e4a4c3ab4d1e070ac79d
Add rpm License header for cpack

Change-Id: I2f4a89015b6389cfde801f41d4f6e0f59e7087aa
pop_back() caused a segmentation fault when the pp_dpm_pcie file is empty and returns only whitespace.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677
(cherry picked from commit ec71380)
The SMI lib function rsmi_event_notification_get reads events from all GPUs,
and each event is returned with its device index dv_idx. Currently we create a
read thread for each GPU. This is unnecessary because every thread reads the
same events, and each thread displays events from other GPUs with an incorrect
GPU index.

Create one read thread for all GPUs, and display each event with the correct
GPU index taken from data.dv_idx.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
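
The single-reader idea in the commit message above can be sketched as follows. This is a minimal illustration, not the code from this pull request: the `Event` tuple, the in-memory queue standing in for the notification source, and the `GPU[n]:` output format are all assumptions made for the example. Only the `dv_idx` field name comes from the commit message.

```python
import queue
import threading
from collections import namedtuple

# Hypothetical event record; only the dv_idx field name is taken from the commit message.
Event = namedtuple("Event", ["dv_idx", "name"])

# Sentinel that tells the reader thread to stop.
_STOP = object()


def reader(source, lines):
    """One reader for all GPUs: label each event with its own dv_idx."""
    while True:
        evt = source.get()
        if evt is _STOP:
            return
        # The GPU index comes from the event itself, not from the thread.
        lines.append(f"GPU[{evt.dv_idx}]: {evt.name}")


def demo():
    """Feed events from two GPUs through a single reader thread."""
    source = queue.Queue()
    lines = []
    thread = threading.Thread(target=reader, args=(source, lines))
    thread.start()
    for evt in (Event(0, "VM_FAULT"), Event(1, "MIGRATE_START"), Event(0, "MIGRATE_END")):
        source.put(evt)
    source.put(_STOP)
    thread.join()
    return lines
```

Because the shared source already carries events for every device, one reader is enough, and no event can be printed under the index of the thread that happened to read it.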
Update kfd_ioctl.h from KFD to add defines for HMM migration, recoverable
page fault, queue eviction and restore events, and event triggers.

Update rocm_smi.h to add the new SMI notification event and trigger
defines, using the same enum values as kfd_ioctl.h, to avoid value
translation in the SMI lib.

Change the fscanf format from %63s to %MAX_EVENT_NOTIFICATION_MSG_SIZE[^\n]
to read the entire line as one message.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
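
The format change matters because a `%s` conversion stops at the first whitespace character, so a message containing spaces would be cut at its first word, whereas `[^\n]` reads up to the end of the line. The Python sketch below contrasts the two behaviours. The line layout (a hexadecimal event id, a space, then the message) and both function names are assumptions made for illustration, not part of this pull request.

```python
def parse_event_line(line):
    """Split a line into (event id, full message), keeping spaces in the message.

    Assumed layout: "<hex event id> <message text>\n".
    """
    event_hex, _, message = line.rstrip("\n").partition(" ")
    return int(event_hex, 16), message


def parse_first_token_only(line):
    """Mimic the old behaviour: the message is cut at the first whitespace."""
    fields = line.split()
    return int(fields[0], 16), fields[1]
```

With an input such as `"5 a b c\n"`, `parse_event_line` keeps the message `"a b c"` while `parse_first_token_only` keeps only `"a"`.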
Use SMI_EVENT_ALL_PROCESS to receive events from all processes. HMM
migration events are per-process events, and KFD requires this flag plus
superuser permission to receive events from other processes, so
showevents calls relaunchAsSudo if any argument is in the new event
list.

The user can specify an event name in short form; for example,
"--showevents migrate" shows the MIGRATE_START and MIGRATE_END events.

Define the message size using the macro from rocm_smi.h.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
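
A rough sketch of the short-name matching and the sudo decision described above is shown below. Whether rocm_smi.py matches by prefix, and the exact contents of the event lists, are assumptions here; the lists are illustrative, and `expand_event_names` and `needs_sudo` are hypothetical helper names, not functions from the tool.

```python
# Illustrative event lists; the real lists live in the SMI Python bindings.
OLD_EVENTS = ["VM_FAULT", "THERMAL_THROTTLE", "GPU_PRE_RESET", "GPU_POST_RESET"]
NEW_EVENTS = ["MIGRATE_START", "MIGRATE_END", "PAGE_FAULT_START",
              "PAGE_FAULT_END", "QUEUE_EVICTION", "QUEUE_RESTORE"]
ALL_EVENTS = OLD_EVENTS + NEW_EVENTS


def expand_event_names(args):
    """Expand short names: each argument selects every event it is a prefix of."""
    selected = []
    for arg in args:
        prefix = arg.upper()
        for name in ALL_EVENTS:
            if name.startswith(prefix) and name not in selected:
                selected.append(name)
    return selected


def needs_sudo(selected):
    """Per-process events from other processes require superuser permission."""
    return any(name in NEW_EVENTS for name in selected)
```

Under these assumptions, `expand_event_names(["migrate"])` selects MIGRATE_START and MIGRATE_END, and `needs_sudo` on that selection is true, which is the condition under which the tool would relaunch itself with elevated privileges.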
Add defines for the new event names, and set the event mask
RSMI_EVT_NOTIF_ALL_PROCESS to receive events from all processes.

Add a protection check in case KFD returns a new event type, to avoid
an out-of-range access and segmentation fault.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
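
The protection check can be illustrated with a bounds-checked name lookup. This is a sketch under assumptions: the table contents, the 1-based event numbering, and the `UNKNOWN_EVENT_n` fallback label are hypothetical and chosen only to show the pattern.

```python
# Illustrative name table; event types are assumed to be numbered from 1.
KNOWN_EVENT_NAMES = ["VM_FAULT", "THERMAL_THROTTLE", "GPU_PRE_RESET", "GPU_POST_RESET"]


def event_name(event_type):
    """Return the event name, or a placeholder for types this table does not know.

    The explicit range check matters in Python too: an index of -1 would not
    raise, it would silently return the last entry.
    """
    if 1 <= event_type <= len(KNOWN_EVENT_NAMES):
        return KNOWN_EVENT_NAMES[event_type - 1]
    return f"UNKNOWN_EVENT_{event_type}"
```

A driver that is newer than the library can then report event types the library has never seen without causing an out-of-range access.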
Contributor

@bill-shuzhou-liu bill-shuzhou-liu left a comment

Thanks. I have a few comments on this change.

src/rocm_smi.cc
include/rocm_smi/rocm_smi.h
python_smi_tools/rsmiBindings.py
python_smi_tools/rocm_smi.py
3 participants