[UR][L0] Unify use of large allocation in L0 adapter #1099

jandres742 · 2023-11-21T05:41:30Z

Intel(R) GPUs have two modes of operation in terms of allocations:
stateful and stateless.

Stateful mode optimizes memory accesses through pointer arithmetic.
This can be done as long as the allocations used by the kernel
are smaller than 4GB.

Stateless mode disables this pointer-arithmetic optimization to
allow the kernel to use allocations larger than 4GB.

Currently, the L0 adapter dynamically and automatically requests
large allocations from the L0 driver if it detects that an allocation
is larger than 4GB. This creates a problem if a kernel has
previously been compiled for stateful access: the adapter ends up
mixing stateful and stateless behavior, which is not
a user-friendly experience.
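
The mismatch described above can be sketched as a simple compatibility check. This is only an illustration: `AccessMode`, `kStatefulLimitBytes`, and `canUseAllocation` are hypothetical names that do not exist in the adapter, and treating "4GB" as 2^32 bytes is an assumption.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not part of the L0 adapter.
enum class AccessMode { Stateful, Stateless };

// Assumption: "4GB" is interpreted as 2^32 bytes.
constexpr std::uint64_t kStatefulLimitBytes = std::uint64_t{1} << 32;

// A kernel compiled for stateful access can only safely use allocations
// below the limit; a stateless kernel can use allocations of either size.
constexpr bool canUseAllocation(AccessMode KernelMode,
                                std::uint64_t SizeBytes) {
  return KernelMode == AccessMode::Stateless ||
         SizeBytes < kStatefulLimitBytes;
}
```

A kernel already compiled as `AccessMode::Stateful` fails this check as soon as the adapter hands it an allocation of 2^32 bytes or more, which is the mixed behavior this patch avoids.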

This patch corrects this by defining a single default behavior.
On Intel(R) GPUs prior to Intel(R) Data Center GPU Max, the
default behavior is now stateless, meaning allocations larger than
4GB are allowed by default. Users can opt in to stateful mode by
setting the new environment variable UR_L0_USE_OPTIMIZED_32BIT_ACCESS=1.
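
As a rough sketch of the opt-in described above, the variable could be read once and cached. The helper names `parseOptimized32bitAccess` and `optimized32bitAccessRequested` are hypothetical, and treating any value other than "1" as "not requested" is an assumption of this sketch, not necessarily what the adapter does.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: interprets the raw value of
// UR_L0_USE_OPTIMIZED_32BIT_ACCESS. Assumption: unset, or any value
// other than "1", keeps the stateless default.
inline bool parseOptimized32bitAccess(const char *Value) {
  return Value != nullptr && std::strcmp(Value, "1") == 0;
}

// Hypothetical helper: reads the environment once and caches the result,
// so the mode cannot change while the process is running.
inline bool optimized32bitAccessRequested() {
  static const bool Requested = parseOptimized32bitAccess(
      std::getenv("UR_L0_USE_OPTIMIZED_32BIT_ACCESS"));
  return Requested;
}
```

Separating the parsing from the `std::getenv` call keeps the decision testable without modifying the process environment.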

Addresses:
https://stackoverflow.com/questions/75621264/sycl-dot-product-code-gives-wrong-results

intel/llvm testing: intel/llvm#11958

jandres742 · 2023-11-22T17:31:11Z

@smaslov-intel : please review

MichalMrozek · 2023-11-23T10:32:58Z

source/adapters/level_zero/device.cpp

+  static const bool UseLargeAllocations = [this] {
+    const char *UrRet = std::getenv("UR_L0_ALLOW_LARGE_ALLOCATIONS");
+    if (!UrRet)
+      return (this->isPVC() ? true : false);


Why is this isPVC check here?
It is not required on PVC.
The PVC Level Zero driver does it by default.

jandres742 · 2023-11-26

@MichalMrozek: this is so the rest of the adapter has "large allocation behavior".

MichalMrozek · 2023-11-27

But PVC already has large allocation behavior by default.
There is no need to do anything special for this device.
There is no need to add a compiler option, and it is quite dangerous to bypass the max memory allocation size limits by default.
Using those variables suggests that PVC requires special handling for large allocations, which in fact is not needed.

jandres742 · 2023-11-27

Thanks @MichalMrozek. I have changed the code to use the defaults on PVC.

MichalMrozek · 2023-11-23T10:34:44Z

source/adapters/level_zero/device.hpp

+  // On some Intel GPUs, this influences how kernels are compiled.
+  // If large allocations (>4GB) are requested, then kernels are
+  // compiled with stateless access.
+  // If small allocations (<4GB) are requested, then kernels are


This is not accurate: even if -ze-opt-greater-than-4GB-buffer-required is not specified, kernels may still be compiled in stateless mode.

jandres742 · 2023-11-26

Thanks, I will remove the comment.

MichalMrozek · 2023-11-23T10:37:32Z

source/adapters/level_zero/device.hpp

+  // If small allocations (<4GB) are requested, then kernels are
+  // compiled with stateful access, with potential performance
+  // improvements.
+  // Some GPUs support only one mode, such us Intel(R) Data Center GPU Max,


Intel(R) Data Center GPU Max supports both stateful and stateless modes.
The Level Zero implementation for Intel(R) Data Center GPU Max allows only stateless mode for this device.

jandres742 · 2023-11-26

I will fix the comment to indicate that it is the driver that supports only that mode.

jandres742 · 2023-11-26T02:31:54Z

@nrspruit: please review.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>

nrspruit · 2023-11-29T17:55:15Z

source/adapters/level_zero/program.cpp

+    // ze-opt-greater-than-4GB-buffer-required to disable
+    // stateful optimizations and be able to use larger than
+    // 4GB allocations on these kernels.
+    if (Context->Devices[0]->useOptimized32bitAccess() == 0) {


Should this be in the Exp function, with the Exp function now containing the Compile code? That way you check the specific device being passed in and not just Context->Devices[0], since you might be on a non-uniform system.

jandres742 · 2023-11-29

Thanks @nrspruit. Good idea. However, urProgramCompileExp is currently unimplemented, so adding an implementation for it on top of these changes would make this PR too big. I think it is better to merge this patch first and then add support for urProgramCompileExp, including the functionality from this patch. What do you think?

nrspruit · 2023-11-29

Sure, that would be fine. A follow-up patch would be a good improvement on this.
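
For the follow-up discussed in this thread, a per-device check might look roughly like the sketch below. `Device` and `buildOptionsForDevice` are hypothetical stand-ins, not the adapter's types or functions; only the flag name is taken from the diff under review.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the adapter's device type.
struct Device {
  bool Optimized32bitAccess; // true: stateful opt-in, false: stateless
};

// Hypothetical helper: derives build options from the specific device
// being compiled for, rather than from the first device in the context.
inline std::string buildOptionsForDevice(const Device &Dev,
                                         std::string Options) {
  if (!Dev.Optimized32bitAccess) {
    // Disable stateful optimizations so kernels can use >4GB allocations.
    if (!Options.empty())
      Options += ' ';
    Options += "-ze-opt-greater-than-4GB-buffer-required";
  }
  return Options;
}
```

Calling this once per target device would let a non-uniform system end up with different options for each device.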

nrspruit

+1 from me

jandres742 · 2023-12-01T21:00:07Z

@kbenzie : please merge when possible. This needs to be merged on top of #916

fabiomestre · 2023-12-05T16:48:21Z

I have updated the target branch of this PR from the adapters branch to the main branch.
Development in UR is moving back to main. The adapters branch will soon be deleted.

kbenzie · 2023-12-06T17:14:00Z

I'm going to create a combined intel/llvm PR which will include this and a few other PRs so we can get things merged quicker.

[UR][L0] Unify use of large allocation in L0 adapter

Combines the changes of the following Unified Runtime pull requests:
* oneapi-src/unified-runtime#1108
* oneapi-src/unified-runtime#988
* oneapi-src/unified-runtime#1071
* oneapi-src/unified-runtime#916
* oneapi-src/unified-runtime#1099

jandres742 force-pushed the largeallocations branch 3 times, most recently from 31a5796 to 3482cdf Compare November 22, 2023 16:34

jandres742 marked this pull request as ready for review November 22, 2023 17:31

jandres742 requested a review from a team as a code owner November 22, 2023 17:31

MichalMrozek reviewed Nov 23, 2023

jandres742 force-pushed the largeallocations branch from 843dd22 to c36c2bf Compare November 27, 2023 23:19

jandres742 mentioned this pull request Nov 28, 2023

[UR][L0] Make urPlatformGetBackendOption return -ze-opt-level=2 for -O1 and -O2 #1129

Merged

[UR][L0] Check Global Mem Size as Limit for Free Memory

40c8da9

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>

kbenzie added the v0.8.x Include in the v0.8.x release label Nov 28, 2023

jandres742 force-pushed the largeallocations branch from c36c2bf to 28590a8 Compare November 28, 2023 18:28

nrspruit reviewed Nov 29, 2023

nrspruit approved these changes Nov 29, 2023

kbenzie added the ready to merge Added to PR's which are ready to merge label Dec 4, 2023

fabiomestre changed the base branch from adapters to main December 5, 2023 16:48

kbenzie mentioned this pull request Dec 6, 2023

Candidate for v0.8.2 release tag #1163

Merged

13 tasks

kbenzie merged commit ce4acbc into oneapi-src:main Dec 6, 2023
52 checks passed

kbenzie added a commit to kbenzie/unified-runtime that referenced this pull request Dec 6, 2023

Merge pull request oneapi-src#1099 from jandres742/largeallocations

52ea473

[UR][L0] Unify use of large allocation in L0 adapter

kbenzie mentioned this pull request Dec 6, 2023

[UR] Bump tag to ce4acbc4 intel/llvm#12101

Merged

kbenzie added a commit to kbenzie/unified-runtime that referenced this pull request Dec 15, 2023

Merge pull request oneapi-src#1099 from jandres742/largeallocations

0b95702

[UR][L0] Unify use of large allocation in L0 adapter
