-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Ansor] Improve OpenCL support #10108
Conversation
code = open("perf/%s_manual.cu" % TASK).read() | ||
return code | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is stale and blocks this script from running. We don't need this anymore, so better just to remove it.
@@ -122,7 +123,8 @@ void OpenCLWorkspace::GetAttr(Device dev, DeviceAttrKind kind, TVMRetValue* rv) | |||
corresponding to the number of SIMD entries the heardware configures. | |||
We need to figure out a way to query this information from the hardware. | |||
*/ | |||
*rv = 1; | |||
const int warp_size = dmlc::GetEnv("TVM_OPENCL_WARP_SIZE", 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I don't really like environment variable since it creates side effects, I don't have a better solution just as mentioned by the above TODO. Maybe that's it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our vulkan backend has a better solution, that uses a Vulkan API function to query the warp size on a given HW. However, I didn't find such API in OpenCL, for some reason. clGetKernelSubGroupInfoKHR
described in https://github.com/KhronosGroup/OpenCL-Docs/blob/master/ext/cl_khr_subgroups.asciidoc looks closest, but it requires a compiled kernel as an argument, which I find strange since we want warp size information to write or generate a kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. It's reasonable that the desired API is not always available. A better solution I'm thinking is the direction of exposing this option to the hardware parameters in tuning options instead of an environment variable. For example, the default wrap size of OpenCL devices is always 1, or user should provide wrap size in hardware parameters otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That reminds me of the fact that it is already possible to set HW params from a python script https://github.com/apache/tvm/blob/main/gallery/how_to/tune_with_autoscheduler/tune_network_mali.py#L188-L194
So in practice, this patch might not be necessary. But since the possibility to manually specify HW params is not known well and cumbersome anyway, I still want to land this PR. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I agree to have this change but with a different reason. If we just look at the device API without Ansor, this change could be the only general workaround for OpenCL devices. Specifically, any place in TVM may query device API to get the wrap size, so setting default wrap size to 1 in Ansor might not be a general solution.
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
<< "Warp size 1 is not recommended for OpenCL devices. Tuning might crash or stuck"; | ||
} | ||
|
||
int max_vthread_extent = warp_size / 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I just come back after vocation. I want to check here max_vthread_extent. As I wrote in the tutorial https://github.com/apache/tvm/blob/main/gallery/how_to/tune_with_autoscheduler/tune_network_mali.py#L188-L194 : max_vthread_extent = int(dev.warp_size / 4) if int(dev.warp_size / 4) > 1 else dev.warp_size
. If warp_size is 1, currently code of max_vthread_extent
will be 0. Previous experiment shows we will be stacked or crashed. @masahi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do it like Vulkan int max_vthread_extent = std::max(1, warp_size / 4);
https://github.com/apache/tvm/blob/main/src/auto_scheduler/search_task.cc#L153 @masahi
* Support OpenCL in Autoscheduler tuning * add warning * Update src/auto_scheduler/search_task.cc Co-authored-by: Cody Yu <comaniac0422@gmail.com> * fix lint Co-authored-by: Cody Yu <comaniac0422@gmail.com>
Currently only Mali is enabled for opencl target, I added this change to enable Intel GPUs.
Determining the correct warp size is tricky on OpenCL and on Intel in particular, so I added an env var
TVM_OPENCL_WARP_SIZE
so that a user can enforce a certain value. I found that on Intel GPU, usingwarp_size == 1
leads to tuning being stuck.@comaniac @FrozenGene @jcf94