Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ansor] Improve OpenCL support #10108

Merged
merged 4 commits into from
Feb 1, 2022
Merged

[Ansor] Improve OpenCL support #10108

merged 4 commits into from
Feb 1, 2022

Conversation

masahi
Copy link
Member

@masahi masahi commented Jan 30, 2022

Currently only Mali is enabled for opencl target, I added this change to enable Intel GPUs.

Determining the correct warp size is tricky on OpenCL and on Intel in particular, so I added an env var TVM_OPENCL_WARP_SIZE so that a user can enforce a certain value. I found that on Intel GPU, using warp_size == 1 leads to tuning being stuck.

@comaniac @FrozenGene @jcf94

code = open("perf/%s_manual.cu" % TASK).read()
return code


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is stale and blocks this script from running. We don't need this anymore, so better just to remove it.

src/auto_scheduler/search_task.cc Outdated Show resolved Hide resolved
@@ -122,7 +123,8 @@ void OpenCLWorkspace::GetAttr(Device dev, DeviceAttrKind kind, TVMRetValue* rv)
corresponding to the number of SIMD entries the heardware configures.
We need to figure out a way to query this information from the hardware.
*/
*rv = 1;
const int warp_size = dmlc::GetEnv("TVM_OPENCL_WARP_SIZE", 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I don't really like environment variable since it creates side effects, I don't have a better solution just as mentioned by the above TODO. Maybe that's it for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our vulkan backend has a better solution, that uses a Vulkan API function to query the warp size on a given HW. However, I didn't find such API in OpenCL, for some reason. clGetKernelSubGroupInfoKHR described in https://github.com/KhronosGroup/OpenCL-Docs/blob/master/ext/cl_khr_subgroups.asciidoc looks closest, but it requires a compiled kernel as an argument, which I find strange since we want warp size information to write or generate a kernel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. It's reasonable that the desired API is not always available. A better solution I'm thinking is the direction of exposing this option to the hardware parameters in tuning options instead of an environment variable. For example, the default wrap size of OpenCL devices is always 1, or user should provide wrap size in hardware parameters otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reminds me of the fact that it is already possible to set HW params from a python script https://github.com/apache/tvm/blob/main/gallery/how_to/tune_with_autoscheduler/tune_network_mali.py#L188-L194

So in practice, this patch might not be necessary. But since the possibility to manually specify HW params is not known well and cumbersome anyway, I still want to land this PR. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I agree to have this change but with a different reason. If we just look at the device API without Ansor, this change could be the only general workaround for OpenCL devices. Specifically, any place in TVM may query device API to get the wrap size, so setting default wrap size to 1 in Ansor might not be a general solution.

masahi and others added 2 commits February 1, 2022 04:36
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
@masahi masahi merged commit f9ddcdb into apache:main Feb 1, 2022
<< "Warp size 1 is not recommended for OpenCL devices. Tuning might crash or stuck";
}

int max_vthread_extent = warp_size / 4;
Copy link
Member

@FrozenGene FrozenGene Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I just come back after vocation. I want to check here max_vthread_extent. As I wrote in the tutorial https://github.com/apache/tvm/blob/main/gallery/how_to/tune_with_autoscheduler/tune_network_mali.py#L188-L194 : max_vthread_extent = int(dev.warp_size / 4) if int(dev.warp_size / 4) > 1 else dev.warp_size. If warp_size is 1, currently code of max_vthread_extent will be 0. Previous experiment shows we will be stacked or crashed. @masahi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do it like Vulkan int max_vthread_extent = std::max(1, warp_size / 4); https://github.com/apache/tvm/blob/main/src/auto_scheduler/search_task.cc#L153 @masahi

@masahi masahi mentioned this pull request Feb 9, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022
* Support OpenCL in Autoscheduler tuning

* add warning

* Update src/auto_scheduler/search_task.cc

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* fix lint

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants