[Ansor] Improve OpenCL support #10108

masahi · 2022-01-30T20:33:09Z

Currently, only Mali is enabled for the OpenCL target. This change enables Intel GPUs as well.

Determining the correct warp size is tricky on OpenCL, and on Intel in particular, so I added an environment variable, TVM_OPENCL_WARP_SIZE, that lets the user force a specific value. I found that on Intel GPUs, using warp_size == 1 causes tuning to get stuck.
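The override reduces to "use the environment variable if it is set, otherwise fall back to 1", which is what the C++ line dmlc::GetEnv("TVM_OPENCL_WARP_SIZE", 1) in the diff does. The Python below is only an illustration of that lookup; the helper name opencl_warp_size is hypothetical, and treating an empty value as unset and rejecting values below 1 are assumptions of this sketch, not documented behavior of dmlc::GetEnv.

```python
import os


def opencl_warp_size(env=None, default=1):
    """Hypothetical helper: mirror of dmlc::GetEnv("TVM_OPENCL_WARP_SIZE", 1)."""
    env = os.environ if env is None else env
    value = env.get("TVM_OPENCL_WARP_SIZE")
    # Assumption of this sketch: unset or empty means "use the default".
    if value is None or value.strip() == "":
        return default
    warp_size = int(value)  # int() tolerates surrounding whitespace
    # Assumption of this sketch: a warp size below 1 is meaningless, so reject it.
    if warp_size < 1:
        raise ValueError(f"TVM_OPENCL_WARP_SIZE must be >= 1, got {warp_size}")
    return warp_size
```

In practice the variable is set in the environment of the process that runs tuning, e.g. TVM_OPENCL_WARP_SIZE=16 followed by the tuning command (16 is an illustrative value, not a recommendation for any particular GPU).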

@comaniac @FrozenGene @jcf94

masahi · 2022-01-30T20:38:24Z

apps/topi_recipe/gemm/cuda_gemm_square.py

-        code = open("perf/%s_manual.cu" % TASK).read()
-    return code
-
-


This code is stale and prevents the script from running. We no longer need it, so it is better to remove it.

src/auto_scheduler/search_task.cc

comaniac · 2022-01-31T17:36:31Z

src/runtime/opencl/opencl_device_api.cc

@@ -122,7 +123,8 @@ void OpenCLWorkspace::GetAttr(Device dev, DeviceAttrKind kind, TVMRetValue* rv)
               corresponding to the number of SIMD entries the hardware configures.
               We need to figure out a way to query this information from the hardware.
      */
-      *rv = 1;
+      const int warp_size = dmlc::GetEnv("TVM_OPENCL_WARP_SIZE", 1);


Although I don't really like environment variables, since they create side effects, I don't have a better solution either, as the TODO above notes. Maybe this is fine for now.

Our Vulkan backend has a better solution: it uses a Vulkan API function to query the warp size on the given hardware. However, I couldn't find an equivalent API in OpenCL. clGetKernelSubGroupInfoKHR, described in https://github.com/KhronosGroup/OpenCL-Docs/blob/master/ext/cl_khr_subgroups.asciidoc, looks closest, but it requires a compiled kernel as an argument. I find that strange, since we need the warp size information to write or generate the kernel in the first place.

Understood. It's reasonable that the desired API is not always available. A better solution, I think, would be to expose this option through the hardware parameters in the tuning options instead of through an environment variable. For example, the default warp size of OpenCL devices would always be 1, and users who need a different value would provide it in the hardware parameters.
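The proposal amounts to a simple precedence rule: a warp size supplied in the hardware parameters wins, otherwise the OpenCL default of 1 is used. The sketch below illustrates that rule only; TuningHardwareParams and resolve_warp_size are hypothetical stand-ins, not tvm.auto_scheduler.HardwareParams or any TVM API. The warning mirrors the one the merged patch logs in search_task.cc when the warp size is 1.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class TuningHardwareParams:
    """Hypothetical stand-in for user-supplied hardware parameters."""
    warp_size: Optional[int] = None


def resolve_warp_size(params: Optional[TuningHardwareParams], device_default: int = 1) -> int:
    """Hypothetical rule: a user-provided warp size overrides the device default."""
    if params is not None and params.warp_size is not None:
        warp_size = params.warp_size
    else:
        warp_size = device_default
    if warp_size == 1:
        # Same spirit as the warning added in search_task.cc.
        warnings.warn("Warp size 1 is not recommended for OpenCL devices; tuning might crash or get stuck")
    return warp_size
```

Under this rule, a user who never touches the hardware parameters keeps the default of 1 (and sees the warning), while a user who sets warp_size explicitly never depends on the environment.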

That reminds me that it is already possible to set hardware params from a Python script: https://github.com/apache/tvm/blob/main/gallery/how_to/tune_with_autoscheduler/tune_network_mali.py#L188-L194

So in practice, this patch might not be necessary. But since the ability to specify hardware params manually is not well known, and is cumbersome anyway, I still want to land this PR. What do you think?

Hmm, I agree with this change, but for a different reason. If we look at the device API alone, without Ansor, this change may be the only general workaround for OpenCL devices. Specifically, any part of TVM may query the device API for the warp size, so setting the default warp size to 1 only in Ansor would not be a general solution.

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

FrozenGene · 2022-02-09T05:50:12Z

src/auto_scheduler/search_task.cc

+            << "Warp size 1 is not recommended for OpenCL devices. Tuning might crash or get stuck";
+      }
+
+      int max_vthread_extent = warp_size / 4;


Sorry, I just came back from vacation. I want to double-check max_vthread_extent here. As I wrote in the tutorial https://github.com/apache/tvm/blob/main/gallery/how_to/tune_with_autoscheduler/tune_network_mali.py#L188-L194, the formula is max_vthread_extent = int(dev.warp_size / 4) if int(dev.warp_size / 4) > 1 else dev.warp_size. If warp_size is 1, the current code sets max_vthread_extent to 0, and previous experiments showed that tuning then gets stuck or crashes. @masahi

I think we should do it the same way as Vulkan, int max_vthread_extent = std::max(1, warp_size / 4); see https://github.com/apache/tvm/blob/main/src/auto_scheduler/search_task.cc#L153 @masahi
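To make the difference between the three formulas concrete, the sketch below transcribes them into Python: the expression from the diff (warp_size / 4), the tutorial's conditional, and the Vulkan-style std::max(1, warp_size / 4). The function names are illustrative only. For positive integers, C++ integer division matches Python's //.

```python
def max_vthread_extent_diff(warp_size: int) -> int:
    # Expression from the diff: int max_vthread_extent = warp_size / 4;
    # Becomes 0 whenever warp_size < 4.
    return warp_size // 4


def max_vthread_extent_tutorial(warp_size: int) -> int:
    # Expression from tune_network_mali.py:
    # int(warp_size / 4) if int(warp_size / 4) > 1 else warp_size
    v = int(warp_size / 4)
    return v if v > 1 else warp_size


def max_vthread_extent_vulkan_style(warp_size: int) -> int:
    # Python equivalent of std::max(1, warp_size / 4) for positive warp_size.
    return max(1, warp_size // 4)


# warp_size:        1  4  8  16  32
# diff:             0  1  2   4   8
# tutorial:         1  4  2   4   8
# vulkan-style:     1  1  2   4   8
```

Both proposed fixes avoid the 0 produced for warp_size == 1. They agree for warp_size 1 and for any warp_size of 8 or more, but differ for warp sizes 2 through 7, where the tutorial formula returns warp_size itself and the Vulkan-style formula returns 1.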

* Support OpenCL in Autoscheduler tuning
* add warning
* Update src/auto_scheduler/search_task.cc
  Co-authored-by: Cody Yu <comaniac0422@gmail.com>
* fix lint

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

masahi added 2 commits January 31, 2022 05:03

Support OpenCL in Autoscheduler tuning

99b8158

add warning

6662197

masahi requested review from areusch, comaniac, Hzfengsy, icemelon, jcf94, jroesch, junrushao, kazum, liangfu, merrymercy, tmoreau89, tqchen, vinx13, yzhliu, zhiics and ZihengJiang as code owners January 30, 2022 20:33

masahi commented Jan 30, 2022


comaniac approved these changes Jan 31, 2022


masahi and others added 2 commits February 1, 2022 04:36

Update src/auto_scheduler/search_task.cc

7a078d3

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

fix lint

493b1bb

masahi merged commit f9ddcdb into apache:main Feb 1, 2022

FrozenGene reviewed Feb 9, 2022


masahi mentioned this pull request Feb 9, 2022

[Ansor] OpenCL follow-up #10199

Merged
