[SYCL][Clang] Add support for device image compression #15124

uditagarwal97 · 2024-08-18T17:06:44Z

This PR adds support for device image compression for the old offloading model. I'll follow up with another PR to extend support to the new offloading model.

Design summary:

ZSTD (compression algorithm) ----> LLVMSupport (interface) ----> clang-offload-wrapper (compression)
 |
 ----------------------------------------------------------------> SYCL RT (decompression)

This PR introduces ZSTD (https://github.com/facebook/zstd) as a third-party dependency of DPCPP. As in upstream LLVM, we expect users to have the zstd-dev package installed on their machine; we won't be building zstd from source.

How to use
To compress device images, add the --offload-compress option to your clang invocation. Note that device images are compressed only if their size exceeds a threshold, which is 512 bytes by default. Also by default, compression uses ZSTD level 10. ZSTD compression levels provide a tradeoff between (de)compression time and compression ratio, and the level can be changed with the --offload-compression-level=<int> option.
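The policy described above can be summarized in a small sketch. It assumes, based only on this description, that an image is compressed when --offload-compress is given and the image is strictly larger than the threshold. The names CompressionOptions and compressionLevelFor are hypothetical; this is not the actual clang-offload-wrapper code.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical options mirroring the defaults described above.
struct CompressionOptions {
  bool Enabled = false;              // corresponds to --offload-compress
  int Level = 10;                    // default ZSTD level; --offload-compression-level=<int>
  std::size_t ThresholdBytes = 512;  // images at or below this size stay uncompressed
};

// Hypothetical helper: returns the ZSTD level to use for an image of
// ImageSize bytes, or std::nullopt if the image should be embedded as-is.
std::optional<int> compressionLevelFor(std::size_t ImageSize,
                                       const CompressionOptions &Opts) {
  if (!Opts.Enabled || ImageSize <= Opts.ThresholdBytes)
    return std::nullopt;
  return Opts.Level;
}
```

Keeping the threshold check next to the level lookup makes it explicit that tiny images skip compression entirely, which matches the observation below that images under 512 bytes gain little.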

uditagarwal97 · 2024-08-21T18:07:15Z

Some initial performance stats:

Dataset: https://github.com/aras-p/smol-v/tree/master/tests/spirv-dumps
Dataset size: 275 SPIR-V files

Conclusion:
Overall, for SPIR-V files smaller than 50 KB, decompression takes under 0.1 ms, compression takes under 0.15 ms, and the compression ratio is about 3 (the compressed image is roughly 1/3 of the original size).
For very small images (<512 bytes), I don't see much benefit from image compression.
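As a worked example of the ratio quoted above (ratio = original size / compressed size), the following hypothetical helpers convert between sizes and ratios. They are illustrative arithmetic only and involve no real measurements.

```cpp
#include <cstddef>

// Hypothetical helper: compression ratio = original size / compressed size.
// Precondition: CompressedBytes is non-zero.
double compressionRatio(std::size_t OriginalBytes, std::size_t CompressedBytes) {
  return static_cast<double>(OriginalBytes) /
         static_cast<double>(CompressedBytes);
}

// Hypothetical helper: estimated compressed size for a given ratio,
// truncated to whole bytes.
std::size_t estimatedCompressedSize(std::size_t OriginalBytes, double Ratio) {
  return static_cast<std::size_t>(static_cast<double>(OriginalBytes) / Ratio);
}
```

For instance, a 48 KB (49152-byte) SPIR-V image at a ratio of about 3 would shrink to roughly 16 KB (16384 bytes).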

Note: Most of the SPIR-V files in the dataset are smaller than 50 KB. I'm working on extending the performance evaluation to larger workloads. Also, (de)compression performance will vary with the format of the file being compressed, so for AOT, where device images consist of target assembly, the performance stats might differ.

jbrodman · 2024-08-21T19:09:33Z

What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

jbrodman · 2024-08-21T19:15:48Z

Also guessing this feature may not make sense when combined with the native cpu device, but need to think more about that.

uditagarwal97 · 2024-08-21T23:23:28Z

What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

I think they are covered by the "none" binary image format. This is because the clang driver (in SYCL offload mode) never specifies the image format in its call to clang-offload-wrapper. So, by default, the BinaryImageFormat is "none", and it is up to the SYCL runtime to determine the format (https://github.com/intel/llvm/blob/sycl/sycl/source/detail/device_binary_image.cpp#L170).

I tested my changes with PTX, and they seem to work fine, so we likely won't need additional formats.
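For illustration only, here is how a runtime could sniff the format of an image declared as "none" by looking at its leading magic bytes. This is not the code in device_binary_image.cpp: the enum and function are hypothetical, the SPIR-V check assumes a little-endian module, and the ZSTD case (frame magic number 0xFD2FB528) merely shows how a compressed image could be told apart from an uncompressed one.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical classification of a device image with no declared format.
enum class GuessedFormat { SPIRV, LLVMBitcode, ZstdCompressed, Unknown };

GuessedFormat guessFormat(const unsigned char *Data, std::size_t Size) {
  if (Size < 4)
    return GuessedFormat::Unknown;
  // Read the first 4 bytes as a little-endian 32-bit word.
  std::uint32_t Word = static_cast<std::uint32_t>(Data[0]) |
                       (static_cast<std::uint32_t>(Data[1]) << 8) |
                       (static_cast<std::uint32_t>(Data[2]) << 16) |
                       (static_cast<std::uint32_t>(Data[3]) << 24);
  if (Word == 0x07230203u) // SPIR-V magic number (little-endian module assumed)
    return GuessedFormat::SPIRV;
  if (Word == 0xFD2FB528u) // ZSTD frame magic number
    return GuessedFormat::ZstdCompressed;
  if (Data[0] == 'B' && Data[1] == 'C' && Data[2] == 0xC0 && Data[3] == 0xDE)
    return GuessedFormat::LLVMBitcode; // raw LLVM bitcode: "BC" 0xC0 0xDE
  return GuessedFormat::Unknown;
}
```

Native binaries such as those produced for PTX or AMDGPU would fall through to Unknown in this sketch; a real runtime would need target-specific checks for them.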

uditagarwal97 · 2024-09-16T14:39:01Z

Ping @premanandrao @mdtoguchi @bso-intel

sycl/source/detail/compression.hpp

sycl/source/detail/device_binary_image.cpp

sycl/test-e2e/Compression/compression_multiple_tu.cpp

sycl/test-e2e/Compression/compression_seperate_compile.cpp

AlexeySachkov · 2024-09-17T14:16:45Z

sycl/test-e2e/Compression/compression_seperate_compile.cpp

+// REQUIRES: zstd, opencl-aot, cpu, linux
+
+//////////////////////  Compile device images
+// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 -fsycl-host-compiler=clang++ -fsycl-host-compiler-options='-std=c++17 -Wno-attributes -Wno-deprecated-declarations -fPIC -DENABLE_KERNEL1' -DENABLE_KERNEL1 -c %s -o %t_kernel1_aot.o


Unless you specifically want to test compilation with a 3rd-party host compiler, you don't need the -fsycl-host-compiler and -fsycl-host-compiler-options flags.

I wanted to have a test that mimics the compilation toolchain the PyTorch team uses (as described here: https://github.com/intel/torch-xpu-ops/blob/main/cmake/BuildFlags.cmake#L63). They use gcc for the final link. The problem with using gcc in an E2E test is that we'd have to explicitly provide the paths to the SYCL headers and library (see the older version of this test: https://github.com/intel/llvm/blob/1d8181335fb188aa4ae0ad39b3826a4162b200d2/sycl/test-e2e/Compression/compression_seperate_compile.cpp). AFAIK, we don't have LIT substitutions for the paths to the SYCL headers and library, so I ended up using clang++ as the "3rd-party" compiler, with which I can just use -fsycl to get the SYCL include directory and library.

I wanted to have a test that mimics the compilation toolchain that PyTorch team use

Then I think it's worth noting that in a comment within the test; otherwise it looks like an unnecessary complication.

sycl/test-e2e/Compression/compression_seperate_compile.cpp

sycl/test-e2e/Compression/no_zstd_warning.cpp

AlexeySachkov · 2024-09-17T15:03:19Z

sycl/test-e2e/lit.cfg.py

@@ -308,6 +308,33 @@ def open_check_file(file_name):
 if sp[0] == 0:
    config.available_features.add("preview-breaking-changes-supported")

+# Check if clang is built with ZSTD and compression support.


There is a much simpler way: use lit.site.cfg.py.in to propagate the value of a CMake variable into this Python script.

I would also explore how LLVM propagates that. There are tests in LLVM which require the zstd feature, so I wonder if we can call some LIT helper to get this feature propagated into LIT for us automatically.

As discussed offline, passing LLVM_ENABLE_ZSTD from CMake to LIT won't work here because E2E tests can be built standalone, as we do in CI.
LLVM seems to pass CMake variables to LIT: https://github.com/llvm/llvm-project/blob/main/llvm/test/lit.site.cfg.py.in#L37

Yeah, good point. I still wonder if there is a simpler way (i.e. some existing helper for running an executable and getting its output), but if not, then we will have to live with what we have.

BTW, I'm sure the compiler is able to read the program from stdin, so you could maybe save on file operations here.

cperkinsintel · 2024-09-17T17:11:40Z

sycl/test-e2e/Compression/Inputs/single_kernel.cpp

+  {
+    sycl::buffer<int, 1> buffer1(&val, sycl::range(1));
+
+    q0.submit([&](sycl::handler &cgh) {


The description on this PR says that there is a threshold of 512 bytes, below which we don't compress.

Is this tiny kernel above or below that threshold? And shouldn't we have a test for the other side of that threshold as well?

Is this tiny kernel above or below that threshold?

It is above the threshold, maybe because I compile with -O0 :P

And shouldn't we have a test for the other side of that threshold as well?

Yes, I have the following test to ensure that there's no compression when the size < threshold:
clang/test/Driver/clang-offload-wrapper-zstd.c

clang/lib/Driver/ToolChains/Clang.cpp

clang/test/Driver/sycl-offload-wrapper-compression.cpp

mdtoguchi

OK for driver

uditagarwal97 added 4 commits August 9, 2024 18:22

Add sycl-compress

ef323f7

Fix decompression in RT

bdab2f0

Cleanup

45f1e99

Fix ZSTD Cmake dependencies

34978f8

uditagarwal97 self-assigned this Aug 18, 2024

Merge branch 'sycl' into compress_img

195e961

Remove unwanted formatting changes

cd64225

More cleanup

d89f41b

Add option in clang driver to trigger compression.

fb643e3

Cleanup + build fix

151e70a

Fix ZSTD build on windows, RHEL

2983fab

uditagarwal97 added 2 commits August 19, 2024 09:18

Merge remote-tracking branch 'upstream/sycl' into compress_img

054984c

Fix clang warnings and formatting

4493984

Try fixing Windows build

dbb96a7

uditagarwal97 added 2 commits August 25, 2024 21:57

Merge remote-tracking branch 'upstream/sycl' into compress_img

6c26a42

Fix linkage error while windows build

7d7edc6

Fix include_directory for sycl-compress

f0aca25

Fix unreferenced var error on MSVC; Remove debug prints.

32e4868

uditagarwal97 added 2 commits September 16, 2024 23:59

Simply E2E test and fix failure on CUDA

fd0f1e3

Merge remote-tracking branch 'origin/sycl' into HEAD

07c119e

AlexeySachkov reviewed Sep 17, 2024

cperkinsintel reviewed Sep 17, 2024

Address reviews

eb7588c

uditagarwal97 requested a review from cperkinsintel September 17, 2024 20:00

mdtoguchi reviewed Sep 17, 2024

clang/lib/Driver/ToolChains/Clang.cpp

Add clang driver test. Address reviews.

58f9939

uditagarwal97 requested a review from mdtoguchi September 18, 2024 00:54

Fix detection of zstd LIT feature on Windows

575efc6

mdtoguchi reviewed Sep 18, 2024

clang/test/Driver/sycl-offload-wrapper-compression.cpp (outdated)

Simplify zstd detection in LIT; Remove ZSTD requirement in clang driver test.

970ad35

mdtoguchi approved these changes Sep 18, 2024
