-
Notifications
You must be signed in to change notification settings - Fork 730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Clang] Add support for device image compression #15124
base: sycl
Are you sure you want to change the base?
Conversation
Some initial performance stats: Dataset: https://github.com/aras-p/smol-v/tree/master/tests/spirv-dumps Conclusion: Note:- Most of the SPIR-V files I have in the dataset are <50KB. I'm working on extending the performance evaluation to larger workloads. Also, the (de)compression performance will vary with the format of the file being compressed, so for AOT, where device images consists of target assembly, the performance stats might differ. |
What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats? |
Also guessing this feature may not make sense when combined with the native cpu device, but need to think more about that. |
I think they are covered by the "none" binary image format. This is because clang driver (in SYCL offload mode) never specifies the image format in call to I tested my changes with PTX, and they seem to work fine, so, we'd likely not require additional formats. |
// REQUIRES: zstd, opencl-aot, cpu, linux | ||
|
||
////////////////////// Compile device images | ||
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 -fsycl-host-compiler=clang++ -fsycl-host-compiler-options='-std=c++17 -Wno-attributes -Wno-deprecated-declarations -fPIC -DENABLE_KERNEL1' -DENABLE_KERNEL1 -c %s -o %t_kernel1_aot.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you specifically wanted to test compilation with a 3rd-party host compiler you don't need -fsycl-host-compiler
and -fsycl-host-compiler-options
flags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to have a test that mimics the compilation toolchain that PyTorch team use (as described here: https://github.com/intel/torch-xpu-ops/blob/main/cmake/BuildFlags.cmake#L63). They use gcc
for final linkage. The problem with using gcc
here in E2E test is that we'd have to explicitly provide path to sycl headers and library (See the older version of this test: https://github.com/intel/llvm/blob/1d8181335fb188aa4ae0ad39b3826a4162b200d2/sycl/test-e2e/Compression/compression_seperate_compile.cpp). AFAIK, we don't have LIT substitutions to get path to SYCL headers and library, and so, I ended up using clang++ as "3rd party" compiler, with which I can just use -fsycl
to get the include directory and SYCL library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to have a test that mimics the compilation toolchain that PyTorch team use
Then I think it worth noting that in a comment within the test, or otherwise it seems like an unnecessary overcomplication
@@ -308,6 +308,33 @@ def open_check_file(file_name): | |||
if sp[0] == 0: | |||
config.available_features.add("preview-breaking-changes-supported") | |||
|
|||
# Check if clang is built with ZSTD and compression support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a way simpler way. Use lit.site.cfg.py.in
to propagate a value of CMake variable into this python script.
I would also explore how LLVM propagates that. There are tests in LLVM which require zstd
feature, so I wonder if we can call some LIT helper to get this feature automatically propagated into LIT for us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, passing LLVM_ENABLE_ZSTD
from CMake to LIT won't work here because E2E tests can be built standalone, like what we do in CI.
LLVM seems to pass CMake Variables to LIT: https://github.com/llvm/llvm-project/blob/main/llvm/test/lit.site.cfg.py.in#L37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point. I still wonder if there is a simpler way (i.e. some existing helper for running an executable and getting its output, but if no, then we will have to leave with what we have.
BTW, I'm sure that compiler is able to read the program from stdin
, so you can maybe save on file operations here
{ | ||
sycl::buffer<int, 1> buffer1(&val, sycl::range(1)); | ||
|
||
q0.submit([&](sycl::handler &cgh) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description on this PR says that there is a threshold of 512 bytes, below which we don't compress.
Is this tiny kernel above or below that threshold? And shouldn't we have a test for the other side of that threshold as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this tiny kernel above or below that threshold?
It is above the threshold maybe because I use -O0
to compile :P
And shouldn't we have a test for the other side of that threshold as well?
Yes, I have the following test to ensure that there's no compression when the size < threshold:
clang/test/Driver/clang-offload-wrapper-zstd.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for driver
This PR adds support for device image compression for the old offloading model. I'll make another follow-up PR to extend support for the new offload model.
Design summary:
This PR introduces ZSTD (https://github.com/facebook/zstd) as a 3rd party dependency of DPCPP. Similar to upstream LLVM, we expect user to have
zstd-dev
package installed on their machine - we won't be installing zstd from sources.How to use
To compress device images, add
--offload-compress
CLI option to your clang invocation. Note that we compress device images only if the size of device images exceeds a threshold, which is 512 bytes by default. Moreover, by default, we use ZSTD level 10 for compression. ZSTD compression levels provides a tradeoff between (de)compression time and compression ratio, and the compression level can be changed using--offload-compression-level=<int>
CLI option.