-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tribits: MKL+SYCL TPL support #13330
base: develop
Are you sure you want to change the base?
Tribits: MKL+SYCL TPL support #13330
Conversation
- If TPL_ENABLE_MKL and Kokkos_ENABLE_SYCL are both on, then search for the MKL and MKL_SYCL CMake targets with the 32-bit (intel_lp64) interface. If SYCL is off, then use libmkl_rt for MKL (this automatically links in host libraries with 32-bit interface). KokkosKernels: remove SYCL MKL override Tribits: Use MKL libs for BLAS/LAPACK if enabled Tribits: Add MKL_PROVIDES_BLAS_LAPACK cmake variable to tell Trilinos packages whether BLAS/LAPACK should be called with int or MKL_INT.
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Using Repos:
Pull Request Author: brian-kelley |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
BTW, if anyone has an idea on how to provide a helpful error message for when OneAPI version is < 2024.1, I would like to do that. The problem is that now, I don't think there's a way to know the version until |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brian-kelley, thanks for taking the time to set up these FindTPL<tplName>.cmake
files. This is an interesting use case where all of the requirements for a downstream TPL are supplied by an upstream TPL in some cases (specifically when MKL
is enabled, that MKL::all_libs
target provides everything that is needed by BLAS and LAPACK). But the current TriBITS TPL support software may not be set up to smoothly support this use case.
First, this is a great change in that makes it easier to use the MKL BLAS and LAPACK implementations on machines where the MKL is provided. (That used to be a pain in the past with a bunch of different libraries you had to link against and in a specific order.)
However, there may be a significant problem with the installed TriBITS-generated BLASConfig.cmake
and LAPACKConfig.cmake
files that are documented here when TPL_ENABLE_MKL=ON
and Kokkos_ENABLE_SYCL=OFF
. I think the files BLASConfig.cmake
and LAPACKConfig.cmake
files will be incorrectly calling find_package(MKL)
while the installed TriBITS-generated MKLConfig.cmake
file will not. Have you tested this with any downstream CMake projects using KokkosKernels through Trilinos with find_package(KokkosKernels)
to see how this works?
Other than that, this PR should address:
-
Update the comment at https://github.com/brian-kelley/Trilinos/blob/278c810222470aa6a9ad647bbfdd15480719e608/cmake/TPLs/FindTPLMKL.cmake#L12-L17 to mention that enabling MKL provides BLAS and/or LAPACK implementations when BLAS and/or LAPACK TPLs are enabled. Right now, it is an old comment from Mark H. that says that the MKL does not provide the BLAS and LAPACK implementations (which is no longer true with with this commit).
-
Define the optional dependency of the TriBITS BLAS and LAPACK External Packages/TPLs on the MKL External Package/TPL in the files
FindTPLBLASDependencies.cmake
andFindTPLLAPACKDependencies.cmake
, respectively (seeFindTPL<tplName>Dependencies.cmake
). That tells TriBITS that there is an optional dependency on these TPLs for some types of scenarios.
As an alternative and cleaner implementation that would likely solve the issue described above, you might just consider defining optional dependencies of the BLAS and LAPACK TPLs on the MKL TPL by adding the files FindTPLBLASDependencies.cmake
and FindTPLLAPACKDependencies.cmake
, respectively (the latter already exists and just needs to add MKL
). Then I think you can might be able to just call tribits_tpl_find_include_dirs_and_libraries(BLAS)
and tribits_tpl_find_include_dirs_and_libraries(LAPACK)
to create the required wrapper targets and <tplName>Config.cmake
files for BLAS and LAPACK, respectively. That might just work (but I would have to check the unit tests for that use case). And if that does not already work, it would likely be easy to get that use case to work and I could add a HowTo to https://tribitspub.github.io/TriBITS/users_guide/index.html#howtos called "Creating FindTPL<tplName>.cmake
where all dependencies are satisfied by an upstream enabled dependent External Package/TPL". I think that would make the installed TriBITS-generated BLASConfig.cmake
and LAPACKConfig.cmake
files work correctly.
@@ -1,25 +1,54 @@ | |||
if (TPL_ENABLE_MKL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This technically breaks backward compatibility for current configurations that enabled both MKL and BLAS. But if things go well, the only practical differences that the user should notice is different numbers being computed from BLAS (subject to reasonable roundoff error bounds). Assuming there are no problems with the MKL BLAS, this should be fine.
# For OneAPI MKL on GPU, use the CMake target | ||
# Temporarily change CMAKE_CXX_COMPILER to icpx to convince MKL to add the DPCPP target | ||
# If it sees that CMAKE_CXX_COMPILER is just ".../mpicxx", it won't do this. | ||
set(CMAKE_CXX_COMPILER_PREVIOUS "${CMAKE_CXX_COMPILER}") | ||
set(CMAKE_CXX_COMPILER "icpx") | ||
# Use the BLAS95 and LAPACK95 interfaces | ||
set(MKL_INTERFACE lp64) | ||
find_package(MKL REQUIRED COMPONENTS MKL::MKL MKL::MKL_SYCL) #MKL::MKL_SYCL::BLAS MKL::MKL_SYCL::LAPACK) | ||
IF (NOT MKL_FOUND) | ||
MESSAGE(FATAL_ERROR "MKL (as CMake package) was not found! This is required for SYCL+MKL") | ||
ENDIF() | ||
set(CMAKE_CXX_COMPILER "${CMAKE_CXX_COMPILER_PREVIOUS}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is unfortunate that you have to trick find_package(MKL ...)
by setting CMAKE_CXX_COMPILER
to icpx
and then set it back to the MPI compiler wrapper.
NOTE: I guess this might be one advantage of TriBITS/Trilinos using the FindMPI.cmake
where you can use CMAKE_CXX_COMPILER
set to the the raw C++ compiler and the module provides the modern CMake target MPI::MPI_CXXto pull in the compile and link requirements? (But I have not used
find_package(MPI)in a long time and I can't seem to find definitive documentation on how the compiler is actually handled with
find_package(MPI)to be sure. But that is how
FindMPI.cmake` used to behave.)
# Use the Tribits-generated target (tribits::MKL::mkl_rt) as the BLAS library | ||
tribits_extpkg_create_imported_all_libs_target_and_config_file( BLAS | ||
INNER_FIND_PACKAGE_NAME MKL | ||
IMPORTED_TARGETS_FOR_ALL_LIBS tribits::MKL::mkl_rt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. When Kokkos_ENABLE_SYCL
is not enabled, then the FindTPLMKL.cmake
module calls:
TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( MKL
REQUIRED_HEADERS mkl.h
REQUIRED_LIBS_NAMES mkl_rt
)
so it just finds the one library.
And then this call:
tribits_extpkg_create_imported_all_libs_target_and_config_file( BLAS
INNER_FIND_PACKAGE_NAME MKL
IMPORTED_TARGETS_FOR_ALL_LIBS tribits::MKL::mkl_rt)
will cause TriBITS to generate a BLASConfig.cmake
file that calls find_package(MKL REQUIRED)
but any targets defined by that will be ignored and and will just use the already created target tribits::MKL::mkl_rt
but the TriBITS-generated MKLConfig.cmake
file. But since find_package(MKL)
was never called really, then is this the right thing to do? In this case, with Kokkos_ENABLE_SYCL
disabled, the MKLConfig.cmake
file generated by TriBITS is not actually calling find_package(MKL)
. So I am not sure this is going to work robustly the way one would like. I would have to inspect this configuration for this case and see what actually happens.
If that does work, then it might be better to just link with MKL::all_libs
and not point to an implementation-defined target like tribits::MKL::mkl_rt
? Then, in the future when find_package(MKL)
can be used, we will not have to modify this FindTPLBLAS.cmake
wrapper?
@@ -1,3 +1,4 @@ | |||
include(CMakePrintHelpers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming this was only for debugging? You might take this out of the production version. (It can be added back for manual debugging.)
TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( MKL | ||
REQUIRED_HEADERS mkl.h | ||
REQUIRED_LIBS_NAMES mkl_rt | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: This provides the target MKL::all_libs
after it this function returns. Why not just use the target MKL::all_libs
in downstream FindTPL<tplName>.cmake
modules?
if (Kokkos_ENABLE_SYCL) | ||
# FindTPLMKL already called find_package(MKL). | ||
# The MKL::MKL target was required for MKL when SYCL is enabled. | ||
tribits_extpkg_create_imported_all_libs_target_and_config_file( LAPACK | ||
INNER_FIND_PACKAGE_NAME MKL | ||
IMPORTED_TARGETS_FOR_ALL_LIBS MKL::MKL) | ||
else () | ||
# Use the Tribits-generated target (tribits::MKL::mkl_rt) as the LAPACK library | ||
tribits_extpkg_create_imported_all_libs_target_and_config_file( LAPACK | ||
INNER_FIND_PACKAGE_NAME MKL | ||
IMPORTED_TARGETS_FOR_ALL_LIBS tribits::MKL::mkl_rt) | ||
endif () | ||
MESSAGE(STATUS "MKL will provide LAPACK functions.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this entire block of code could be removed to just depend on the MKL::all_libs
target. I am not sure the TriBITS implementation is set up to support this use case so this might be the best you can do for now?
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
@bartlettroscoe Thanks a lot for the review. I'm realizing that I didn't think through the "MKL-as-BLAS" logic enough, especially if the TPL cmake files installed with Trilinos are broken. But this was kind of an extra change on top of my original goal, which was to allow MKL to be used on SYCL/GPU. Is it OK with you to do the MKL/BLAS/LAPACK logic later, and just reduce this PR down to the SYCL changes? In the next FY I will have more time to do the other part correctly using FindTPLDependencies.cmake. |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
@brian-kelley, yes, if you just implement that case where But now that I think about it, if you link to two libraries that implement BLAS and LAPACK functions (e.g. one from the MKL library and one from say NetLib BLAS and LAPACK), could you not get link problems? (But if the separate BLAS and LAPACK libraries are listed on link line first, the matching functions would get selected over the functions in the MKL libs. But since there is no enforced dependency between BLAS, LAPACK and MKL, there is no guarantee that the BLAS and LAPACK libs will be listed first on the link line. That can be fixed by declaring that ordering in The best solution to this problem is to just implement, test, and document the use case where a downstream TPL is fully specified by an upstream TPL. (In this case the BLAS and LAPACK TPLs are fully specified by the upstream MKL TPL.) That will be super clean and robust. Let me know if you want to go with that option. |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
2 similar comments
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
3 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
28 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
Motivation
This PR allows OneAPI/MKL with SYCL support to be enabled in Trilinos, where it wasn't before. With MKL+SYCL enabled, KokkosKernels can call MKL kernels on both host and device.
This just requires OneAPI 2024.1 or above, since before that, the SYCL libraries were not available for the intel_lp64 interface that Trilinos needs. intel_lp64 means that BLAS/LAPACK functions use int (32-bit) argument types for dimensions and strides, while the intel_ilp64 interface uses int64_t for these. ilp64 is not compatible with Teuchos BLAS and LAPACK wrappers.
BLAS::all_libs
andLAPACK::all_libs
targets just point back to MKL libraries).MKL_PROVIDES_BLAS_LAPACK
cmake variable to tell Trilinos packages (especially KokkosKernels) whether BLAS/LAPACK should be called with int or MKL_INT.@trilinos/tribits @bartlettroscoe
@trilinos/kokkos-kernels
Stakeholder Feedback
Testing
This is mostly just build system changes, not code changes. Some of the CMake is exercised by builds that use MKL on the host only. The SYCL-specific changes will be exercised soon by a nightly Trilinos performance build. I tested the changes in both cases manually.