Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ParallelUtils::SetNumThreads is being ignored in MKL #10183

Open
ddiezrod opened this issue Aug 23, 2022 · 33 comments
Open

ParallelUtils::SetNumThreads is being ignored in MKL #10183

ddiezrod opened this issue Aug 23, 2022 · 33 comments
Assignees
Labels
MKL Parallel-SMP Shared memory parallelism with OpenMP or C++ Threads

Comments

@ddiezrod
Copy link
Contributor

ddiezrod commented Aug 23, 2022

Description
I found that even if I call ParallelUtils::SetNumThreads with NumThreads=1, pardiso is still running with more threads. It only works if I manually set OMP_NUM_THREADS=1 before launching the solver. This is quite bad for us in @KratosMultiphysics/altair as the license cost is proportional to the number of threads that are being used, so we have to solve this quickly.

Any idea on how to solve this? @RiccardoRossi @philbucher @roigcarlo @pooyan-dadvand

@ddiezrod ddiezrod self-assigned this Aug 23, 2022
@ddiezrod ddiezrod added Parallel-SMP Shared memory parallelism with OpenMP or C++ Threads MKL labels Aug 23, 2022
@loumalouomega
Copy link
Member

We should update to the last version of MKL, as since 2020 they have changed many things in MKL

@roigcarlo
Copy link
Member

We may need to set the number of threads with mkl_set_num_threads for mkl. That's one of the reasons why we always recommend not to touch the number of threads from inside the code...

I have no pardiso so I cannot test, but the changes should be the following:

  • Add a directive to check if MKL is being used from CMakeList.txt in Kratos ROOT. This is important because right now only LinearSolversApplication checks this, and we will need it in the Core. I would assume I have access to KRATOS_USING_MKL from now on.

  • In the ParallelUtilities.cpp:

// System includes
#include <algorithm>
#include <cstdlib> // for std::getenv

// External includes
#ifdef KRATOS_USING_MKL
    #include "mkl.h"
#endif

// Project includes
#include "parallel_utilities.h"
#include "input_output/logger.h"
#include "includes/lock_object.h"

and

void ParallelUtilities::SetNumThreads(const int NumThreads)
{
    KRATOS_ERROR_IF(NumThreads <= 0) << "Attempting to set NumThreads to <= 0. This is not allowed" << std::endl;

#ifdef KRATOS_SMP_NONE
    // do nothing if is shared memory parallelization is disabled
    return;
#endif

    const int num_procs = GetNumProcs();
    KRATOS_WARNING_IF("ParallelUtilities", NumThreads > num_procs) << "The number of requested threads (" << NumThreads << ") exceeds the number of available threads (" << num_procs << ")!" << std::endl;
    GetNumberOfThreads() = NumThreads;

#ifdef KRATOS_SMP_OPENMP
    // external libraries included in Kratos still use OpenMP (such as AMGCL)
    // this makes sure that they use the same number of threads as Kratos itself.
    omp_set_num_threads(NumThreads);
#endif

#ifdef KRATOS_USING_MKL
    mkl_set_num_threads(NumThreads);
#endif
}

If this does not work tell me.

@loumalouomega
Copy link
Member

Nice!

@loumalouomega
Copy link
Member

We may need to set the number of threads with mkl_set_num_threads for mkl. That's one of the reasons why we always recommend not to touch the number of threads from inside the code...

I have no pardiso so I cannot test, but the changes should be the following:

* Add a directive to check if MKL is being used from CMakeList.txt in Kratos **ROOT**. This is important because right now only LinearSolversApplication checks this, and we will need it in the Core. I would assume I have access to `KRATOS_USING_MKL` from now on.

* In the `ParallelUtilities.cpp`:
// System includes
#include <algorithm>
#include <cstdlib> // for std::getenv

// External includes
#ifdef KRATOS_USING_MKL
    #include "mkl.h"
#endif

// Project includes
#include "parallel_utilities.h"
#include "input_output/logger.h"
#include "includes/lock_object.h"

and

void ParallelUtilities::SetNumThreads(const int NumThreads)
{
    KRATOS_ERROR_IF(NumThreads <= 0) << "Attempting to set NumThreads to <= 0. This is not allowed" << std::endl;

#ifdef KRATOS_SMP_NONE
    // do nothing if is shared memory parallelization is disabled
    return;
#endif

    const int num_procs = GetNumProcs();
    KRATOS_WARNING_IF("ParallelUtilities", NumThreads > num_procs) << "The number of requested threads (" << NumThreads << ") exceeds the number of available threads (" << num_procs << ")!" << std::endl;
    GetNumberOfThreads() = NumThreads;

#ifdef KRATOS_SMP_OPENMP
    // external libraries included in Kratos still use OpenMP (such as AMGCL)
    // this makes sure that they use the same number of threads as Kratos itself.
    omp_set_num_threads(NumThreads);
#endif

#ifdef KRATOS_USING_MKL
    mkl_set_num_threads(NumThreads);
#endif
}

If this does not work tell me.

I would consider a different value for MKL, as depending on the AMD processor, the multithreading doesn't work. Personal experience with a Ryzen

@roigcarlo
Copy link
Member

No, this interface has no logic behind. We are not responsible of checking if the value makes sense. If you want to add logic to determine the best number it has to be done in the user side.

@matekelemen
Copy link
Contributor

I'm guessing ParallelUtilities::InitializeNumberOfThreads should also invoke ParallelUtilities::SetNumThreads instead of setting the number of threads by itself directly.

@roigcarlo
Copy link
Member

I'm guessing ParallelUtilities::InitializeNumberOfThreads should also invoke ParallelUtilities::SetNumThreads instead of setting the number of threads by itself directly.

Yep, did not remember about that. Also probably you have to parse MKL_NUM_THREADS as input variable if we pretend to be 100% conform with mkl...

@matekelemen
Copy link
Contributor

Also probably you have to parse MKL_NUM_THREADS as input variable

argh that's nasty. What happens if both are defined?

@roigcarlo
Copy link
Member

I suppose that

  • MKL_NUM_THREADS is set to mkl_set_num_threads
  • OMP_NUM_THREADS is set to omp_set_num_threads

And if KRATOS_NUM_THREADS it takes preference over both?

@matekelemen
Copy link
Contributor

I suppose that

* `MKL_NUM_THREADS` is set to `mkl_set_num_threads`

* `OMP_NUM_THREADS` is set to `omp_set_num_threads`

And if KRATOS_NUM_THREADS it takes preference over both?

Fine by me, provided that mkl_set_num_threads and omp_set_num_threads are completely orthogonal. That said, I have a hunch that mkl_set_num_threads manipulates OMP as well but I'm not sure about that.

@philbucher
Copy link
Member

I will think abt this some more, but a big blocker with including this in the core is that if Kratos is compiled with MKL then we need to distribute it always with it

The same problem with MPI lead to the creation of the MPICore

Different proposal: when the LinearSolversApp is compiled with MKL it could register a lambda fct in the Core which is called in SetNumThreads (and internally can call mkl_set...)
This way it would work while being independent in terms of compilation

@mpentek
Copy link
Member

mpentek commented Aug 23, 2022

Not on the technical side, but isn’t Intel MKL free to use even for commercial purposes? The whole Intel oneAPI? Or am I totally off topic…

@roigcarlo
Copy link
Member

@philbucher Yea that would work, but even if we do it inside the core, it would not be a problem if its included like the other TPL.

@mpentek Is free to use and redistribute, but is not compatible with BDS, they have their own ISSL licence. As with tetgen, we can distribute but the problem is for people wanting to include Kratos in their own projects.

@mpentek
Copy link
Member

mpentek commented Aug 23, 2022

@roigcarlo I only partially understand. The argumentation is perhaps along these lines: daanzu/kaldi-active-grammar#47.

I assume Altair has a specific issue due to including Kratos (with a BSD) license in their own project, where additionally including MKL (with ISSL) starts to become conflicting. Without lengthening this thread unnecessarily, I understand this does not affect using Kratos and MKL for our typical purposes, where we do not use these as part of another project with additional licensing requirements.

@roigcarlo
Copy link
Member

assume Altair has a specific issue due to including Kratos (with a BSD) license in their own project, where additionally including MKL (with ISSL) starts to become conflicting

I don't know the altair case in particular, but I think is no problem for them (they want to use it, they simply want to be able to set the threads correctly), but as a rule of thumb we cannot include in our code other codes (or parts of it) that are under licences not compatible with BDS without at least letting the user to decide if they want that code to be compiled or not. Essentially is what the discussion says. This is were the macro enters.

That being said, we are hopping to release Kratos with Intel libs ourselves, so is not that we are against or anyting. We just want to make sure that you can still use Kratos in its BSD form.

@RiccardoRossi has suggested me to look into the pardiso interface so maybe we can solve this without touching the core. I will see if its possible tomorrow

@ddiezrod
Copy link
Contributor Author

@roigcarlo I tried your proposal but I am getting a linking error:

   Creating library kratos\kratos\KratosCore.lib and object kratos\kratos\KratosCore.exp
parallel_utilities.cpp.obj : error LNK2019: unresolved external symbol MKL_Set_Num_Threads referenced in function "public: static void __cdecl Kratos::ParallelUtilities::SetNumThreads(int)" (?SetNumThreads@ParallelUtilities@Kratos@@SAXH@Z)
kratos\kratos\KratosCore.dll : fatal error LNK1120: 1 unresolved externals

@roigcarlo
Copy link
Member

It's lacking ${MKL_RT_LIB} link in the core most likely.

@KratosMultiphysics/technical-committee At this point I would vote for enabling one-api directly in the core. We alreade have several users that need that, and we are even moving towards that ourselves (see #9912 for example for TBB and MKL...)

@loumalouomega
Copy link
Member

It's lacking ${MKL_RT_LIB} link in the core most likely.

@KratosMultiphysics/technical-committee At this point I would vote for enabling one-api directly in the core. We alreade have several users that need that, and we are even moving towards that ourselves (see #9912 for example for TBB and MKL...)

That's why I added: #10185

@loumalouomega
Copy link
Member

It's lacking ${MKL_RT_LIB} link in the core most likely.

@KratosMultiphysics/technical-committee At this point I would vote for enabling one-api directly in the core. We alreade have several users that need that, and we are even moving towards that ourselves (see #9912 for example for TBB and MKL...)

I don't like to depend on Intel...

@roigcarlo
Copy link
Member

@ddiezrod In the meantime try to force call that in the solver interfaces in applications/LinearSolversApplication/custom_solvers/eigen_pardiso_**** that should not give you linking problems.

@loumalouomega @philbucher I am open to other solutions. How do you suggest to approach this. I think that no one here hates intel more than I do, but please bear in mind that in windows is virtually impossible to run 1M+ element cases without TBB+MKL. (We are speaking of days vs minutes)

@matekelemen
Copy link
Contributor

matekelemen commented Aug 24, 2022

I think that no one here hates intel more than I do

I know it's absolutely proprietary™, but apart from that what's the problem with Intel and its software? (not a rhetorical question, I haven't had much experience with them yet)

@loumalouomega
Copy link
Member

loumalouomega commented Aug 24, 2022

@ddiezrod In the meantime try to force call that in the solver interfaces in applications/LinearSolversApplication/custom_solvers/eigen_pardiso_**** that should not give you linking problems.

@loumalouomega @philbucher I am open to other solutions. How do you suggest to approach this. I think that no one here hates intel more than I do, but please bear in mind that in windows is virtually impossible to run 1M+ element cases without TBB+MKL. (We are speaking of days vs minutes)

Windows!!!

image

@ddiezrod
Copy link
Contributor Author

@roigcarlo I tried adding that call in eigen_direct_solver (it seems eigen_pardiso_lu was not actually being called) and it behaves as it should, using only the number of threads I assigned there. (I did not need to set MKL_NUM_THREADS in the env)

@RiccardoRossi
Copy link
Member

i am not positive of having a strong dependency on blas in the core. even less so with a proprietary implementation (mkl)

@ddiezrod
Copy link
Contributor Author

Just to confirm that if I add this block

if( USE_EIGEN_MKL MATCHES ON )
    add_definitions(-DKRATOS_USING_MKL)
    if( ${CMAKE_SYSTEM_NAME} MATCHES "Windows" )
        find_library(MKL_RT_LIB mkl_rt)
        if(NOT MKL_RT_LIB)
            message( FATAL_ERROR "mkl_rt.lib not found")
        else(NOT MKL_RT_LIB)
            message( "mkl_rt.lib found at: ${MKL_RT_LIB}")
            target_link_libraries( KratosCore PUBLIC ${MKL_RT_LIB} )
        endif(NOT MKL_RT_LIB)
    elseif( ${CMAKE_CXX_COMPILER_ID} MATCHES Clang )
        message( FATAL_ERROR "Clang does not yet support MKL" )
    else( ${CMAKE_SYSTEM_NAME} MATCHES "Windows" )
        target_link_libraries( KratosCore PUBLIC mkl_rt )
        if( USE_EIGEN_FEAST MATCHES ON )
            target_link_libraries(KratosCore PUBLIC feast4 gfortran m)
        endif()
    endif( ${CMAKE_SYSTEM_NAME} MATCHES "Windows" )
endif()

in kratos CMakeLists, things work fine in the original design proposed by @roigcarlo

@ddiezrod
Copy link
Contributor Author

I am not sure I get your points @RiccardoRossi @philbucher , why do you not like the first solution @roigcarlo proposed here? I do not see why that would mean adding an external dependency. If -DUSE_EIGEN_MKL is off everything works exactly as it used to.

@RiccardoRossi
Copy link
Member

RiccardoRossi commented Aug 30, 2022 via email

@roigcarlo
Copy link
Member

roigcarlo commented Aug 30, 2022

I don't understand what anyone means by including MKL in the core. The situation as I see it is as follows:

Currently we are able to compile the LinearSolversApp with MKL. The support for MKL is provided at eigensolvers library level, and for what it matters, from Kratos we have 0 knowledge of what features of MKL are being used or not inside said lib. It appears that one of such features is the usage of intel's own implementation of OMP (KOMP).

From Kratos we have an abstraction level to select the backend we want to use for SMP. So far there is full support for GOMP, and partial support for std::execution::par. Some features that Kratos allows are highly tied to the implementation. For instance omp_set_num_threads() is only relevant if used with GNU's OpenMP (GOMP), and has to be invoked differently for MKL mkl_set_num_threads() or mkl_domain_set_num_threads() and has no direct translation in std::execution.

In order to solve this, we have two choices:

  • To provide support in our LinearSolvers app wrappers to eigensolver's solvers that account for the fact that some settings from Kratos will be ignored in the lib. As I see it, this is equivalent to operations like Id's, or moving data / formats to adapt it to the solver's format.

  • Since MKL could be potentially used outside LinearSolversApp, the second solution is to add MKL as another type of backend that Kratos support.

My suggestion was to go for 2 for several different reasons:

  • First and most important. Right now using pardiso implies that we are running two OMP backends at the same time and that is... unsafe....

  • Having support for MKL does not mean integrating it into the core. We already have parallel utilities specifically for this. As a matter of fact, one can chose to compile Kratos even without OMP, which has code leftovers all around the project. One should be able to add or remove MKL at compile time provided the proper flag. Instead of having OpenMP, C++11 (which should be changed to internal or std or whatever) and None, we would also have Intel or MKL or OneAPI.

  • In the first stage, we only need to provide the calls to manipulate internal functions in MKL the logic of which already exists because of C++ vs GOMP and only needs to be extended.

  • The moment another app relies on a library that can be compiled with MKL, several versions of the lib could be linked against Kratos at the same time. By adding the possibility to access it trough the core we solve this problem. See tetgen for example.

  • We already have support for Intel compiler and MKL is the default choice for mp when compiling with it. Even in the new DPC++ compiler version based on clang. Actually, I suspect that ParalleUtilities::SetNumThreads()could be broken for us as well if compiling with Intel+KOMP.

  • I am aware that, at least in windows, OneAPI's TBB allocators are needed in order to achieve the same performance as OMP + std::allocators in linux. Until microsoft fixes its compiler, this is the only reasonable solution to run large cases in win.

  • Since we have our custom block_for etc... wrappers for parallel code, no Kratos code needs to be touch if we decide to use more complex implementations from TBB for example.

  • Even when MKL/TBB/KOMP might have an incompatible license, this is not a problem for this scenario as we don't intend to release Kratos with this component enabled, and this components is options. Even if we were to release a flavor of Kratos based purely on Intel, we could do so under BSD provided all shared libraries exists as separate packages, and those packages exists both in PIP and CONDA and are maintained by Intel.

As I see it, this only adds options for a user compiling Kratos in their own environments, while it needs virtually no effort from our side.

@RiccardoRossi
Copy link
Member

i think we are discussing in two different directions.

inagine you are u using tje core alone. you would be setting the number of threads in one way. now you add pardiso. why should the core be aware of that addition? one can set the numer of threads in mkl or in the given lib by an env var no?

@matekelemen
Copy link
Contributor

matekelemen commented Aug 31, 2022

now you add pardiso. why should the core be aware of that addition?

Think of it from the perspective of an application: ApplicationX wants to use DependencyY. Why should it know that ApplicationZ also uses DependencyY? Should it require ApplicationZ just to get access to DependencyY? Probably not. Instead, this could easily evolve into a situation @roigcarlo mentioned:

The moment another app relies on a library that can be compiled with MKL, several versions of the lib could be linked against Kratos at the same time. By adding the possibility to access it trough the core we solve this problem.

As I see it, Core should provide access/interface to common tools that multiple applications can make use of. An alternative would be that we create a new application ApplicationY whose only job is to provide access to DependencyY, and then both ApplicationX and ApplicationZ would require ApplicationY instead of having DependencyY internally.

But that doesn't work for something as essential as multithreading, because it's used in Core as well and I doubt having a different library managing threads in different parts of Kratos is a good idea.

@RiccardoRossi
Copy link
Member

i see what you say...but this will multiply the number of compilation options to smthg unmanageable.

anyhow, we ll discuss this in @KratosMultiphysics/technical-committee

@pooyan-dadvand
Copy link
Member

@ddiezrod do we need this for Pardiso only? Then we may change the Pardiso's interface.

@ddiezrod
Copy link
Contributor Author

In my case I only need it for pardiso yes...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MKL Parallel-SMP Shared memory parallelism with OpenMP or C++ Threads
Projects
Status: 🆕 New
Development

No branches or pull requests

8 participants