Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detribits (ForTrilinos as a standalone library) #271

Closed
wants to merge 17 commits into from

Conversation

sethrj
Copy link
Collaborator

@sethrj sethrj commented Aug 11, 2020

This changes ForTrilinos to a standalone project that depends on Trilinos as a third-party library. Among other benefits, this will allow integration into system environments that already have Trilinos installed.

@sethrj sethrj requested a review from aprokop August 11, 2020 13:41
@sethrj
Copy link
Collaborator Author

sethrj commented Aug 11, 2020

@aprokop: can you take a look at the NOX wrappers? (I'm building against Trilinos 12.18.1.) I can't figure out how the ambiguous call error is happening, but I can't say I'm surprised given the Group overload and the number of default arguments 🙄 Thanks!

@jwillenbring FYI for our meeting in 15 minutes

@aprokop
Copy link
Collaborator

aprokop commented Aug 11, 2020

@aprokop: can you take a look at the NOX wrappers? (I'm building against Trilinos 12.18.1.) I can't figure out how the ambiguous call error is happening, but I can't say I'm surprised given the Group overload and the number of default arguments 🙄 Thanks!

I assume that you are talking about a standalone build, and not about failing checks here? I'll take a look.

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 11, 2020

Yep, failing stand-alone build. I’ve got a spack environment under “script” that you can use if you don’t have it installed

@aprokop
Copy link
Collaborator

aprokop commented Aug 11, 2020

I can reproduce the error:

<snip>/fortrilinos/src/fortrilinos/nox_solver_def.hpp(44): error: more than one instance of constructor "NOX::Thyra::Group::Group" matches the argument list:
            function "NOX::Thyra::Group::Group(const NOX::Thyra::Vector &, const Teuchos::RCP<const Thyra::ModelEvaluator<double>> &, const Teuchos::RCP<const Thyra::VectorBase<double>> &, const Teuchos::RCP<const Thyra::VectorBase<double>> &, const Teuchos::RCP<Thyra::VectorBase<double>> &, __nv_bool)"
            function "NOX::Thyra::Group::Group(const NOX::Thyra::Vector &, const Teuchos::RCP<const Thyra::ModelEvaluator<double>> &, const Teuchos::RCP<Thyra::LinearOpBase<double>> &, const Teuchos::RCP<const Thyra::LinearOpWithSolveFactoryBase<double>> &, const Teuchos::RCP<Thyra::PreconditionerBase<double>> &, const Teuchos::RCP<Thyra::PreconditionerFactoryBase<double>> &, const Teuchos::RCP<const Thyra::VectorBase<double>> &, const Teuchos::RCP<const Thyra::VectorBase<double>> &, const Teuchos::RCP<Thyra::VectorBase<double>> &, __nv_bool, __nv_bool, __nv_bool)"
            argument types are: (Thyra::VectorBase<double>, Teuchos::RCP<ForTrilinos::ModelEvaluator<SC, LO, GO, NO>>, Teuchos::RCP<Thyra::LinearOpBase<double>>, Teuchos::RCP<Thyra::LinearOpWithSolveFactoryBase<double>>, Teuchos::ENull, Teuchos::ENull)
          detected during instantiation of "void ForTrilinos::NOXSolver<Scalar, LocalOrdinal, GlobalOrdinal, Node>::setup(Teuchos::RCP<Teuchos::ParameterList> &) [with Scalar=SC, LocalOrdinal=LO, GlobalOrdinal=GO, Node=NO]" 
<snip>/fortrilinos/src/fortrilinos/generated/fortrilinosFORTRAN_wrap.cxx(2079): here

@aprokop
Copy link
Collaborator

aprokop commented Aug 11, 2020

Hmm, so far this does not make any sense. The file was changed in trilinos/Trilinos@2403907 with

--- a/packages/nox/src-thyra/NOX_Thyra_Group.H
+++ b/packages/nox/src-thyra/NOX_Thyra_Group.H
@@ -109,12 +109,14 @@ namespace NOX {
           \param[in] model ModelEvaluator
           \param[in] weightVector Optional diagonal weighting vector for the model.
           \param[in] rightWeightVector Optional solution vector weighting
+          \param[in] inv_rightWeightVector Optional inverse solution vector weighting
           \param[in] rightScalingFirst Optional bool to select if right scaling should be applied before left scaling
       */
       Group(const NOX::Thyra::Vector& initialGuess,
             const Teuchos::RCP<const ::Thyra::ModelEvaluator<double> >& model,
             const Teuchos::RCP<const ::Thyra::VectorBase<double> >& weightVector = Teuchos::null,
             const Teuchos::RCP<const ::Thyra::VectorBase<double> >& rightWeightVector = Teuchos::null,
+            const Teuchos::RCP<::Thyra::VectorBase<double> >& inv_rightWeightVector = Teuchos::null,
             const bool rightScalingFirst = false);
 
       /** \brief Power user constructor that takes explicit linear solver objects to handle different combinations.
@@ -133,6 +135,7 @@ namespace NOX {
           \param[in] precFactory (Optional) Factory for updating the precondiitoner.  If set to Teuchos::null and there is a non-null prec_op, then the preconditioner will be updated using the model evaluator as long as the ModelEvaluator::outArgs supports W_prec.
           \param[in] weightVector (Optional) diagonal weighting vector for the model.
           \param[in] rightWeightVector Optional solution vector weighting
+          \param[in] inv_rightWeightVector Optional inverse solution vector weighting
           \param[in] rightScalingFirst Optional bool to select if right scaling should be applied before left scaling
           \param[in] updatePreconditioner Optional bool to select if the Group should auotmatically update the preconditioner matrix values between Newton iterations
           \param[in] jacobianIsEvaluated Optional bool, if true this means that the input Jacobian operator (linearOp) has been evaluated externally and is consistent with the initialGuess. In this case, the isJacobian() flag is initialized to true.
@@ -145,6 +148,7 @@ namespace NOX {
             const Teuchos::RCP< ::Thyra::PreconditionerFactoryBase<double> >& precFactory,
             const Teuchos::RCP<const ::Thyra::VectorBase<double> >& weightVector = Teuchos::null,
             const Teuchos::RCP<const ::Thyra::VectorBase<double> >& rightWeightVector = Teuchos::null,
+            const Teuchos::RCP<::Thyra::VectorBase<double> >& inv_rightWeightVector = Teuchos::null,
             const bool rightScalingFirst = false,
             const bool updatePreconditioner = true,
             const bool jacobianIsEvaluated = false);

@aprokop
Copy link
Collaborator

aprokop commented Aug 11, 2020

I pushed a "fix". I still cannot compile on my workstation, but that's due to mpi.mod and some environment issues. It does go through Nox, though.

@aprokop
Copy link
Collaborator

aprokop commented Aug 11, 2020

@sethrj I'm struggling to compile it with Cuda backend and the whole nvcc_wrapper. Are you able to compile it in that mode?

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 11, 2020

I haven’t tried, I’m afraid. Ive never compiled trilinos with it to be honest...

@aprokop
Copy link
Collaborator

aprokop commented Aug 11, 2020

One thing that I noticed in this PR is that somehow it does not pick up my MPI compilers, and uses stuff like /usr/bin/f95 instead. How do manage to get MPI working?

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 12, 2020

It links using normal mph flags and libraries instead of the compiler wrapper. I have a fix for the fortran mpi failure

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 12, 2020

OK @aprokop I've got the build working (though using a non-cuda-enabled trilinos). I might wait till spack/spack#17900 goes through to test with Kokkos CUDA, or do you think I should start now and look at ArborX's standalone system to see how it built?

@aprokop
Copy link
Collaborator

aprokop commented Aug 12, 2020

I tried building this PR with Trilinos built just with Serial. I'm getting linking errors

/usr/bin/ld: warning: libkokkoskernels.so.12, needed by //home/xap/local/opt/trilinos-fortrilinos-serial/lib/libtpetra.so.12, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libteuchoskokkoscompat.so.12, needed by //home/xap/local/opt/trilinos-fortrilinos-serial/lib/libtpetra.so.12, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libkokkoscontainers.so.12, needed by //home/xap/local/opt/trilinos-fortrilinos-serial/lib/libtpetra.so.12, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libteuchosparser.so.12, needed by //home/xap/local/opt/trilinos-fortrilinos-serial/lib/libteuchosparameterlist.so.12, not found (try using -rpath or -rpath-link)
//home/xap/local/opt/trilinos-fortrilinos-serial/lib/libtpetra.so.12: undefined reference to `KokkosBlas::Impl::HostBlas<double>::dot(int, double const*, int, double const*, int)'

I haven't looked at how you find the necessary libraries to put into linking. Wonder why it works for you.

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 12, 2020

Does your install of Trilinos have the correct install rpath?

@aprokop
Copy link
Collaborator

aprokop commented Aug 12, 2020

Does your install of Trilinos have the correct install rpath?

I'm not even sure how to check that. This is Trilinos installed from source using scripts, not Spack. Thus, if spack does something with rpath, it may not be done with independent install

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 12, 2020

ldd /home/xap/local/opt/trilinos-fortrilinos-serial/lib/libtpetra.so.12 and see if it shows "not found". There are a few standard cmake options for setting install rpaths, usually needed to correctly set up a tool chain (unless you want to set LD_LIBRARY_PATH).

https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling

@aprokop
Copy link
Collaborator

aprokop commented Aug 12, 2020

Yep, not found. However, this is to be expected. I do not load Trilinos, or change LD_LIBRARY_PATH. I simply point Trilinos_DIR to it, and do -DCMAKE_PREFIX_PATH="$Trilinos_DIR". I think this is a pretty reasonable approach that should work out of the box. The correct list of libraries should then be found from the installation.

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 12, 2020

The problem is with the installation of trilinos, not the downstream usage of it. You'll often see "not found" type errors on systems not configured with rpaths, where it expects you to manually set it or set it up via exporting LD_LIBRARY_PATH through a module file.

Shared libraries that depend on other shared libraries either need to set the rpath correctly at install time, or require the user to know where the dependencies are and manually set LD_LIBRARY_PATH. When trilinos installs libtpetra, it doesn't tell libtpetra where libkokkos is, only that it exists.

See https://tribits.org/doc/TribitsBuildReference.html#setting-install-rpath for how to modify your cmake install command to work. As it stands, your trilinos installation won't work correctly with any downstream software without manual intervention.

@aprokop
Copy link
Collaborator

aprokop commented Aug 12, 2020

The problem is with the installation of trilinos, not the downstream usage of it. You'll often see "not found" type errors on systems not configured with rpaths, where it expects you to manually set it or set it up via exporting LD_LIBRARY_PATH through a module file.

Sure. My point is that I think it's very common to not configure Trilinos with this options, and we may need to work with this fact. Many projects building against trilinos import few variables that contain the full list of Trilinos libraries. So, instead of linking just with -ltpetra, it would be -lkokkoscore ... -ltpetra for all targets.

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 12, 2020

That may be true; but usually on systems that I've seen with missing RPATHs the sysadmin will at least hardcode the RPATH into the /etc/ld.so.conf file. I don't think (in general) that the burden of a misconfigured upstream package should be on a downstream package. If users are actually working with an install of trilinos that they can build, link, and run without warnings/errors, then yes I agree ForTrilinos should be able to as well.

It would be interesting to find out if most trilinos installations are indeed like that -- because in my case (far predating spack) I've had to add extra arguments to cmake code, and then extra RPATH configuration inside of trilinos-based projects, to get installs working correctly.

Incidentally, I think the ForTrilinos CMake is linking against not just ltpetra but the whole list of dependencies exported by tribits' FindTpetraConfig.cmake file. On my machine, the link line is:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -g -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -dynamiclib -Wl,-headerpad_max_install_names -L/usr/local/spack/opt/spack/apple-clang/hwloc/d3sdlmx/lib -L/usr/local/spack/opt/spack/apple-clang/zlib/fwg66ug/lib -L/usr/local/spack/opt/spack/apple-clang/openmpi/rq44ux2/lib -o src/libfortpetra.dylib -install_name @rpath/libfortpetra.dylib src/CMakeFiles/fortpetra.dir/fortpetra/generated/fortpetra.F90.o src/CMakeFiles/fortpetra.dir/fortpetra/generated/fortpetraFORTRAN_wrap.cxx.o -L/usr/local/Cellar/gcc/10.1.0/lib/gcc/10/gcc/x86_64-apple-darwin19/10.1.0   -L/usr/local/Cellar/gcc/10.1.0/lib/gcc/10 -Wl,-rpath,/rnsdhpc/code/fortrilinos/build/src -Wl,-rpath,/usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib  src/libforteuchos.dylib  src/libforerror.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libmpi.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libtpetraext.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libtpetra.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libkokkostsqr.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libtpetraclassiclinalg.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libtpetraclassicnodeapi.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libtpetraclassic.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libkokkoskernels.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchoskokkoscomm.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchoskokkoscompat.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchosremainder.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchosnumerics.12.18.1.dylib  /usr/local/spack/opt/spack/apple-clang/openblas/gkdwhwe/lib/libopenblas.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchoscomm.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchosparameterlist.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchosparser.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libteuchoscore.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libkokkosalgorithms.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libkokkoscontainers.12.18.1.dylib  /usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/lib/libkokkoscore.12.18.1.dylib  /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libdl.tbd  -lgfortran  -lgcc_ext.10.5  -lgcc  -lquadmath  -lm  -lgcc_ext.10.5  -lgcc

although since rpaths are for runtime and this is at link time, not quite the same as the problem with missing shared libraries.

@sethrj
Copy link
Collaborator Author

sethrj commented Aug 16, 2020

Hey @aprokop , I've got everything working and tested except for a failing anasazi solve in the 4-processor eigen_handle tests (both C++ and Fortran wrapper):

p=0: *** Caught standard std::exception of type 'std::logic_error' :

 ../src/fortrilinos_hl/eigen_handle.cpp:140:

 Throw number = 1

 Throw test that evaluated to true: !(r == 0)

 Error!

The failing line is after

    Anasazi::ReturnType r = solver_->solve();

I tried updating the MueLu solver verbosity to see if that would help shed light on the error, but no luck there.

sethrj and others added 8 commits August 25, 2020 17:26
```
In file included from ../src/fortrilinos/generated/fortrilinosFORTRAN_wrap.cxx:613:
In file included from ../src/fortrilinos/nox_solver.hpp:39:
../src/fortrilinos/nox_solver_def.hpp:44:29: error: call to constructor of 'NOX::Thyra::Group' is ambiguous
        nox_group = rcp(new NOX::Thyra::Group(*(model_->initial_guess), model_,
                            ^                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/fortrilinos/generated/fortrilinosFORTRAN_wrap.cxx:2079:15: note: in instantiation of member function 'ForTrilinos::NOXSolver<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::setup' requested here
      (arg1)->setup(*arg2);
              ^
/usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/include/NOX_Thyra_Group.H:115:7: note: candidate constructor
      Group(const NOX::Thyra::Vector& initialGuess,
      ^
/usr/local/spack/var/spack/environments/fortrilinos/.spack-env/view/include/NOX_Thyra_Group.H:143:7: note: candidate constructor
      Group(const NOX::Thyra::Vector& initialGuess,
      ^
```
@sethrj
Copy link
Collaborator Author

sethrj commented Aug 25, 2020

@aprokop The commit 0a60a5a has a docker/new/Dockerfile with an installation of Trilinos that correctly builds ForTrilinos except for the failing unit test. I couldn't find the rhea script you referenced, but the ones I found in the codebase are very similar to the one in the new dockerfile.

This was referenced Aug 29, 2020
@aprokop
Copy link
Collaborator

aprokop commented Aug 31, 2020

Closing this as its functionality was split into two separate prs.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants