-
-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mesh reader/writer with ADIOS2 #2618
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some preliminary comments.
Could this be divided into a 'read' PR, and a checkpoint PR after?
No, as this reads the mesh from the mesh format this PR creates a writer for |
Changes have been made according to your suggestions, any further comments? |
@garth-wells Any further comments? |
@jorgensd could you make clearer in the PR comment what this changes does in practice? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round of comments.
Would be helpful to have clear statement in the PR of what functionality is being add.
cpp/dolfinx/io/ADIOS2Writers.h
Outdated
int(geometry.cmaps()[0].variant())); | ||
} | ||
|
||
/// @brief Create a ADIOS2 checkpointer for reading a mesh from file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a class? Would a function suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, it depends on how we want to do it when adding functions to the mix. have stand-alone functions for each of the operations, or having them interact in a class.
The code can write meshes to file on N processes, and read it back in on M processes. This is the first step towards having full checkpointing, as a next PR would add in writing/reading functions. This is the reason for making the checkpointer a class, as one would write the mesh and function to the same file (and keep it alive during the process), avoiding duplicate initialization statements. What differs in this code from the already existing Fides/VTXWriter is that:
|
What's a 'global array'? An ADIOS2 concept? The VTX files we write do have a local-to-global map array.
Do we want to guarantee that the DOLFINx remains unchanged? An advantage of VTK ordering is that we can freely change the internal DOLFINx ordering. |
Yes. Consider writing out a mesh with the VTXWriter in serial: root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# ./build-dir-real64/demo_poisson
root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# bpls -l mesh.bp/
uint32_t NumberOfCells {1} = 1024 / 1024
uint32_t NumberOfNodes {1} = 561 / 561
int64_t connectivity [1]*{1024, 4} = 0 / 560
double geometry [1]*{561, 3} = 0 / 2
double step scalar = 0
uint32_t types scalar = 69
uint8_t vtkGhostType [1]*{561} = 0 / 0
int64_t vtkOriginalPointIds [1]*{561} = 0 / 560 is what you get in serial, but in parallel you get root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# mpirun -n 4 ./build-dir-real64/demo_poisson
root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# bpls -l mesh.bp/
uint32_t NumberOfCells {4} = 272 / 288
uint32_t NumberOfNodes {4} = 166 / 182
int64_t connectivity [4]*{__, 4} = 0 / 181
double geometry [4]*{__, 3} = 0 / 2
double step scalar = 0
uint32_t types scalar = 69 which means that the data is split into block structures (equal to the number of processors). This makes the data very fast to write, but very hard to read on any other number of processes than the one that wrote it. A global array is what we are creating when we specify both local and global dimensions in adios2::Variable<T> local_geometry = impl_adios2::define_variable<T>(
io, "geometry", {num_xdofs_global, gdim}, {local_range[0], 0},
{num_xdofs_local, gdim}); which gives you a root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/checkpoint# mpirun -n 4 ./build-dir-real64/demo_interpolation-io
root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/checkpoint# bpls -l test.bp/
uint32_t CellPermutations {12} = 0 / 6
int64_t connectivity {12, 3} = 0 / 11
float geometry {12, 2} = 0 / 1
double step scalar = 2 |
} | ||
|
||
/// @brief Move constructor | ||
Checkpointer(Checkpointer&& writer) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noexcept?
~Checkpointer() { close(); } | ||
|
||
/// @brief Move assignment | ||
Checkpointer& operator=(Checkpointer&& writer) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noexcept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checkpointing needs a good high-level plan and discussion before merging. We need a summary of what we want to be able to do to inform the design. Parts of this PR mirror the VTX and Fides writers, but these designs are probably not ideal for different types.
When reading a file we don't know a priori what the types are. We need to able to (i) query this, or (ii) maybe return a std::variant
.
std::span<const T> x = geometry.x(); | ||
std::size_t gdim = geometry.dim(); | ||
|
||
// Trim geometry to gdim and trunctate for owned entities only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit unusual for a 'checkpoint', which I would expect would align closely with the internal storage.
auto geometry_dmap = geometry.dofmap(); | ||
|
||
// Convert local geometry to global geometry | ||
std::vector<std::int64_t> connectivity_out(num_cells_local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The terminology here is confusing. The variable name includes 'connectivity', but that data that gets copied in is geometry data.
engine.Put<std::int64_t>(local_topology, connectivity_out.data()); | ||
|
||
// Compute mesh permutations and store them to file. These are only | ||
// needed for when we want to checkpoint functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'when we want to checkpoint fem::Functions.'
|
||
/// Read and write mesh::Mesh objects with ADIOS2 | ||
template <std::floating_point T> | ||
class Checkpointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem logical that this class write and read - beyond the ADIOS2 handles the member data and methods are not shared across read and write. And the ADIOS2Writer
class seems to already handle the ADISO2 handles.
std::unique_ptr<adios2::IO> _io; | ||
std::unique_ptr<adios2::Engine> _engine; | ||
std::shared_ptr<const mesh::Mesh<T>> _mesh; | ||
U _u; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used?
This is the first step towards having function check-pointing in DOLFINx.
The code uses
ADIOS2
as backend for reading and writing data.No MPI communication required in either stage (until mesh creation/partitioning).
The following C++ code runs nicely
It would be good to get feedback on this design prior to implementing the python interface.
Once this is merged, I can start working on the function checkpointing, with the approach used in https://github.com/jorgensd/adios4dolfinx/blob/main/src/adios4dolfinx/checkpointing.py#L91-L173