General Description of SeWaS

This document gives an overview of SeWaS, a modern production-class Seismic Wave Simulator. The simulator is built from scratch in C++14 and makes extensive use of state-of-the-art libraries such as Eigen and Boost. In addition, it embeds advanced techniques for developing HPC applications on emerging architectures, such as cache blocking and vectorization. In the following, we give a macroscopic description of the underlying algorithm, the data structures and the parallelization scheme.

User Interface

  • Currently, the application requires a rectangular box-shaped computational domain. The domain description and the mechanical parameters are defined in a JSON file as follows (a parsing sketch is given at the end of this list):
{
    "Model" : "Dummy",
    "tmax" : 1.6,
    "dt" : 0.008,
    "lx" : 60000,
    "ly" : 60000,
    "lz" : 20500,
    "ds" : 100,
    "Layers" :
    [
        {
            "Name" : "Basin 1",
            "Size" :
            {
                "X" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Y" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Z" :
                {
                    "start" : 0,
                    "end" : 300
                }
            },
            "Vp" : 600,
            "Vs" : 300,
            "rho" : 1700,
            "Q" : 70
        },
        {
            "Name" : "Basin 2",
            "Size" :
            {
                "X" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Y" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Z" :
                {
                    "start" : 300,
                    "end" : 750
                }
            },
            "Vp" : 1870,
            "Vs" : 1000,
            "rho" : 2000,
            "Q" : 100
        },
        {
            "Name" : "Basin 3",
            "Size" :
            {
                "X" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Y" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Z" :
                {
                    "start" : 750,
                    "end" : 1500
                }
            },
            "Vp" : 2800,
            "Vs" : 1500,
            "rho" : 2300,
            "Q" : 100
        },
        {
            "Name" : "Bedrock 1",
            "Size" :
            {
                "X" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Y" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Z" :
                {
                    "start" : 1500,
                    "end" : 5000
                }
            },
            "Vp" : 5000,
            "Vs" : 2890,
            "rho" : 2700,
            "Q" : 200
        },
        {
            "Name" : "Bedrock 2",
            "Size" :
            {
                "X" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Y" :
                {
                    "start" : 0,
                    "end" : 60000
                },
                "Z" :
                {
                    "start" : 5000,
                    "end" : 20000
                }
            },
            "Vp" : 6000,
            "Vs" : 3465,
            "rho" : 2700,
            "Q" : 250
        }
    ],
    "Sources" :
    [
        {
            "Name" : "A",
            "xs" : 1000,
            "ys" : 1000,
            "zs" : 1000
        }
    ]
}
  • The application is launched from the command line as follows:
    $ ./sewas --cx 40 --cy 40 --cz 40 --P 1 --Q 1 --R 1 --nthreads 3 --dfile=../data/input/dummy.json

where the arguments are detailed below:

Seismic Wave Simulator (SeWaS)

Allowed options:
  -h [ --help ]         Produce help message
  -x [ --cx ] arg       Block size along x-axis
  -y [ --cy ] arg       Block size along y-axis
  -z [ --cz ] arg       Block size along z-axis
  -P [ --P ] arg        Number of MPI processes along x-axis
  -Q [ --Q ] arg        Number of MPI processes along y-axis
  -R [ --R ] arg        Number of MPI processes along z-axis
  -T [ --nthreads ] arg Number of threads per MPI process
  -D [ --dfile ] arg    Path to the Json file containing mechanical properties
                        and kinematic source parameter
  -C [ --config ] arg   Configuration file
  • An alternative way to launch the application is to use a configuration file containing all the required input parameters. The previous command line then simplifies to:
$ ./sewas -C dummy.cfg

where dummy.cfg contains:

cx=40
cy=40
cz=40
P=1
Q=1
R=1
nthreads=3
dfile=../data/input/dummy.json
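
Returning to the JSON domain description shown above, below is a minimal sketch of how the Layers array of such a file could be read. It assumes the nlohmann/json library and a hypothetical Layer struct; SeWaS's actual parser may rely on a different JSON library and on different types.

  #include <fstream>
  #include <string>
  #include <vector>
  #include <nlohmann/json.hpp>

  // Hypothetical container for one entry of the "Layers" array
  struct Layer {
    std::string name;
    double zStart, zEnd; // vertical extent of the layer (m)
    double vp, vs, rho;  // P/S wave velocities (m/s) and density (kg/m^3)
  };

  std::vector<Layer> readLayers(const std::string& path) {
    std::ifstream in(path);
    nlohmann::json j;
    in >> j; // parse the whole JSON document

    std::vector<Layer> layers;
    for (const auto& l : j["Layers"]) {
      Layer layer;
      layer.name   = l["Name"].get<std::string>();
      layer.zStart = l["Size"]["Z"]["start"].get<double>();
      layer.zEnd   = l["Size"]["Z"]["end"].get<double>();
      layer.vp     = l["Vp"].get<double>();
      layer.vs     = l["Vs"].get<double>();
      layer.rho    = l["rho"].get<double>();
      layers.push_back(layer);
    }
    return layers;
  }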

Algorithm

The application implements the numerical solution of the linear elastodynamic equation. The discretization follows a staggered-grid finite difference scheme, 4th order in space and 2nd order in time. The numerical schemes considered are detailed in Dupros2010.
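
For reference, schemes of this family discretize the velocity-stress form of the linear elastodynamic equation, which in the purely elastic case reads:

  \rho \frac{\partial v_i}{\partial t} = \sum_j \frac{\partial \sigma_{ij}}{\partial x_j} + f_i,
  \qquad
  \frac{\partial \sigma_{ij}}{\partial t} = \lambda \, \delta_{ij} \sum_k \frac{\partial v_k}{\partial x_k} + \mu \left( \frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i} \right),

where v is the velocity, \sigma the stress tensor, f the source force, \rho the density, and \lambda, \mu the Lamé parameters (recoverable from each layer's Vp, Vs and rho). The attenuation controlled by the quality factor Q and the exact discrete scheme are those of Dupros2010.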

Data Structures

The application is developed using Eigen's containers. The spatial domain is decomposed into a collection of contiguous 3D tiles of cells. Each tile is an instance of the SpatialBlockField class, a multi-dimensional container mapped onto a 1D Eigen::Array storing cx*cy*cz cells. In the following, we describe the Velocity, Stress and SourceForce fields.

The variables ii, jj, kk represent the coordinates of a tile, whereas i, j, k represent the coordinates of a cell. The time step is denoted ts.
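
As a minimal sketch (not the actual SpatialBlockField implementation, whose storage order we have not verified), such a tile can be modelled as a 1D Eigen::Array equipped with a 3D indexing operator:

  #include <Eigen/Core>

  // Illustrative tile of cx*cy*cz cells stored contiguously in one Eigen array
  class TileSketch {
  public:
    TileSketch(int cx, int cy, int cz)
      : cx_(cx), cy_(cy), cz_(cz), data_(cx * cy * cz) { data_.setZero(); }

    // (i,j,k) cell access; with z varying fastest, an (i,j) slice is a
    // contiguous 1D vector along the z-axis (cf. vX(i,j) below)
    double& operator()(int i, int j, int k) { return data_((i * cy_ + j) * cz_ + k); }

  private:
    int cx_, cy_, cz_;
    Eigen::ArrayXd data_;
  };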

Velocity

  // (x|y|z)(ii, jj, kk)(i, j, k)
  typedef Eigen::Array<SpatialField, DIM, 1, Eigen::ColMajor> Velocity;

Let us consider an object V of type Velocity. Vx=V(x)(ii,jj,kk) is a 3D spatial block of cx*cy*cz cells. Such an approach makes it possible to perform block-wise operations:

  Vx=42.; // All elements of Vx will be initialized to 42.
  Vx+=U;  // U is another container of the same type as Vx

Alternatively, one can go through all the elements of Vx as follows:

  for (int i=0; i<cx; i++)
    for (int j=0; j<cy; j++)
      for (int k=0; k<cz; k++)
        Vx(i,j,k)=42.;

With this block-wise formulation, the x-component of the velocity field is computed as follows:

    for (int i=iStart; i<iEnd; i++){
      for (int j=jStart; j<jEnd; j++){
        vX(i,j)+=(fdo_.apply<L2S::X>(sigmaXX, i, j, L2S::DK)
                + fdo_.apply<L2S::Y>(sigmaXY, i, j, L2S::DK)
                + fdo_.apply<L2S::Z>(sigmaXZ, i, j, L2S::DK))*dt*bx(i,j);
      }
    }

where fdo_ is an instance of the 4th order finite difference operator acting on a 1D spatial block of cells (for instance, vX(i,j) is a 1D vector along the z-axis).
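
To illustrate what such an operator computes, below is a classical 4th order staggered-grid stencil (the 9/8 and 1/24 coefficients are the standard Levander ones; the actual coefficients and boundary handling of fdo_ are those of Dupros2010):

  // Sketch of a 4th order staggered-grid derivative of u at cell k,
  // with grid spacing ds (boundary handling omitted)
  inline double d4(const double* u, int k, double ds) {
    return (9.0 / 8.0  * (u[k + 1] - u[k])
          - 1.0 / 24.0 * (u[k + 2] - u[k - 1])) / ds;
  }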

An important point to note here is that the right-hand side of the velocity-update statement above is evaluated lazily through the C++ expression templates idiom. Therefore, only one new SpatialBlockField1D object is created for the whole expression.
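
This lazy-evaluation mechanism is the same one Eigen relies on internally: an arithmetic expression on arrays builds a lightweight expression object, and the actual loop over the cells runs only once, at assignment. A minimal sketch:

  #include <Eigen/Core>

  void update(Eigen::ArrayXd& c, const Eigen::ArrayXd& a, const Eigen::ArrayXd& b) {
    // a + 2.0 * b is not evaluated here: it merely builds an expression object.
    // A single fused loop runs at the assignment; no intermediate array of
    // partial results is materialized.
    c += a + 2.0 * b;
  }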

Stress

  // (xx|yy|zz|xy|xz|yz)(ii, jj, kk)(i, j, k)
  typedef Eigen::Array<SpatialField, NB_STRESS_FIELD_COMPONENTS, 1, Eigen::ColMajor> StressField;

Similarly, Sxx=S(xx)(ii,jj,kk) is a 3D spatial block of cx*cy*cz cells, and all previously listed operations are applicable.

Velocity source

// (is, js, ks) are the coordinates of a cell-source.
// Our reading of the constants: a Gaussian-derivative source time function
// with a 7 Hz central frequency and an onset delay of 1.2/7 s.
const double t=(ts-2)*dt-1.2/7.;
v_(L2S::X)(ii,jj,kk)(is,js,ks)-=a*t*exp(-M_PI*M_PI*7.*7.*t*t)*dt/rho(ii,jj,kk)(is,js,ks);

Parallelization Scheme

The algorithm is implemented following a task-based approach and the parallelization is handled by the PaRSEC runtime system. We define six computational tasks within the dataflow:

ComputeVelocity
ExtractVelocityHalo
UpdateVelocity
ComputeStress
ExtractStressHalo
UpdateStress

Two visualization tasks are added to enable real-time monitoring of the ongoing simulation:

DisplayVelocity
DisplayStress

Those visualization tasks interact with VTK actors to render the scene.
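
As a rough sequential rendering of this dataflow (an illustration only: the actual dependencies are expressed in PaRSEC's dataflow language, and the runtime executes the tasks asynchronously, overlapping tiles and time steps):

  #include <vector>

  struct Tile {}; // placeholder for one 3D tile of cells

  // Hypothetical stubs mirroring the six task names above
  void ComputeVelocity(Tile&, int) {}
  void ExtractVelocityHalo(Tile&, int) {} // pack boundary faces for the neighbours
  void UpdateVelocity(Tile&, int) {}      // consume the neighbours' velocity halos
  void ComputeStress(Tile&, int) {}
  void ExtractStressHalo(Tile&, int) {}
  void UpdateStress(Tile&, int) {}

  void timeLoop(std::vector<Tile>& tiles, int nt) {
    for (int ts = 0; ts < nt; ++ts) {
      for (auto& tile : tiles) ComputeVelocity(tile, ts);
      for (auto& tile : tiles) ExtractVelocityHalo(tile, ts);
      for (auto& tile : tiles) UpdateVelocity(tile, ts);
      for (auto& tile : tiles) ComputeStress(tile, ts);
      for (auto& tile : tiles) ExtractStressHalo(tile, ts);
      for (auto& tile : tiles) UpdateStress(tile, ts);
    }
  }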

Building the application

The application has been tested on Debian 9 and CentOS. The following steps are suitable for a Debian-like OS. Our experiments were carried out using the Intel compiler 18 but, according to our recent experiments with gcc 7.3, there is no significant performance difference between these two compilers. All the following results can therefore be reproduced with gcc.

Dependencies

  • Boost 1.62 (program-options module)
$ sudo apt-get install libboost-program-options-dev
  • OpenMPI 2.0.2
$ sudo apt-get install libopenmpi-dev openmpi-common
  • PaRSEC 2.0
$ git clone https://bitbucket.org/icldistcomp/parsec.git
$ cd parsec
$ mkdir build && cd build
$ cmake .. -DPARSEC_WITH_DEVEL_HEADERS=1 -DPARSEC_GPU_WITH_CUDA=0
$ make install
  • Eigen 3.3.3
$ sudo apt-get install libeigen3-dev
  • Intel TBB 4.3
$ sudo apt-get install libtbb-dev

SeWaS build

$ export SEWAS_ROOT=/path/to/SeismicWaveSimulator
$ mkdir -p $SEWAS_ROOT/build && cd $SEWAS_ROOT/build
$ cmake ../src -DCMAKE_BUILD_TYPE=Release
$ make

At this stage, the build/ directory should contain an executable named sewas.

/!\ The results presented in our paper were obtained on a cluster of Intel KNL processors. Hence, for best performance, it is mandatory to build the SeWaS application natively on a KNL node.

Benchmarks

Performance studies have been carried out using two test cases: TestA and TestB. As an example, the TestA input file is given below:

{
    "Model" : "TestA",
    "tmax" : 0.8,
    "dt" : 0.008,
    "lx" : 20000,
    "ly" : 20000,
    "lz" : 10000,
    "ds" : 100,
    "Layers" :
    [
        {
            "Name" : "Basin 1",
            "Size" :
            {
                "X" :
                {
                    "start" : 0,
                    "end" : 20000
                },
                "Y" :
                {
                    "start" : 0,
                    "end" : 20000
                },
                "Z" :
                {
                    "start" : 0,
                    "end" : 10000
                }
            },
            "Vp" : 6000,
            "Vs" : 3460,
            "rho" : 2700
        }
    ],
    "Sources" :
    [
        {
            "Name" : "A",
            "xs" : 1000,
            "ys" : 1000,
            "zs" : 1000,
            "Depth" : 5000
        }
    ]
}

The results of the numerical computations on TestA are validated against the reference application Ondes3D. The statistics of this experiment, conducted on a dual-core laptop, are given below.

salli@jigawal:~/Projects/HPC/SeismicWaveSimulator/build$ ./sewas -C TestA.cfg 
Source : (13, 13, 13) within tile (0, 0, 0)
Seismic wave is progressing...
I am 0 and I completed the simulation...
++++++++++++++++++++++++++++++++++++++++++
 Statistics 
------------------------------------------
 SEWAS 
++++++++++++++++++++++++++++++++++++++++++
                                AVG	        MIN	        MAX	        TOT	        SD          
Global
	#Calls           : 	1
	Elapsed Time (s) : 	1.743005e+01	1.743005e+01	1.743005e+01	1.743005e+01	0.000000e+00
	CPU Time (s)     : 	3.391434e+01	3.391434e+01	3.391434e+01	3.391434e+01	0.000000e+00
Core simulation
	#Calls           : 	1
	Elapsed Time (s) : 	1.681025e+01	1.681025e+01	1.681025e+01	1.681025e+01	0.000000e+00
	CPU Time (s)     : 	3.357347e+01	3.357347e+01	3.357347e+01	3.357347e+01	0.000000e+00
Initialization
	#Calls           : 	1
	Elapsed Time (s) : 	5.559187e-01	5.559187e-01	5.559187e-01	5.559187e-01	0.000000e+00
	CPU Time (s)     : 	2.772830e-01	2.772830e-01	2.772830e-01	2.772830e-01	0.000000e+00
ComputeStress
	#Calls           : 	9504
	Elapsed Time (s) : 	2.418401e-03	8.470220e-04	1.123556e-02	2.298448e+01	2.183366e-03
	CPU Time (s)     : 	4.808716e-03	8.600000e-04	2.409800e-02	4.570203e+01	4.384567e-03
ComputeVelocity
	#Calls           : 	4752
	Elapsed Time (s) : 	1.493992e-03	1.189589e-03	3.632833e-03	7.099449e+00	5.015644e-04
	CPU Time (s)     : 	3.005864e-03	1.190000e-03	8.809000e-03	1.428387e+01	1.283688e-03

Results reproducibility

For a distributed run, the P, Q and R parameters (defined in both the TestA.cfg and TestB.cfg files) need to be set according to the total number of MPI processes used, such that:

NB_MPI_PROCESSES=P*Q*R

We provide the best (P,Q,R) decompositions used to run our experiments on TestB (results presented in Section 5 of the paper):

NB_MPI_PROCESSES   P   Q   R
               1   1   1   1
               2   2   1   1
               4   2   2   1
               8   2   4   1

For example, on 8 nodes the application is launched as follows:

mpirun -n 8 -pernode ./sewas --cx 40 --cy 40 --cz 100 --P 2 --Q 4 --R 1 --nthreads=64 --dfile=../data/input/TestB.json
