Parallel Solvers

CISM now contains a compile-time mechanism for changing the sparse matrix solver used in the higher-order diagnostic scheme, the thickness evolution, and the 3D lithosphere temperature model. By default, the incomplete LU-preconditioned biconjugate gradient solver from SLAP is used. The alternatives presently include

  • UMFPACK, a direct solver that uses the unsymmetric multifrontal method.
  • PARDISO, for solving large sparse symmetric and unsymmetric linear systems of equations on shared memory multiprocessors. This code is still considered developmental.

Work is progressing on compile-time options to use the PETSc suite of preconditioners and sparse solvers.

In addition to providing parallelism in what can be the most time-consuming portion of the simulation (especially when higher-order diagnostics are enabled), having several sparse solver options allows alternate iterative methods or direct solvers to be used as a "sanity check". The availability of this option has aided our efforts to debug nonlinear iterations immeasurably.

End-user documentation

To use UMFPACK instead of SLAP, configure with the --with-umfpack=path-to-umfpack-libraries option. This requires an object file containing the Fortran bindings for UMFPACK to be compiled and placed in the same directory as the library, as these bindings are not built automatically by UMFPACK.

Developer documentation

Although Fortran 90 does not support an object-oriented style of programming, we have designed the sparse matrix solving library within CISM using several insights from that paradigm. For instance, following the principle of encapsulation, the interface used to solve a sparse system is separated from the internal details of the solver being used. There are many places in CISM where a sparse linear system needs to be solved: thickness evolution, higher-order velocity fields, and lithosphere temperature models, to name a few. We have written or re-designed these routines to be agnostic regarding the method used to solve the sparse system. However, the state of numerical sparse matrix solving is somewhat Babel-like: no two libraries conform to anything close to the same interface!

One solution to the problem would have been to pepper the code with preprocessor or runtime conditionals selecting the sparse solver to run. However, this would quickly have led to messy and unreadable code, and would present a combinatorial problem as both more solvers and more routines requiring a sparse solve are added. To reconcile the desire to support multiple solvers with the desire to maintain a clean code base, we have applied the Adapter pattern (Gamma et al. 1996) and defined an abstract sparse solver interface that code requiring a sparse solve can call and code providing a sparse solver can implement.

Developers interested in integrating CISM with their sparse matrix solving library of choice must be aware that the interface we have defined is a contract that all sparse solver adapters must fulfill. If client code relies on a feature that is not implemented, it will lead to either a compile-time or run-time error.

Module Design

To add a new sparse matrix solver to Glimmer, create a new module file named glimmer_sparse_solvername.F90. The module contained in this file must be named glimmer_sparse_solver in order for it to be swappable with the SLAP BCG solver. The module must provide a number of data types and functions; glimmer_sparse_slap provides detailed documentation on the facilities that the module needs to expose and is a good template to follow.
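As a rough sketch only (glimmer_sparse_slap remains the authoritative template), a new wrapper might be laid out as follows. The module and derived type names are the ones required by this page; the hypothetical file name, the use statement, the kind parameter, and the argument lists are assumptions made for illustration.

 ! glimmer_sparse_mysolver.F90 -- hypothetical skeleton of a new wrapper.
 ! The module itself must be named glimmer_sparse_solver so that it can
 ! be swapped in for the SLAP wrapper at compile time.
 module glimmer_sparse_solver
     use glimmer_sparse          ! shared sparse matrix type (module name assumed)
     implicit none

     integer, parameter :: dp = kind(1.0d0)

     ! Options filled in by client code; the two fields below are required
     ! even if the wrapped solver is a direct solver.
     type sparse_solver_options
         integer  :: maxiters  = 0
         real(dp) :: tolerance = 0.0_dp
         ! solver-specific options may be added here
     end type sparse_solver_options

     ! Opaque workspace: client code declares and passes it but never looks
     ! inside.  A placeholder component keeps the type from being empty.
     type sparse_solver_workspace
         integer :: placeholder = 0
     end type sparse_solver_workspace

 contains

     subroutine sparse_allocate_workspace(matrix, options, workspace, max_nonzeros)
         ! Reserve whatever the backend needs once the matrix order and an
         ! upper bound on the number of nonzeros are known.
         ! (Argument types and order are assumptions; mirror glimmer_sparse_slap.)
         type(sparse_matrix_type),      intent(in)    :: matrix   ! type name assumed
         type(sparse_solver_options),   intent(in)    :: options
         type(sparse_solver_workspace), intent(inout) :: workspace
         integer,                       intent(in)    :: max_nonzeros
     end subroutine sparse_allocate_workspace

     ! sparse_solver_preprocess, the solve routine, sparse_solver_postprocess
     ! and sparse_deallocate_workspace follow the same pattern.

 end module glimmer_sparse_solver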

Data Types

The sparse matrix type used by a sparse solver is the type defined in glimmer_sparse.F90; sparse solvers do not need to, and should not, implement their own sparse matrix storage scheme. This type supports construction of a sparse matrix in triad (coordinate) format, consisting of three arrays that specify the row, column, and value of each sparse matrix entry. The type also supports conversion to compressed column (a.k.a. Harwell-Boeing) format or to compressed row format (the row-wise analogue of compressed column). If the sparse solver being integrated requires a different format, a new conversion routine will need to be implemented.
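To make the triad scheme concrete, here is the coordinate representation of a small matrix; the array names are illustrative and are not the actual component names used in glimmer_sparse.F90.

 program triad_example
     ! Triad (coordinate) storage of the 3x3 matrix
     !     [ 4  0  1 ]
     !     [ 0  3  0 ]
     !     [ 2  0  5 ]
     ! Each nonzero entry k is described by (row(k), col(k), val(k)).
     implicit none
     integer, parameter :: nnz = 5
     integer          :: row(nnz) = (/ 1, 1, 2, 3, 3 /)
     integer          :: col(nnz) = (/ 1, 3, 2, 1, 3 /)
     double precision :: val(nnz) = (/ 4.0d0, 1.0d0, 3.0d0, 2.0d0, 5.0d0 /)
     integer :: k

     do k = 1, nnz
         print '(a,i1,a,i1,a,f4.1)', 'A(', row(k), ',', col(k), ') = ', val(k)
     end do
 end program triad_example

Compressed column format stores the same entries column by column with a pointer array marking where each column starts; compressed row is the analogous row-wise scheme.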

Each sparse matrix solver defines two derived types: sparse_solver_workspace and sparse_solver_options. The sparse_solver_options type holds any configuration that needs to be passed to the solver. Anticipating the need to support iterative solvers in addition to direct solvers, sparse_solver_options must at least provide fields named maxiters and tolerance to store the maximum number of iterations and the error tolerance for iterative solvers. Direct solvers may ignore these fields but must still provide them. Additional solver-specific options may be added to this type, and client code can use them after checking whether the preprocessor definition SPARSE_SOLVER contains the solver that the options are specific to.

Most sparse solvers require internal workspace to be managed, typically either through a handle to the workspace (as is the case for UMFPACK and PARDISO) or by providing allocated memory of sufficient size (as is the case for SLAP). Sparse solvers may also support or require pre-processing of the matrix before solving. To support these cases, a CISM sparse solver wrapper must include a derived data type called sparse_solver_workspace. Client code must treat this workspace as entirely opaque, declaring it and passing it to the sparse solver routines without examining or modifying its internals. In this way, it can be thought of as analogous to the private member variables used in object-oriented design. Even if the sparse solver being wrapped does not require any workspace, an empty derived type must still be declared, as client code will depend on its presence.

Functions

For efficiency, the sparse matrix adapter is designed with three tiers of operations. This allows client code to avoid repeating operations such as memory allocation and de-allocation, sparse format conversion, and matrix factorization where possible. This leads to the following pairs of operations:

Because every sparse solver library is different, not all of the calls listed will be needed by every solver. Even if a call is superfluous for a particular solver, the wrapper must still implement it, because client code is written assuming that the function exists.

sparse_allocate_workspace/sparse_deallocate_workspace: Client code can call the allocate function as soon as the size of the matrix (both the order and an upper bound on the number of nonzeros) is known. The deallocate call can be deferred until the memory is no longer needed. Any number of matrix solves using the workspace may proceed between these calls, as long as the amount of memory needed for a solve does not exceed the amount allocated.

sparse_solver_preprocess/sparse_solver_postprocess: Client code can call the preprocess function as soon as the matrix has been fully assembled but before the right-hand side is necessarily known. Operations appropriate for the preprocess call include transforming a triad format matrix to compressed row/column format, as well as numeric factorization or preconditioning of the matrix. These are separated from the main solver routine in case the same matrix needs to be used multiple times with different right-hand sides.
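Putting the tiers together, client code typically follows a sequence along the lines of the sketch below. The allocate/deallocate and preprocess/postprocess names are the ones documented above; the name and argument list of the solve call itself, and the argument lists generally, are assumptions, so the SLAP wrapper should be consulted for the real interfaces.

 subroutine example_sparse_solve(A, rhs, soln, nnz_bound)
     ! Illustrative three-tier calling sequence for the sparse solver
     ! interface; argument lists are assumptions.
     use glimmer_sparse          ! module name assumed
     use glimmer_sparse_solver
     implicit none
     type(sparse_matrix_type), intent(inout)        :: A      ! assembled in triad format
     real(kind(1.0d0)), dimension(:), intent(in)    :: rhs
     real(kind(1.0d0)), dimension(:), intent(inout) :: soln
     integer, intent(in)                            :: nnz_bound

     type(sparse_solver_options)   :: opts
     type(sparse_solver_workspace) :: wksp   ! opaque workspace/handle
     integer :: ierr

     opts%maxiters  = 200       ! honoured by iterative backends
     opts%tolerance = 1.0d-11   ! direct backends may ignore these fields

     ! Tier 1: allocate once the order and an upper bound on the
     ! number of nonzeros are known
     call sparse_allocate_workspace(A, opts, wksp, nnz_bound)

     ! Tier 2: format conversion, factorization or preconditioning,
     ! reusable across several right-hand sides
     call sparse_solver_preprocess(A, opts, wksp)

     ! Tier 3: the solve itself (routine name here is hypothetical)
     call sparse_solve(A, rhs, soln, opts, wksp, ierr)

     call sparse_solver_postprocess(A, opts, wksp)
     call sparse_deallocate_workspace(A, opts, wksp)
 end subroutine example_sparse_solve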

Integration Information

Describe how the selection of a sparse wrapper is integrated into the build system.

PARDISO

Introduction

PARDISO stands for Parallel Sparse Direct Linear Solver. PARDISO is a software package for solving large sparse symmetric and nonsymmetric linear systems of equations on shared memory multiprocessors. The algorithms in PARDISO are based on Level-3 BLAS updates and pipelining parallelism, and internally use OpenMP. PARDISO is available for use with Fortran and C. PARDISO supports symmetric, structurally symmetric, and non-symmetric matrices. As the name suggests, PARDISO is a direct solver, but it also allows a combination of direct and iterative methods for better performance.

PARDISO with Glimmer

Our goal here was to replace the SLAP library, which GLIMMER uses to solve its systems of linear equations sequentially, with PARDISO, which can solve them in parallel. Because the datasets used to run the simulations can be quite large, the current partial differential equation (PDE) solver should be replaced by one that can run in parallel. SLAP uses the biconjugate gradient method to solve the system of linear equations and incomplete LU (ILU) decomposition for preconditioning. PARDISO, by contrast, is a parallel direct solver, although its documentation states that it also supports a combination of direct and iterative methods in order to accelerate the linear solution process for transient simulations.[1]

When linking in PARDISO, programs are required to provide an implementation of BLAS/LAPACK via another library, which is linked in as well. In order to replace SLAP with PARDISO we compiled and linked all applications and libraries with gfortran 4.3. We also discovered that we needed to transform the data into Compressed Sparse Row (CSR) format for use with PARDISO. Initially we thought the SLAP routines already provided this functionality; however, SLAP only provides conversion to compressed column format, so we needed to provide our own conversion to CSR. We wrote a subroutine to convert the data from coordinate format to CSR.

This led to a new and unexpected problem. When we converted the data to CSR, we initially got identical results, but as the solver progressed through the simulation, the results diverged from those produced with SLAP. What we discovered was that the data from GLIMMER was only sorted by row, and within a row the columns *may* be unsorted, which produced an incorrect CSR matrix. Thankfully, SLAP already provides routines for sorting and moving arrays, so once we added calls to sort the columns within each row, we got correct results.
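A conversion along these lines might look like the sketch below. It assumes the triad entries have already been sorted by row and, after the fix described above, by column within each row; it is an illustration, not the exact routine we wrote.

 subroutine coo_to_csr(n, nnz, rows, cols, vals, ia, ja, a)
     ! Convert a coordinate-format (triad) matrix, sorted by row and by
     ! column within each row, into Compressed Sparse Row (CSR) form.
     implicit none
     integer, intent(in)           :: n                     ! matrix order
     integer, intent(in)           :: nnz                   ! number of nonzeros
     integer, intent(in)           :: rows(nnz), cols(nnz)
     double precision, intent(in)  :: vals(nnz)
     integer, intent(out)          :: ia(n+1)               ! CSR row pointers
     integer, intent(out)          :: ja(nnz)               ! CSR column indices
     double precision, intent(out) :: a(nnz)                ! CSR values
     integer :: k, r

     ! Count the number of entries in each row (stored in ia(2:n+1))
     ia = 0
     do k = 1, nnz
         ia(rows(k) + 1) = ia(rows(k) + 1) + 1
     end do

     ! Turn the counts into row pointers: ia(r) indexes the first entry
     ! of row r, and ia(n+1) ends up equal to nnz + 1
     ia(1) = 1
     do r = 2, n + 1
         ia(r) = ia(r) + ia(r - 1)
     end do

     ! Because the input is already fully sorted, the column indices and
     ! values can be copied through unchanged
     ja = cols
     a  = vals
 end subroutine coo_to_csr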

PARDISO Solver

A Fortran module named Pardiso_solver.f90 was developed which calls the PARDISO subroutines to solve the system of equations. The main solver routine has the following signature:

 subroutine Pardiso_Solve ( numRowsCols, bvec, xvec, numNonZeroesA, rowsA, colsA,   &
                               valuesA, error )


  • numRowsCols = order of the system (number of variables/equations)
  • bvec = the right-hand-side vector b
  • xvec = the solution vector x
  • numNonZeroesA = number of non-zeros in A
  • rowsA = row index array
  • colsA = column index array
  • valuesA = array of non-zero values
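For illustration only, a call to this routine might look like the following; the variable kinds and the treatment of the error code are assumptions based on the parameter list above.

 ! Hypothetical call for an n x n system stored in coordinate format.
 integer :: n, nnz, ierr
 integer, allocatable :: rowsA(:), colsA(:)
 double precision, allocatable :: valuesA(:), bvec(:), xvec(:)

 ! ... n, nnz, rowsA, colsA, valuesA and bvec are filled in by the model ...

 call Pardiso_Solve(n, bvec, xvec, nnz, rowsA, colsA, valuesA, ierr)
 if (ierr /= 0) then
     ! handle solver failure (error-code convention assumed)
 end if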

PARDISO Solver Results

To facilitate easy debugging, Visual Studio was used to step through GLIMMER. However, the PARDISO implementation for Windows is not provided by PARDISO.org; rather, it is included as part of Intel's Math Kernel Library (MKL). PARDISO distinguishes between symmetric and unsymmetric matrices, and the equations provided by GLIMMER fall into the unsymmetric category. Running under Linux with the PARDISO library provided by PARDISO.org yielded correct results. Finally, some profiling work performed on GLIMMER for our capstone determined that the time spent in SLAP was a small fraction of GLIMMER's overall running time, often less than 1%. Based on these data we would expect to see only minor performance improvements, if any. Below is a table of runtimes (in seconds) of GLIMMER using SLAP and PARDISO on a 120,000 year simulation (fenscan.config).

SLAP and PARDISO run summary
 Run #       SLAP        PARDISO
 1           13481.62    13505.25
 2           13512.33    13624.46
 3           13486.69    13462.77
 Total       40480.64    40592.48
 Average     13493.55    13530.83

Conclusion

As the results show, PARDISO provides no overall improvement over SLAP when used with GLIMMER. One item of note: kernel times with PARDISO were about three times as long as with SLAP (roughly 100 seconds more), likely due to threading overhead and synchronization. Whatever improvement was achieved by solving the equations in parallel is negated by the increase in kernel time. Additionally, if only 1% of the total runtime is devoted to SLAP, then on average only about 135 seconds of the total runtime is spent solving equations. Some further optimizations to our Fortran could be made to achieve better results. For example, our routine to convert from coordinate format to CSR format iterates over the data set twice when once would be enough. Also, the sort routine used is not implemented in parallel.

PETSc

Introduction

PETSc is a suite of data structures and routines. "PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, and C++" [2]. Each library in PETSc manipulates a particular family of objects (for instance, vectors) and the operations one would like to perform on those objects [2].

Features

  • Many (parallel) vector/array operations
  • Numerous (parallel) matrix formats and operations
  • Numerous linear solvers
  • Nonlinear solvers
  • Preconditioners
  • Limited parallel grid/data management
  • Common interface for most ODE solver software

Each module consists of a set of calling sequences and one or more implementations using particular data structures. PETSc enables easy comparison and use of different algorithms, for example different preconditioners and solvers [2].

Major Components

The four major components of PETSc are:

  • Vectors (Vec*)
  • Matrices (Mat*)
  • Krylov Subspace Methods (KSP*)
  • Preconditioners (PC*)

Data Types

PETSc defines its own data types, such as PetscInt, PetscReal, and PetscScalar. These are usually aliases of native types: PetscInt corresponds to int and PetscReal to double.

Data Structures

PETSc provides several core data structures:

  • vectors (sequential and parallel/MPI)
  • sparse matrices (compressed sparse row and other formats)
  • distributed arrays
  • index sets

Matrices

Matrices may be global (distributed across processes) or local to a single process. PETSc provides a variety of matrix implementations.

  • Row compressed (AIJ)
  • Block row compressed (BAIJ)
  • Symmetric block row compressed
  • Block diagonal
  • Dense

Some basic actions on PETSc matrices are: creation of a particular type of matrix, insertion of values into it, processing (assembly) of the matrix, use of the matrix in various computations, and finally destruction of the matrix.

  • MatCreate(): create a matrix
  • MatSetValues(): insert or add values

The default matrix representation within PETSc is the general sparse AIJ format (also called the Yale sparse matrix format or compressed sparse row format, CSR).
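The create/insert/assemble/destroy sequence described above looks roughly as follows from Fortran. This sketch follows the include and module conventions of recent PETSc releases, which differ from the 2.3.3-era interface mentioned elsewhere on this page; error checking is omitted, and note that PETSc indices are zero-based.

 program petsc_mat_example
 #include <petsc/finclude/petscmat.h>
     use petscmat
     implicit none
     Mat            :: A
     PetscInt       :: n, one, ii(1), jj(1)
     PetscScalar    :: v(1)
     PetscErrorCode :: ierr

     call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

     n   = 10
     one = 1

     ! Create an n x n matrix; AIJ (CSR) is the default representation
     call MatCreate(PETSC_COMM_WORLD, A, ierr)
     call MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr)
     call MatSetFromOptions(A, ierr)
     call MatSetUp(A, ierr)

     ! Insert a single value at row 2, column 3 (zero-based indices)
     ii(1) = 2; jj(1) = 3; v(1) = 1.0
     call MatSetValues(A, one, ii, one, jj, v, INSERT_VALUES, ierr)

     ! Process (assemble) the matrix before it is used in computations
     call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
     call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)

     ! ... use the matrix ...

     call MatDestroy(A, ierr)
     call PetscFinalize(ierr)
 end program petsc_mat_example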

KSP Solver

KSP is the linear equation solver object at the heart of PETSc. KSP (Krylov subspace methods) "is intended for solving nonsingular systems of the form Ax = b, where A denotes the matrix representation of a linear operator, b is the right-hand-side vector, and x is the solution vector" [2]. When using KSP, particular methods and preconditioners can be selected as parameters from the command line. "The combination of a Krylov subspace method and a preconditioner is at the center of most modern numerical codes for the iterative solution of linear systems" [2].
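A minimal Fortran sketch of a KSP solve is shown below, again following recent PETSc conventions (the KSPSetOperators signature in particular has changed between versions). The matrix and vectors are assumed to have been created and assembled already, and error checking is omitted.

 subroutine solve_with_ksp(A, b, x)
     ! Solve A x = b with a Krylov method.  The Krylov method and the
     ! preconditioner can be chosen at run time, e.g.
     !     -ksp_type bcgs -pc_type ilu
 #include <petsc/finclude/petscksp.h>
     use petscksp
     implicit none
     Mat            :: A
     Vec            :: b, x
     KSP            :: ksp
     PetscErrorCode :: ierr

     call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
     call KSPSetOperators(ksp, A, A, ierr)   ! second A is used to build the preconditioner
     call KSPSetFromOptions(ksp, ierr)       ! picks up -ksp_type, -pc_type, ...
     call KSPSolve(ksp, b, x, ierr)
     call KSPDestroy(ksp, ierr)
 end subroutine solve_with_ksp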

Preconditioner

Krylov subspace methods are typically used in conjunction with a preconditioner. To use a particular preconditioner, a user can select it with a command-line option (for example -pc_type ilu) or set it inside the program.
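For the in-program route, the preconditioner is pulled out of the KSP object and set explicitly, as in the sketch below; PCILU names PETSc's incomplete LU preconditioner, the same family SLAP uses by default. The command-line route is simply -pc_type ilu together with KSPSetFromOptions.

 subroutine use_ilu_preconditioner(ksp)
     ! Hard-code ILU preconditioning on an existing KSP object
     ! (equivalent to passing -pc_type ilu on the command line).
 #include <petsc/finclude/petscksp.h>
     use petscksp
     implicit none
     KSP            :: ksp
     PC             :: pc
     PetscErrorCode :: ierr

     call KSPGetPC(ksp, pc, ierr)
     call PCSetType(pc, PCILU, ierr)
 end subroutine use_ilu_preconditioner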

Making and Running PETSc programs

Installing PETSc

PETSc can be installed and used in a Cygwin environment on Windows. In a UNIX environment the following steps can be taken:

  • Download the latest PETSc release tarball (here petsc-2.3.3.tar.gz) and unpack it:
 gunzip -c petsc-2.3.3.tar.gz | tar -xof -
  • Define and export the environment variable PETSC_DIR:
 PETSC_DIR=$PWD; export PETSC_DIR
  • Configure PETSc. Here PETSc is configured to build with gcc, g77, BLAS, LAPACK, and MPI:
 ./config/configure.py --with-cc=gcc --with-fc=g77 --download-f-blas-lapack=1 --download-mpich=1
  • Run make and then make test.

Compiling PETSc

PETSc compile and link lines are very long, and the commands required to compile, link, and load a program with PETSc are complicated. It is best to use a makefile for this purpose, in which case most of the complications can be hidden. A sample rule for building a PETSc program can look like the following (note that the recipe line must be indented with a tab):

 ex1: ex1.o
 	-${FLINKER} -o ex1 ex1.o ${PETSC_FORTRAN_LIB} ${PETSC_LIB}

To use this makefile, one simply types

 make ex1

and the executable will be created.

Running PETSc

The mpirun command can be used to run PETSc programs.

mpirun -np 3 petscprog <runtime options>

Glimmer and PETSc

SLAP and PETSc

SLAP is a sequential linear algebra package used to solve the systems of linear equations in Glimmer. The goal here was to replace the sequential SLAP solver with the parallel PETSc (Portable, Extensible Toolkit for Scientific Computation) solver. PETSc uses MPI and can run on distributed-memory machines. SLAP uses the biconjugate gradient method to solve the equations, whereas the idea for the PETSc wrapper was to build a solver that can take its choice of method from the command line or from the Glimmer config file.

Glimmer build with PETSc

In the course of integrating PETSc with Glimmer, the first step was to build Glimmer with PETSc. Glimmer with PETSc can be built with the following configuration:

FC=mpif90 ./configure --with-netcdf=/installdir/netcdf_install/ 
--with-blas=/installdir/BLAS/ --with-lapack=/installdir/lapack --with-petsc=/installdir/petsc_install/


PETSc Solver

A Fortran module named Petsc_solver.F90 was developed which calls PETSc subroutines to solve the system of linear equations for Glimmer. Currently the solver can take different solver methods and preconditioners as command-line parameters and solve the system with those parameters.

The interface to the solve routine follows:

 subroutine Petsc_Solve( orderA, bvec, xvec, numNonZeroesA, rowsA, colsA, valuesA,   &
                             storageMethod, maxIterations, iterations, errorEstimate, retCode )

Glimmer Code refactoring

A minimal amount of code refactoring was done to incorporate the PETSc solver module into Glimmer. However, when running the programs under MPI it is necessary to modify the driver programs so that only process 0 opens the config file and writes updates to the log and .nc files. Additionally, the start, end, and increment values must be distributed to all running processes.
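In outline, that guard looks something like the sketch below; the config-reading call and the variable names are placeholders, and only the broadcast of the start, end, and increment values is shown.

 subroutine read_and_distribute_config(tstart, tend, tinc)
     ! Only process 0 reads the config file (and, elsewhere, writes the log
     ! and .nc output); the time-loop parameters are then broadcast to all
     ! processes.  read_config_file is a placeholder name.
     use mpi
     implicit none
     real(kind(1.0d0)), intent(out) :: tstart, tend, tinc
     integer :: rank, ierr

     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

     if (rank == 0) then
         call read_config_file(tstart, tend, tinc)
     end if

     ! distribute the start, end and increment values to all processes
     call MPI_Bcast(tstart, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
     call MPI_Bcast(tend,   1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
     call MPI_Bcast(tinc,   1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
 end subroutine read_and_distribute_config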

PETSc Solver results

Currently the PETSc solver module is being tested and the results are being logged. Once testing is complete, the results will be analysed and posted.

References