Adjoint models

From Interactive System for Ice sheet Simulation

Introduction

[Presentation pdf]

In an introductory presentation I talk about my background and about how we use adjoint models in oceanography, in particular within the Estimating the Circulation and Climate of the Ocean (ECCO) project (with an aside on sea level), to improve our understanding of the global ocean circulation (yet another heavily under-sampled system). For a recent semi-popular overview of ECCO, see Wunsch et al. (2009) [1].

Some background: why adjoint models are good for you

Adjoint models come in very handy if you wish to compute the derivatives of a scalar-valued output function with respect to many(!) input variables, i.e. when seeking a high-dimensional gradient. Two applications come readily to mind:

  • Sensitivity analysis, for example:
    You would like to know how the total volume of the Greenland or Antarctic ice sheet (a scalar-valued model diagnostic) changes when you perturb basal sliding, basal melt rate, geothermal flux, precipitation, or the initial temperature field at any(!) grid point in your domain. Not only do you need the derivative with respect to each of these variables, but each of them is also a field spanning a two- or three-dimensional space. Recently, we have applied the adjoint of the three-dimensional ice sheet model SICOPOLIS to compute Greenland ice sheet volume sensitivities [2].
  • Optimal control, inverse modeling, parameter or state estimation, data assimilation:
    The general method here is to fit a model to a given set of observations by adjusting a set of uncertain variables (so-called control variables), which can be model parameters, surface/basal/lateral boundary conditions, or initial conditions. These problems are often solved iteratively via gradient-based optimization: the gradient of a least-squares model-data misfit function is computed and used in a descent method, such as the conjugate gradient method or a quasi-Newton method, to reduce the misfit. A classic paper that introduces control methods to glaciology is by Douglas MacAyeal (1993) [3].

In both examples the key ingredient is the gradient of a scalar function of the model output with respect to (many) model inputs.
In the following we shall be concerned with

  • why the adjoint (and not the tangent linear) model is what we want;
  • how to obtain the adjoint of a complicated model, given in the form of Fortran code, using automatic differentiation (AD).
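In slightly more formal terms, the chain rule offers two ways of obtaining derivatives of a scalar cost $J(\mathbf{y})$. This is a sketch only: the symbol $\mathbf{A} = \partial \mathbf{y} / \partial \mathbf{x}$ for the model Jacobian with respect to $n$ control variables is introduced just for this illustration.

\begin{align}
\text{TLM:} \quad & \delta \mathbf{y} = \mathbf{A} \, \delta \mathbf{x}, \quad \delta J = \left( \nabla_{\mathbf{y}} J \right)^T \delta \mathbf{y} && \text{one run per direction } \delta \mathbf{x} \text{, i.e. } n \text{ runs for the full gradient} \\
\text{ADM:} \quad & \nabla_{\mathbf{x}} J = \mathbf{A}^T \, \nabla_{\mathbf{y}} J && \text{one run, independent of } n
\end{align}

The TLM answers "how does the output change for this one input perturbation?", while the ADM answers "how sensitive is J to every input at once?". For a scalar cost and many controls, the latter is the cheaper question.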

A very simple example

The following simple example should help explain some very basic issues of adjoint models:


\begin{align}
\text{model} \quad L: \quad & 
\quad \mathbf{x} & 
\longmapsto & 
\quad \mathbf{y} \, = \, L (\mathbf{x}) \\
~ &
\left[ 
\begin{array}{c}
x_1 \\
x_2 \\
\end{array}
\right]
&  
\longmapsto &
\left[ 
\begin{array}{c}
y_1 \\
y_2 \\
\end{array}
\right]
\, = \,
\left[ 
\begin{array}{rr}
0 & a \\
-b & 0 \\
\end{array}
\right]
\cdot
\left[ 
\begin{array}{c}
x_1 \\
x_2 \\
\end{array}
\right]
\, = \,
\left[ 
\begin{array}{r}
a \, x_2 \\
-b \, x_1 \\
\end{array}
\right]
\\
~ & ~ & ~ & ~ & ~ \\
\text{cost function} \quad J_0: \quad &
\quad \mathbf{x} & 
\longmapsto & 
\quad J_0(\mathbf{y}) \, = \, J_0 (L(\mathbf{x})) \\
~ & ~ & ~ & \quad J_0 (\mathbf{y}) =
\frac{1}{\sigma^2_1} (y_1 - d_1)^2 \, + \, \frac{1}{\sigma^2_2} (y_2 - d_2)^2 \\
~ & ~ & ~ & \quad  J_0 (L(\mathbf{x})) =
\frac{1}{\sigma^2_1} (a x_2 - d_1)^2 \, + \, \frac{1}{\sigma^2_2} (-b x_1 - d_2)^2 \\
\end{align}

Some algebra

Lecture notes (pdf)

These notes show how the gradient of $J_0$ can be computed via the tangent linear or the adjoint model. They discuss how things change when the control space is switched from the initial conditions $\mathbf{x} = [x_1 \quad x_2]$ to the model parameters $\mathbf{p} = [a \quad b]$, and why the notion of "the" adjoint model is ill-conceived: the adjoint depends on which variables are chosen as controls.
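A short worked version, derived directly from the definitions above and meant as a sketch rather than a substitute for the notes: both gradients start from the same derivative of the cost with respect to the model output, but multiply it by different transposed Jacobians.

\begin{align}
\nabla_{\mathbf{y}} J_0 &= \left[ \begin{array}{c} \frac{2}{\sigma^2_1} (y_1 - d_1) \\ \frac{2}{\sigma^2_2} (y_2 - d_2) \end{array} \right] \\
\nabla_{\mathbf{x}} J_0 &= L^T \, \nabla_{\mathbf{y}} J_0 = \left[ \begin{array}{rr} 0 & -b \\ a & 0 \end{array} \right] \cdot \nabla_{\mathbf{y}} J_0 = \left[ \begin{array}{r} -\frac{2b}{\sigma^2_2} (y_2 - d_2) \\ \frac{2a}{\sigma^2_1} (y_1 - d_1) \end{array} \right] \\
\nabla_{\mathbf{p}} J_0 &= \left( \frac{\partial \mathbf{y}}{\partial \mathbf{p}} \right)^T \nabla_{\mathbf{y}} J_0 = \left[ \begin{array}{rr} x_2 & 0 \\ 0 & -x_1 \end{array} \right] \cdot \nabla_{\mathbf{y}} J_0 = \left[ \begin{array}{r} \frac{2 x_2}{\sigma^2_1} (y_1 - d_1) \\ -\frac{2 x_1}{\sigma^2_2} (y_2 - d_2) \end{array} \right]
\end{align}

Changing the controls changes the operator that gets transposed, and hence the adjoint code.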

The corresponding Fortran code

[Example codes]

The example codes, together with scripts to invoke the AD tool and to compile, are contained in this tar file. Download it, place it in your home directory ~/ and un-tar it via

# un-compress, un-tar in your home directory
cd ~
tar xzf Adjoint_example.tar.gz
# check simple README
more adjoint_README
# cd to simple example code, using x_1, x_2 as controls
cd ~/adjoint_example/simple_function/control_init/
# cd to simple example code, using a, b as controls
cd ~/adjoint_example/simple_function/control_param/

The simple forward model

The simple adjoint model

The simple tangent linear model
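The Fortran versions of these three models are in the tar file. To illustrate what they compute, here is a hand-written Python sketch. It is not the AD-generated Fortran, and the values chosen for $a$, $b$, $d_1$, $d_2$, $\sigma^2_1$, $\sigma^2_2$ and $\mathbf{x}$ are arbitrary assumptions. It contains a forward model, a TLM and an ADM for the controls $\mathbf{x}$, a second ADM for the controls $\mathbf{p} = [a \quad b]$, and a finite-difference check.

```python
# Hand-written sketch of the simple example (NOT the AD-generated Fortran):
# forward model, tangent linear (TLM) and adjoint (ADM) code, plus a
# finite-difference check. Data d and variances sigma^2 are hypothetical.
D = (1.0, -1.0)    # observations d1, d2
SIG2 = (0.5, 2.0)  # error variances sigma_1^2, sigma_2^2


def model(x, p):
    """Forward model y = L x, with L = [[0, a], [-b, 0]] and p = (a, b)."""
    a, b = p
    return (a * x[1], -b * x[0])


def cost(y):
    """Least-squares misfit J0(y)."""
    return (y[0] - D[0]) ** 2 / SIG2[0] + (y[1] - D[1]) ** 2 / SIG2[1]


def cost_adjoint(y):
    """dJ0/dy: adjoint of the cost, seeded with dJ0/dJ0 = 1."""
    return (2 * (y[0] - D[0]) / SIG2[0], 2 * (y[1] - D[1]) / SIG2[1])


def tlm_x(x, p, dx):
    """TLM for controls x: dJ0 along ONE perturbation direction dx."""
    a, b = p
    dy = (a * dx[1], -b * dx[0])           # dy = L dx
    gy = cost_adjoint(model(x, p))         # linearised cost: dJ0 = (dJ0/dy) . dy
    return gy[0] * dy[0] + gy[1] * dy[1]


def adm_x(x, p):
    """ADM for controls x: the whole gradient L^T (dJ0/dy) in one run."""
    a, b = p
    ady = cost_adjoint(model(x, p))        # forward sweep, then reverse sweep
    return (-b * ady[1], a * ady[0])       # L^T = [[0, -b], [a, 0]]


def adm_p(x, p):
    """ADM for controls p = (a, b): a different adjoint of the same model."""
    ady = cost_adjoint(model(x, p))
    return (x[1] * ady[0], -x[0] * ady[1])  # (dy/dp)^T = [[x2, 0], [0, -x1]]


def fd_grad(f, v, eps=1e-6):
    """Centred finite differences: two forward runs per control variable."""
    grad = []
    for i in range(len(v)):
        vp, vm = list(v), list(v)
        vp[i] += eps
        vm[i] -= eps
        grad.append((f(vp) - f(vm)) / (2 * eps))
    return grad


x0, p0 = (0.7, -0.4), (2.0, 3.0)           # hypothetical values of x and (a, b)
print(adm_x(x0, p0))                                    # 1 adjoint run
print([tlm_x(x0, p0, e) for e in ((1, 0), (0, 1))])     # 1 TLM run per control
print(fd_grad(lambda x: cost(model(x, p0)), x0))        # 2 forward runs per control
print(adm_p(x0, p0), fd_grad(lambda p: cost(model(x0, p)), p0))
```

All three approaches agree. Note that switching the controls from $\mathbf{x}$ to $\mathbf{p}$ changes only the last step of the adjoint (`adm_x` versus `adm_p`), while the adjoint of the cost function is shared.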

A more complex example: Kees' assignment of a mountain glacier model

Earlier in the school we developed a model for a mountain glacier (see Kees' assignment), and each of the six teams came up with a solution. From these I chose the Team 2 Solution (because it was posted on the wiki, and because it is very compact) and used AD to generate a tangent linear model (TLM) and an adjoint model (ADM).

I formulated the following control problem:

  • dependent variable / cost: the "total volume", i.e. $fc = \sum_i H(i) \, \Delta x$, summed over all grid points $i$
  • independent variable / control: a perturbation in the mass balance $M$ at any grid point $i$

Below are the slightly modified original code, the adjoint code, the tangent linear code, and the driver routine. The driver calculates the gradient of total volume with respect to changes in mass balance at each grid point in three ways: (1) with the adjoint model, (2) with the tangent linear model, and (3) via finite-difference perturbations. The finite-difference (f.d.) calculation serves to test the results obtained with the AD-generated code.

Remember that we have to run the TLM and the f.d. model 31 times, once per grid point (i.e. the dimension of M), whereas we have to run the ADM only once (check this carefully in the driver program). Also, the f.d. calculation uses only the original forward code; it does not rely on any of the AD-generated code.
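The Team 2 code is not reproduced here. To illustrate the run-count argument with something runnable, the following Python sketch swaps the glacier physics for a hypothetical toy model on 31 grid points: explicit diffusion of H plus a quadratic loss term, forced by a mass balance M. All constants are assumptions. The cost is the same "total volume" $fc = \sum_i H(i) \, \Delta x$, and the adjoint is written by hand. Because the toy model is nonlinear in H, the adjoint has to be linearised about a stored forward trajectory, which is also what the AD-generated adjoint has to arrange for.

```python
# Hypothetical toy stand-in for the glacier model (NOT the Team 2 code):
# explicit diffusion of H plus a quadratic loss term, forced by mass balance M.
# Cost: "total volume" fc = sum_i H(i) * dx at the final time.
N, DX, DT, NSTEPS = 31, 1.0, 1.0, 50   # grid points, spacing, time step, steps
C = 0.2 * DT / DX**2                   # D*dt/dx^2 with assumed D = 0.2 (stable)
K = 0.01                               # assumed loss coefficient


def step(h, m):
    """h_new = h + dt*(m + D*lap(h) - K*h^2), with H = 0 outside the domain."""
    new = []
    for i in range(N):
        left = h[i - 1] if i > 0 else 0.0
        right = h[i + 1] if i < N - 1 else 0.0
        new.append(h[i] + C * (left - 2 * h[i] + right) + DT * (m[i] - K * h[i] ** 2))
    return new


def forward(m):
    """Integrate from H = 0; return the cost and the stored trajectory."""
    h = [0.0] * N
    traj = []
    for _ in range(NSTEPS):
        traj.append(h)                 # keep H^n: the adjoint is linearised about it
        h = step(h, m)
    return sum(h) * DX, traj


def adjoint(m):
    """One forward and one reverse sweep give dfc/dM at all N grid points."""
    _, traj = forward(m)
    lam = [DX] * N                     # dfc/dH at the final time
    adm = [0.0] * N                    # accumulates dfc/dM
    for h in reversed(traj):
        for i in range(N):
            adm[i] += DT * lam[i]      # M enters each step as + DT * M
        new = []
        for i in range(N):
            left = lam[i - 1] if i > 0 else 0.0
            right = lam[i + 1] if i < N - 1 else 0.0
            # transposed Jacobian of step(): symmetric stencil + linearised loss
            new.append(lam[i] + C * (left - 2 * lam[i] + right)
                       - 2 * DT * K * h[i] * lam[i])
        lam = new
    return adm


def fd_gradient(m, eps=1e-6):
    """Centred finite differences: 2 forward runs per grid point (62 in total)."""
    grad = []
    for i in range(N):
        mp, mm = list(m), list(m)
        mp[i] += eps
        mm[i] -= eps
        grad.append((forward(mp)[0] - forward(mm)[0]) / (2 * eps))
    return grad


m0 = [0.1 + 0.01 * i for i in range(N)]  # hypothetical mass-balance profile
g_adm, g_fd = adjoint(m0), fd_gradient(m0)
print(max(abs(a - b) for a, b in zip(g_adm, g_fd)))  # should be tiny
```

The adjoint returns all 31 sensitivities from one forward and one reverse sweep. The centred finite differences need 62 forward runs, and a TLM would need 31.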

Mountain glacier forward model (very slightly modified from the Team 2 Solution)

Mountain glacier adjoint model

Mountain glacier tangent linear model

Mountain glacier driver routine

Mountain glacier sensitivity result

Adjoint variables and Lagrange multipliers: some algebra

Adjoint methods are synonymous with what is better known in many fields as Lagrange multiplier methods. In a section on Lagrange multipliers and adjoints we spend some time developing their relationship. A more complete treatment can be found, e.g., in Wunsch (2006) [4] or MacAyeal and Barcilon (1998) [5]. The paper by Giering and Kaminski (1998) [6] offers a perspective based purely on the application of the chain rule.
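For the simple example, the correspondence can be sketched in a few lines. The multiplier symbol $\boldsymbol{\mu}$ is our notation. Adjoin the model equation $\mathbf{y} = L\,\mathbf{x}$ to the cost with Lagrange multipliers and require the Lagrangian to be stationary:

\begin{align}
\mathcal{L}(\mathbf{x}, \mathbf{y}, \boldsymbol{\mu}) &= J_0(\mathbf{y}) - \boldsymbol{\mu}^T \left( \mathbf{y} - L \, \mathbf{x} \right) \\
\frac{\partial \mathcal{L}}{\partial \boldsymbol{\mu}} = 0 \quad &\Rightarrow \quad \mathbf{y} = L \, \mathbf{x} && \text{(forward model)} \\
\frac{\partial \mathcal{L}}{\partial \mathbf{y}} = 0 \quad &\Rightarrow \quad \boldsymbol{\mu} = \nabla_{\mathbf{y}} J_0 && \text{(adjoint variables)} \\
\frac{\partial \mathcal{L}}{\partial \mathbf{x}} &= L^T \boldsymbol{\mu} = \nabla_{\mathbf{x}} J_0 && \text{(gradient)}
\end{align}

The Lagrange multipliers are the adjoint variables, and the last line is the adjoint-model gradient $L^T \nabla_{\mathbf{y}} J_0$.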

An (incomplete) list of AD tools

There are quite a few AD tools around. We list a few which, we think, have the significant reverse-mode capabilities needed to generate adjoint models. A more complete forum on automatic differentiation is the Community Portal for Automatic Differentiation, autodiff.org. The authoritative textbook on AD is by Griewank and Walther (2008) [7].

  • TAF (Transformation of Algorithms in Fortran), developed by Fastopt, Hamburg, Germany
  • OpenAD (Open-Source AD tool), developed at Argonne National Laboratory, Chicago, IL
  • Tapenade, developed at INRIA, Sophia Antipolis, France
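All three tools generate adjoint code by source transformation. The underlying reverse mode can also be implemented by operator overloading, which makes for a compact illustration. The following is a toy Python sketch of that alternative technique; `Var` and `gradient` are hypothetical names, not part of any of the tools above. Each operation records its local partial derivatives on a "tape", and one reverse sweep over the tape accumulates the adjoint of every intermediate. Applied to the simple example (with assumed values $a = 2$, $b = 3$, $d = (1, -1)$, $\sigma^2 = (0.5, 2)$), it returns $\nabla_{\mathbf{x}} J_0$.

```python
# Toy reverse-mode AD by operator overloading (a different technique from the
# source transformation used by TAF, OpenAD and Tapenade). Names are hypothetical.
class Var:
    tape = []  # every Var in creation order, which is a valid evaluation order

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs (parent Var, local partial derivative)
        self.adj = 0.0          # adjoint variable, set during the reverse sweep
        Var.tape.append(self)

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__


def gradient(output, inputs):
    """Reverse sweep over the tape: adj(parent) += partial * adj(child)."""
    for v in Var.tape:
        v.adj = 0.0
    output.adj = 1.0
    for v in reversed(Var.tape):
        for parent, partial in v.parents:
            parent.adj += partial * v.adj
    return [x.adj for x in inputs]


# The simple example with hypothetical values a = 2, b = 3, d = (1, -1),
# sigma^2 = (0.5, 2): record J0(L(x)) once, then sweep backwards once.
Var.tape.clear()
x1, x2 = Var(0.7), Var(-0.4)
y1, y2 = 2.0 * x2, -3.0 * x1              # y = L x
r1, r2 = y1 - 1.0, y2 - (-1.0)            # y - d
J0 = (1 / 0.5) * (r1 * r1) + (1 / 2.0) * (r2 * r2)
print(gradient(J0, [x1, x2]))             # dJ0/dx1, dJ0/dx2
```

Source-transformation tools produce adjoint code that can be compiled and optimized like the original, which matters for large Fortran models. The overloading approach shown here is simpler to write, but it records every operation at run time.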

References

  1. Wunsch, C., P. Heimbach, R. Ponte, I. Fukumori and the ECCO-GODAE Consortium members, 2009: The global general circulation of the ocean estimated by the ECCO-Consortium. Oceanography, 22(2), pp. 88-103.
  2. Heimbach, P. and V. Bugnion, 2009: Greenland ice sheet volume sensitivity to basal, surface, and initial conditions, derived from an adjoint model. Annals of Glaciology, 50(52), pp. 67-80.
  3. MacAyeal, D.R., 1993: A tutorial on the use of control methods in ice-sheet modeling. J. Glaciol., 39(131), pp. 91-98.
  4. Wunsch, C., 2006: Discrete Inverse and State Estimation Problems: With Geophysical Fluid Applications. Cambridge University Press.
  5. MacAyeal, D.R. and V. Barcilon, 1998: Finding Connections Between Data and Theory: Applications in Geophysical Sciences. University of Chicago, Chicago, IL.
  6. Giering, R. and T. Kaminski, 1998: Recipes for adjoint code construction. ACM Transactions on Mathematical Software, 24, pp. 437-474, doi:10.1145/293686.293695.
  7. Griewank, A. and A. Walther, 2008: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (2nd ed.). SIAM Frontiers in Applied Mathematics, Vol. 19, Philadelphia.