## Introduction

In an introductory presentation I talk about my background, and how we use adjoint models in oceanography, in particular within the Estimating the Circulation and Climate of the Ocean (ECCO) project (with an aside on sea-level) to improve our understanding of the global ocean circulation (yet another heavily under-sampled system). For a recent semi-popular overview of ECCO, see Wunsch et al. (2009).

## Some background: why adjoint models are good for you

Adjoint models come in very handy if you wish to compute the derivatives of a scalar-valued output function with respect to many(!) input variables, i.e. when seeking a high-dimensional gradient. Two applications that come readily to mind are

• Sensitivity analysis - an example:
You would like to know how Greenland or Antarctic total ice sheet volume (a scalar-valued model diagnostic) changes when perturbing basal sliding, or basal melt rate, or geothermal flux, or precipitation, or the initial temperature field at any(!) grid point in your domain. Not only do you need to compute the derivative with respect to each of these variables, but each of these variables also spans a two- or three-dimensional space. Recently, we have applied the adjoint of the three-dimensional ice sheet model SICOPOLIS to Greenland ice sheet volume sensitivities.
• Optimal control, inverse modeling, parameter or state estimation, data assimilation:
The general method here is to fit a model to a given set of observations by adjusting a set of uncertain variables (so-called control variables), which can be model parameters, surface/basal/lateral boundary conditions, or initial conditions. Often these problems are solved iteratively via gradient-based optimization: the gradient of a least-squares model vs. data misfit function is computed and used by a gradient-based descent method, such as the conjugate gradient method or Newton's method, to reduce the misfit. A classic paper that introduces control methods to glaciology is by Douglas MacAyeal (1992).
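To make the second application concrete, here is a minimal sketch of gradient-based optimization of a least-squares misfit. The model, data, and step size are made up for illustration; real problems would use the adjoint to supply the gradient and a smarter descent method.

```python
import numpy as np

# Toy inverse problem: fit the two parameters of y(t) = p0 + p1*t to
# synthetic observations by plain steepest descent on the misfit.

t = np.linspace(0.0, 1.0, 20)
data = 1.0 + 2.0 * t                     # synthetic "observations"

def misfit(p):
    r = (p[0] + p[1] * t) - data         # model-minus-data residual
    return np.sum(r**2)

def misfit_gradient(p):
    r = (p[0] + p[1] * t) - data
    return np.array([2.0 * np.sum(r), 2.0 * np.sum(r * t)])

p = np.zeros(2)                          # initial guess for the controls
for _ in range(500):
    p = p - 0.01 * misfit_gradient(p)    # gradient descent step

print(p)    # approaches the true parameters [1.0, 2.0]
```

The only ingredient the optimizer needs from the model is `misfit_gradient`; that is exactly what the adjoint delivers cheaply when the control vector is large.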

Both examples show that the gradient of the model is a key ingredient.
In the following we shall be concerned with

• making clear why the adjoint (and not the tangent linear) model is what we want;
• how to obtain the adjoint of a complicated model, given in the form of a Fortran code, using automatic differentiation (AD).
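As a preview of what reverse-mode AD does, here is a minimal operator-overloading sketch in Python (a toy for intuition only, not one of the source-transformation tools used on the Fortran codes below). Each variable records its parents and the local partial derivatives; a single backward sweep from the scalar output then accumulates the sensitivity of the output with respect to all inputs at once, which is exactly why the adjoint mode wins when there are many inputs and one cost function.

```python
# Minimal reverse-mode AD sketch: each Variable remembers how to pass
# adjoint sensitivities back to its parents via the chain rule.

class Variable:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents        # list of (parent, local partial)
        self.adjoint = 0.0

    def __add__(self, other):
        return Variable(self.value + other.value,
                        [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Variable(self.value * other.value,
                        [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Accumulate the sensitivity of the final output w.r.t. this node,
        # then propagate it to the parents, scaled by the local partials.
        self.adjoint += seed
        for parent, partial in self.parents:
            parent.backward(seed * partial)

# f(x1, x2) = x1*x2 + x2, evaluated at (3, 4)
x1, x2 = Variable(3.0), Variable(4.0)
y = x1 * x2 + x2
y.backward()                      # ONE backward pass ...
print(x1.adjoint, x2.adjoint)     # ... yields df/dx1 = 4, df/dx2 = 4
```

A tangent linear (forward-mode) sweep, by contrast, would have to be repeated once per input to assemble the same gradient.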

## A very simple example

The following simple example should help explain some very basic issues of adjoint models:

\begin{align}
\text{model} \quad L: \quad & \quad \mathbf{x} & \longmapsto & \quad \mathbf{y} \, = \, L(\mathbf{x}) \\
~ & \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] & \longmapsto &
\left[ \begin{array}{c} y_1 \\ y_2 \end{array} \right] \, = \,
\left[ \begin{array}{rr} 0 & a \\ -b & 0 \end{array} \right] \cdot
\left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \, = \,
\left[ \begin{array}{r} a \, x_2 \\ -b \, x_1 \end{array} \right] \\[2ex]
\text{cost function} \quad J_0: \quad & \quad \mathbf{x} & \longmapsto & \quad J_0(\mathbf{y}) \, = \, J_0(L(\mathbf{x})) \\
~ & ~ & ~ & \quad J_0(\mathbf{y}) \, = \, \frac{1}{\sigma^2_1} (y_1 - d_1)^2 \, + \, \frac{1}{\sigma^2_2} (y_2 - d_2)^2 \\
~ & ~ & ~ & \quad J_0(L(\mathbf{x})) \, = \, \frac{1}{\sigma^2_1} (a \, x_2 - d_1)^2 \, + \, \frac{1}{\sigma^2_2} (-b \, x_1 - d_2)^2
\end{align}

### Some algebra

These notes show how the gradient of $J_0$ can be computed via either the tangent linear or the adjoint model. They also discuss how things change when the control space is switched from initial conditions $\mathbf{x} = [ x_1 \quad x_2 ]$ to model parameters $\mathbf{p} = [ a \quad b ]$, and why the notion of "the" adjoint model is therefore ill-conceived: the adjoint depends on the choice of control variables.
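The change of control space can be illustrated by extending the numerical check above. With the same made-up values, the adjoint is now the transpose of a different Jacobian, $\partial \mathbf{y} / \partial \mathbf{p}$, and the resulting gradient lives in parameter space:

```python
import numpy as np

# Same setup as before, but the controls are now p = [a, b]; x is held fixed.
x = np.array([1.0, 2.0])
d = np.array([1.0, -1.0])
sigma = np.array([0.5, 0.5])

def J0(p):
    a, b = p
    y = np.array([a * x[1], -b * x[0]])
    return np.sum(((y - d) / sigma) ** 2)

p = np.array([2.0, 3.0])                 # current guess for a, b
y = np.array([p[0] * x[1], -p[1] * x[0]])

# Jacobian of y with respect to p (NOT with respect to x):
# y1 = a*x2  =>  dy1/da = x2 ;  y2 = -b*x1  =>  dy2/db = -x1
dydp = np.array([[x[1],  0.0],
                 [0.0, -x[0]]])
grad_p = dydp.T @ (2.0 * (y - d) / sigma**2)

# Finite-difference check in parameter space
eps = 1e-6
grad_fd = np.array([(J0(p + eps * e) - J0(p)) / eps for e in np.eye(2)])

print(grad_p)    # a different gradient, from a different adjoint operator
```

Same model, same cost function, but a different adjoint operator: which "adjoint model" you get is fixed only once the control space is chosen.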

### The corresponding Fortran code

Example codes, scripts to invoke the AD tool, and scripts to compile are contained in this tar-file. Download it, place it in your home directory ~/ and un-tar via the following (substitute the actual name of the downloaded tar-file):

```shell
# un-compress, un-tar in your home directory
cd ~
tar -xvf adjoint_example.tar
# check simple README
cat ~/adjoint_example/README
# cd to simple example code, using x_1, x_2 as controls
cd ~/adjoint_example/simple_function/control_init/
# cd to simple example code, using a, b as controls
cd ~/adjoint_example/simple_function/control_param/
```

## A more complex example: Kees' assignment of a mountain glacier model

Earlier in the school we developed a model for a mountain glacier (see Kees' assignment). Each of the six teams came up with a solution. From these I chose the Team 2 Solution (because it was posted on the WIKI, and because it's very compact) and used AD to generate a tangent linear (TLM) and adjoint (ADM) model.

I formulated the following control problem:

• dependent variable / cost: The "total volume", i.e. $fc = \sum H(i) \, \Delta x$ over all $i$
• independent variable / control: a perturbation in the mass balance $M$ at any point $i$

Here are the slightly modified original code, the adjoint code, the tangent linear code, and the driver routine. The driver calculates the gradient of total volume with respect to changes in mass balance at each point in three ways: (1) via the adjoint model, (2) via the tangent linear model, and (3) via finite-difference perturbations. The f.d. calculation serves to test the results calculated via the AD-generated code.

Remember that we have to run the TLM and f.d. model 31 times, corresponding to the number of grid points, i.e. the dimension of $M$, whereas we have to run the ADM only once (check this carefully in the driver program). Also, the f.d. calculation uses only the original forward code; it doesn't rely on any of the AD-generated codes.
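The run-count argument can be made concrete with a schematic numpy sketch. This is not the glacier code: the matrix `A` and the cost are made-up stand-ins for the model and the total-volume diagnostic, but the bookkeeping is the same, with 31 controls.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 31                                   # number of grid points / controls
A = rng.standard_normal((n, n))          # stand-in for the (linear) model
m = rng.standard_normal(n)               # stand-in for the mass balance M

def cost(m):
    return np.sum(A @ m)                 # scalar "total volume"-like cost

# (3) finite differences: n + 1 model runs, one per control
eps = 1e-6
J = cost(m)
grad_fd = np.array([(cost(m + eps * e) - J) / eps for e in np.eye(n)])

# (1) adjoint: ONE reverse sweep; for cost = 1^T (A m) this is grad = A^T 1
grad_adm = A.T @ np.ones(n)

print(np.max(np.abs(grad_fd - grad_adm)))   # small: the two routes agree
```

The tangent linear route would likewise need one run per column of the identity, i.e. n runs, exactly as in the driver program.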

Mountain glacier forward model (very slightly modified from the Team 2 Solution)

## Adjoint variables and Lagrange multipliers: some algebra

Adjoint methods are synonymous with what is better known in many fields as the method of Lagrange multipliers. In a section on Lagrange multipliers and adjoints we spend some time developing their relationship. A more complete treatment can be found, e.g., in Wunsch (2006) or MacAyeal and Barcilon (1998). The paper by Giering and Kaminski (1998) offers a perspective based purely on application of the chain rule.
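In the notation of the simple example above, the connection can be sketched in a few lines (a bare-bones version of what that section develops): minimize $J_0(\mathbf{y})$ subject to the model constraint $\mathbf{y} - L \, \mathbf{x} = 0$, enforced by multipliers $\boldsymbol{\mu}$:

\begin{align}
\mathcal{L}(\mathbf{x}, \mathbf{y}, \boldsymbol{\mu}) \, &= \, J_0(\mathbf{y}) \, + \, \boldsymbol{\mu}^T \left( \mathbf{y} \, - \, L \, \mathbf{x} \right) \\
\frac{\partial \mathcal{L}}{\partial \mathbf{y}} = 0 \quad &\Longrightarrow \quad \boldsymbol{\mu} \, = \, - \, \nabla_{\mathbf{y}} J_0 \\
\frac{\partial \mathcal{L}}{\partial \mathbf{x}} \, = \, - \, L^T \boldsymbol{\mu} \, &= \, L^T \, \nabla_{\mathbf{y}} J_0 \, = \, \nabla_{\mathbf{x}} \left( J_0 \circ L \right)
\end{align}

The stationarity condition in $\mathbf{y}$ determines the multipliers, which are (up to sign convention) the adjoint variables, and the derivative with respect to the controls $\mathbf{x}$ reproduces the gradient computed via the adjoint operator $L^T$.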

## An (incomplete) list of AD tools

There are quite a few AD tools around. We list a few that we think have significant/relevant reverse-mode capabilities, which are necessary to generate adjoint models. A more complete resource on automatic differentiation is the Community Portal for Automatic Differentiation, autodiff.org. The authoritative textbook on AD is by Griewank and Walther (2008).

• TAF (Transformation of Algorithms in Fortran), developed by FastOpt, Hamburg, Germany
• OpenAD (Open-Source AD tool), developed at Argonne National Laboratory, Chicago, IL
• Tapenade, developed at INRIA, Sophia Antipolis, France