# Lagrange multipliers and adjoints


## Overview

*The following text has been taken from Section 2 of Heimbach and Bugnion (2009)^{[1]}*

Conventionally, sensitivities are assessed by perturbing a control variable of interest and investigating the ice sheet's response to the applied perturbation. A separate run needs to be performed for each quantity, and for quantities that vary spatially, assumptions need to be made as to where to perturb. At one extreme, perturbations are spatially uniform (e.g. a uniform air temperature perturbation everywhere); at the other extreme, a perturbation is applied at each grid point separately, with a forward simulation for each such perturbation, in order to produce a full sensitivity map. For the model configuration considered here, the initial value control problem alone spans a 918,400-dimensional control space.

Alternatively, the adjoint model is able to provide a complete set, or map, of sensitivities in one single integration. For all its appeal, obtaining an adjoint model is not an easy task. Encouraged by comments on a first version of this text, we attempt, in the following, to provide a short, self-contained introduction to adjoint modeling and automatic differentiation, as no such description yet exists in the glaciological literature. We establish the connection between the tangent linear model (TLM), the adjoint model (ADM) and Lagrange multipliers. We then show how automatic differentiation (AD) can be used to generate tangent linear and adjoint model code.

### The adjoint or Lagrange multiplier method

Our goal is to find sensitivities, i.e. partial derivatives of a
scalar-valued objective or cost function $\mathcal{J}$
with respect to control variables $\vec{u}$.
The dependency of $\mathcal{J}$ on $\vec{u}$ is usually indirect, i.e.
comes in through the dependency of
elements of the state variables $\vec{x}$ of a model on $\vec{u}$.
For simplicity we focus on the case where $\vec{u}$ is the model's
initial state $\vec{x}(t_0)$.
Following the notation of
^{[2]}
the time-dependent model has the general form

$$\vec{x}(t_{i+1}) \; = \; L\left[ \vec{x}(t_i) \right] \qquad \text{(eqn:model)}$$

and is integrated from time $t_0$ to $t_n$. (Note that to simplify notation, we can formally extend the model state space to include both model prognostic variables as well as model parameters and boundary conditions.) To take an example, let the objective function consist of the time-mean volume over the last $m$ time steps $t_{n-m+1}, \dots, t_n$ of the ice sheet, expressed as the spatial, area-weighted sum over the thickness field $h(t_i)$, which is an element of the model prognostic state space $\vec{x}(t_i)$. Thus,

$$\mathcal{J} \; = \; \frac{1}{m} \, \sum_{i=n-m+1}^{n} \; \sum_{k} \, a_k \, h_k(t_i) \qquad \text{(eqn:costvol)}$$

where $a_k$ is the area weight of grid cell $k$.
The Lagrange multiplier method consists of rewriting the problem of finding derivatives of $\mathcal{J}$, subject to the constraint of fulfilling eqn. (\ref{eqn:model}), as an extended, unconstrained problem:

$$\mathcal{J}' \; = \; \mathcal{J} \; - \; \sum_{i=1}^{n} \, \vec{\mu}_i^{\,T} \left( \vec{x}(t_i) - L\left[ \vec{x}(t_{i-1}) \right] \right) \qquad \text{(eqn:extendedcost)}$$

For each element of the model state $\vec{x}(t_i)$ at time $t_i$ we have introduced a corresponding Lagrange multiplier $\vec{\mu}_i$.

The set of normal equations is obtained by requiring the partial derivatives
of (\ref{eqn:extendedcost}) with respect to each variable $\vec{\mu}_i$ and $\vec{x}(t_i)$
for times $t_0, \dots, t_n$ to vanish
independently (see e.g.
^{[2]}
):

$$\frac{\partial \mathcal{J}'}{\partial \vec{\mu}_i} \; = \; - \left( \vec{x}(t_i) - L\left[ \vec{x}(t_{i-1}) \right] \right) \; = \; 0, \qquad i = 1, \dots, n \qquad \text{(eqn:normaleq1)}$$

$$\frac{\partial \mathcal{J}'}{\partial \vec{x}(t_i)} \; = \; \frac{\partial \mathcal{J}}{\partial \vec{x}(t_i)} \; - \; \vec{\mu}_i \; + \; \left( \frac{\partial L}{\partial \vec{x}(t_i)} \right)^{T} \vec{\mu}_{i+1} \; = \; 0, \qquad i = 1, \dots, n-1 \qquad \text{(eqn:normaleq2)}$$

$$\frac{\partial \mathcal{J}'}{\partial \vec{x}(t_n)} \; = \; \frac{\partial \mathcal{J}}{\partial \vec{x}(t_n)} \; - \; \vec{\mu}_n \; = \; 0 \qquad \text{(eqn:normaleq3)}$$

$$\frac{\partial \mathcal{J}'}{\partial \vec{x}(t_0)} \; = \; \frac{\partial \mathcal{J}}{\partial \vec{x}(t_0)} \; + \; \left( \frac{\partial L}{\partial \vec{x}(t_0)} \right)^{T} \vec{\mu}_1 \qquad \text{(eqn:normaleq4)}$$

Eqn. (\ref{eqn:normaleq1}) simply recovers the model equations. The Lagrange multipliers are found through successive evaluation of the normal equations backward in time. Starting at $t_n$ we find, via eqn. (\ref{eqn:normaleq3}) and using the example cost (\ref{eqn:costvol}),

$$\vec{\mu}_n \; = \; \frac{\partial \mathcal{J}}{\partial \vec{x}(t_n)}$$

whose only nonzero elements are the weights $a_k / m$ in the components corresponding to the thickness field. $n - i$ time steps earlier, at $t_i$, and using the results for $\vec{\mu}_{i+1}, \dots, \vec{\mu}_n$, we obtain, using eqn. (\ref{eqn:normaleq2}):

$$\vec{\mu}_i \; = \; \frac{\partial \mathcal{J}}{\partial \vec{x}(t_i)} \; + \; \left( \frac{\partial L}{\partial \vec{x}(t_i)} \right)^{T} \vec{\mu}_{i+1} \; = \; \sum_{j=i}^{n} \, \left( \frac{\partial L}{\partial \vec{x}(t_i)} \right)^{T} \cdots \left( \frac{\partial L}{\partial \vec{x}(t_{j-1})} \right)^{T} \frac{\partial \mathcal{J}}{\partial \vec{x}(t_j)} \qquad \text{(eqn:lagrangetime)}$$

(with the convention that the product of Jacobians is the identity for $j = i$). Finally, eqn. (\ref{eqn:normaleq4}) provides the expression for the full gradient sought at time $t_0$:

$$\frac{d \mathcal{J}}{d \vec{x}(t_0)} \; = \; \frac{\partial \mathcal{J}}{\partial \vec{x}(t_0)} \; + \; \left( \frac{\partial L}{\partial \vec{x}(t_0)} \right)^{T} \vec{\mu}_1$$

The interpretation is as follows: the Lagrange multiplier $\vec{\mu}_i$ provides the complete sensitivity of $\mathcal{J}$ at time $t_i$ by accumulating all partial derivatives of $\mathcal{J}$ with respect to the state from each time step $t_j$, $j \geq i$. Those partials taken at later times $t_j > t_i$ are propagated to time $t_i$ via the adjoint model (ADM), which is the transpose of the model Jacobian or tangent linear model (TLM), $\partial L / \partial \vec{x}$, and contributions from different times are linearly superimposed. Further simplifying the example objective function (\ref{eqn:costvol}) to the special case where, instead of the time-mean, only the volume at the last time step $t_n$ is chosen, i.e. $m = 1$, eqn. (\ref{eqn:lagrangetime}) simplifies in that all terms except the one containing $\partial \mathcal{J} / \partial \vec{x}(t_n)$ vanish.
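The backward recursion above can be sketched numerically on a deliberately tiny toy model (our own illustration, not the paper's ice-sheet model; the logistic-map `step`, the cost $\mathcal{J} = x_n$, and all function names are assumptions made for this sketch):

```python
# Toy scalar illustration of the adjoint recursion
# mu_i = (dL/dx(t_i))^T mu_{i+1}, with mu_n = dJ/dx(t_n), for the cost J = x_n.

def step(x):
    """One nonlinear model step L[x]; a logistic map stands in for the model."""
    return 3.5 * x * (1.0 - x)

def step_jacobian(x):
    """The (here 1x1) tangent linear operator dL/dx, evaluated at state x."""
    return 3.5 * (1.0 - 2.0 * x)

def forward(x0, n):
    """Integrate the model, storing the trajectory (the adjoint sweep
    needs the states about which to linearize)."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs

def adjoint_gradient(x0, n):
    """dJ/dx0 for the cost J = x_n, obtained in one backward (adjoint) sweep."""
    xs = forward(x0, n)
    mu = 1.0                        # mu_n = dJ/dx(t_n) = 1
    for i in reversed(range(n)):    # mu_i = (dL/dx(t_i))^T mu_{i+1}
        mu = step_jacobian(xs[i]) * mu
    return mu

# Check against a central finite difference of the forward model.
x0, n, eps = 0.2, 5, 1e-6
g = adjoint_gradient(x0, n)
fd = (forward(x0 + eps, n)[-1] - forward(x0 - eps, n)[-1]) / (2.0 * eps)
assert abs(g - fd) < 1e-4
```

Note that the trajectory must be stored (or recomputed) during the forward sweep, since the Jacobians are evaluated at the nonlinear model states; this is the storage/recomputation issue that AD tools address with checkpointing.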

### The tangent linear and adjoint models

All relevant aspects of the Lagrange multiplier method have now been derived, but we wish to make plain the duality between the tangent linear model (TLM) and the adjoint model (ADM), and re-state our problem in a slightly different, but equivalent, framework which provides a natural basis for introducing the concept of automatic differentiation. The nonlinear model (NLM) of eqn. (\ref{eqn:model}) may be viewed as a mapping $L$ of the state space (we again incorporate all parameters, initial and boundary conditions into an extended state space) from time $t_i$ to $t_{i+1}$. Then, the cost function $\mathcal{J}$, eqn. (\ref{eqn:costvol}), is a composite mapping from the state space at $t_0$ to the state space at $t_n$, and from there to the real numbers. To simplify notation, let $\vec{x}_i = \vec{x}(t_i)$, and consider the special case $m = 1$, i.e. $\mathcal{J} = \mathcal{J}\left[ \vec{x}_n \right]$. Then,

$$\vec{x}_n \; = \; L\left[ \vec{x}_{n-1} \right] \; = \; L\left[ L\left[ \cdots L\left[ \vec{x}_0 \right] \right] \right] \qquad \text{(eqn:mapping)}$$
The composite nature of the mapping $\vec{x}_0 \mapsto \mathcal{J}$ is readily apparent:

$$\mathcal{J}\left[ \vec{x}_0 \right] \; = \; \mathcal{J}\left[ \, L\left[ L\left[ \cdots L\left[ \vec{x}_0 \right] \right] \right] \, \right] \qquad \text{(eqn:composition)}$$
A perturbation $\delta \vec{x}_0$ of the initial state is linked to a perturbation $\delta \mathcal{J}$ of the cost function by applying the chain rule to (\ref{eqn:composition}):

$$\delta \mathcal{J} \; = \; \frac{\partial \mathcal{J}}{\partial \vec{x}_n} \cdot \left( \frac{\partial L}{\partial \vec{x}_{n-1}} \, \cdots \, \frac{\partial L}{\partial \vec{x}_0} \; \delta \vec{x}_0 \right)$$
Recognizing that $\delta \mathcal{J}$ is just the scalar product of $\partial \mathcal{J} / \partial \vec{x}_n$ and the final-state perturbation $\delta \vec{x}_n$, we can take advantage of the formal definition of an adjoint operator, $\langle \vec{x}, A \, \vec{y} \rangle = \langle A^{T} \vec{x}, \vec{y} \rangle$, to rewrite this equation:

$$\delta \mathcal{J} \; = \; \left\langle \, \frac{\partial \mathcal{J}}{\partial \vec{x}_n} \, , \; \frac{\partial L}{\partial \vec{x}_{n-1}} \, \cdots \, \frac{\partial L}{\partial \vec{x}_0} \; \delta \vec{x}_0 \, \right\rangle \; = \; \left\langle \, \left( \frac{\partial L}{\partial \vec{x}_0} \right)^{T} \cdots \, \left( \frac{\partial L}{\partial \vec{x}_{n-1}} \right)^{T} \frac{\partial \mathcal{J}}{\partial \vec{x}_n} \, , \; \delta \vec{x}_0 \, \right\rangle \qquad \text{(eqn:scalarprod)}$$
or, in short, combining the tangent linear and adjoint matrices into $\mathbf{L} = \frac{\partial L}{\partial \vec{x}_{n-1}} \cdots \frac{\partial L}{\partial \vec{x}_0}$ and $\mathbf{L}^{T}$, respectively:

$$\delta \mathcal{J} \; = \; \left\langle \, \frac{\partial \mathcal{J}}{\partial \vec{x}_n} \, , \; \mathbf{L} \, \delta \vec{x}_0 \, \right\rangle \; = \; \left\langle \, \mathbf{L}^{T} \, \frac{\partial \mathcal{J}}{\partial \vec{x}_n} \, , \; \delta \vec{x}_0 \, \right\rangle \qquad \text{(eqn:shortscalar)}$$
Eqn. (\ref{eqn:lagrangetime}), (\ref{eqn:scalarprod}), and (\ref{eqn:shortscalar}) expose various features:

- the equivalence between the Lagrange multipliers and the adjoint operator;
- the TLM runs forward in time, propagating the effect of a perturbation $\delta \vec{x}_0$ to all model outputs, while the ADM runs backward in time, accumulating the sensitivities of $\mathcal{J}$ to all model inputs;
- the advantage of the ADM, which provides the full gradient $\mathbf{L}^{T} \, \partial \mathcal{J} / \partial \vec{x}_n$ of a model-constrained objective function, over the TLM, which only provides the projection of $\partial \mathcal{J} / \partial \vec{x}_n$ onto the perturbed state $\mathbf{L} \, \delta \vec{x}_0$ resulting from the initial perturbation $\delta \vec{x}_0$.
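The duality in eqn. (\ref{eqn:shortscalar}) can be checked numerically. The sketch below (our own toy example; the matrix entries and all names are illustrative assumptions) treats a small matrix as the product of step Jacobians $\mathbf{L}$: one TLM application yields only the scalar $\delta \mathcal{J}$ for one chosen $\delta \vec{x}_0$, whereas one ADM application yields the full gradient at once:

```python
# Numerical check of <dJ/dx_n, L dx0> = <L^T dJ/dx_n, dx0> for a 2x2 example.
# The matrix entries and vectors are arbitrary illustrative values.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

L_tlm = [[0.8, -0.3],        # stand-in for the product of step Jacobians L
         [0.1,  1.2]]
dJdxn = [2.0, -1.0]          # gradient of the cost w.r.t. the final state
dx0 = [0.5, 0.4]             # one particular initial perturbation

# TLM: one forward sweep per perturbation yields only a scalar projection.
dJ_tlm = dot(dJdxn, matvec(L_tlm, dx0))
# ADM: one backward sweep yields the full gradient w.r.t. x_0 at once.
full_grad = matvec(transpose(L_tlm), dJdxn)

assert abs(dJ_tlm - dot(full_grad, dx0)) < 1e-9
```

For an $N$-dimensional control space, recovering `full_grad` by TLM (or perturbation) runs alone would require $N$ forward integrations, one per basis perturbation; the adjoint obtains it in a single backward integration.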

## Derivative code generation via automatic differentiation

In general the complexity and the effort involved in the development
of an adjoint model matches that of its parent nonlinear model development
and frequently prohibits adjoint model applications.
An alternative to hand-coding the adjoint
(i.e. coding the discretized adjoint equations),
and a major step forward, is the use of
automatic (or algorithmic) differentiation (AD)
for derivative (e.g. ADM or TLM) code generation \citep{grie:00}.
The advent of AD source-to-source transformation tools
such as the commercial tool
*Transformation of Algorithms in Fortran* (TAF)
\citep{gier-etal:05} or the open-source tool
OpenAD \citep{utke-etal:08} has enabled
the development of exact adjoint models from complex, nonlinear forward models.
In the oceanographic context, there is now a decade of experience in
applying this method to a state-of-the-art ocean general circulation model
(e.g. \cite{maro-etal:99,gala-etal:02,heim-etal:02,stam-etal:03,heim:08}),
which encourages us to
apply such tools to state-of-the-art ice sheet models.

The composite nature of the mapping $\mathcal{J}\left[ \vec{x}_0 \right]$ is that of a large number of elementary operations; eqn. (\ref{eqn:mapping}) reflects this at the time-stepping level. Carrying the concept of composition to its extreme, ultimately each line of code can be viewed as an elementary operation. At the elementary level, the AD tool knows the complete set of derivatives (i.e. the elementary Jacobians) for each intrinsic arithmetic function ($+$, $-$, $*$, $/$, $\sqrt{\cdot}$, $\exp$, etc.), as well as for logical/conditional instructions.
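The elementary-Jacobian idea can be illustrated with a minimal forward-mode (tangent-linear) AD sketch using operator overloading (our own toy, unrelated to TAF or OpenAD; the class and function names are assumptions). Each overloaded operation applies its known local derivative rule, and the chain rule emerges from the composition:

```python
import math

class Dual:
    """Minimal forward-mode AD value: carries (value, derivative) through
    each elementary operation, applying that operation's local Jacobian."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: the elementary Jacobian of '+' is (1, 1)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: the elementary Jacobian of '*' is (b, a)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def exp(x):
    # d/dx exp(x) = exp(x): the only rule the tool needs for this intrinsic
    e = math.exp(x.val)
    return Dual(e, e * x.dot)

# f(x) = x * exp(x), so f'(x) = (1 + x) * exp(x)
x = Dual(2.0, 1.0)      # seed the tangent direction dx/dx = 1
y = x * exp(x)
assert abs(y.dot - 3.0 * math.exp(2.0)) < 1e-9
```

A source-to-source tool such as TAF or OpenAD works line-by-line in the same spirit, but emits transformed Fortran code rather than overloading operators, and generates the reverse (adjoint) sweep as well.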

## References

- ↑ Heimbach, P. and V. Bugnion, 2009: Greenland ice sheet volume sensitivity to basal, surface, and initial conditions, derived from an adjoint model. Annals of Glaciology, 50(52), 67-80.
- ↑ Wunsch, C., 2006: Discrete Inverse and State Estimation Problems: With Geophysical Fluid Applications. Cambridge University Press.