
DoD CG/AR at Colorado State University AT737 November 19, 2008 1

What is Data Assimilation?

Dr. Andrew S. Jones, CSU/CIRA


Data Assimilation

Outline
- Why Do Data Assimilation?
- Who and What
- Important Concepts
- Definitions
- Brief History
- Common System Issues / Challenges


The Purpose of Data Assimilation

Why do data assimilation?
1. I want better model initial conditions for better model forecasts
2. I want better calibration and validation (cal/val)
3. I want better acquisition guidance
4. I want better scientific understanding of
   - Model errors (and their probability distributions)
   - Data errors (and their probability distributions)
   - Combined model/data correlations
   - DA methodologies (minimization, computational optimizations, representation methods, various method approximations)
   - Physical process interactions (i.e., sensitivities and feedbacks)

This leads toward better future models: a VIRTUOUS CYCLE.


The Data Assimilation Community

What skills are needed by each involved group?
- NWP Data Assimilation Experts (DA system methodology)
- NWP Modelers (Model + Physics + DA system)
- Application and Observation Specialists (Instrument capabilities)
- Physical Scientists (Instrument + Physics + DA system)
- Radiative Transfer Specialists (Instrument config. specifications)
- Applied Mathematicians (Control theory methodology)
- Computer Scientists (DA system + OPS time requirements)
- Science Program Management (Everything + $$ + Good People)
- Forecasters (Everything + OPS time reqs. + Easy/fast access)
- Users and Customers (Could be a wide variety of responses)
  e.g., NWS / Army / USAF / Navy / NASA / NSF / DOE / ECMWF


The Data Assimilation Community

Are you part of this community? Yes, you just may not know it yet.


Bayes Theorem

The maximum conditional probability is given by:

$$ P(x \mid y) \propto P(y \mid x)\, P(x) $$

Assuming Gaussian distributions…

$$ P(y \mid x) \propto \exp\{ -\tfrac{1}{2} [y - H(x)]^T R^{-1} [y - H(x)] \} $$

$$ P(x) \propto \exp\{ -\tfrac{1}{2} [x - x_b]^T B^{-1} [x - x_b] \} $$

e.g., 3DVAR

Lorenc (1986)
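As a minimal sketch of how the Gaussian assumptions turn Bayes' theorem into a cost function, the NumPy example below takes J(x) = -ln[P(y|x) P(x)] (up to a constant) and finds its minimum. It assumes a linear observation operator (a matrix H); the numerical values of xb, B, R, and y are made up for illustration and are not from the lecture.

```python
import numpy as np

def cost(x, xb, y, H, B_inv, R_inv):
    """J(x) = -ln[P(y|x) P(x)] up to an additive constant (Gaussian errors)."""
    dx = x - xb            # departure from the background
    dy = y - H @ x         # departure from the observations
    return 0.5 * dx @ B_inv @ dx + 0.5 * dy @ R_inv @ dy

def grad_cost(x, xb, y, H, B_inv, R_inv):
    """Gradient of J with respect to x for a linear observation operator."""
    return B_inv @ (x - xb) - H.T @ R_inv @ (y - H @ x)

# Illustrative (made-up) two-variable problem; only the first variable is observed.
xb = np.array([1.0, 2.0])
B = np.array([[1.0, 0.5], [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
y = np.array([2.0])
B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)

# Setting the gradient to zero gives a linear system for the analysis xa.
A = B_inv + H.T @ R_inv @ H
xa = xb + np.linalg.solve(A, H.T @ R_inv @ (y - H @ xb))
print(xa)
```

In this toy case the unobserved second variable is also adjusted, because the off-diagonal element of B spreads the observation's information to it.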


Minimization Process

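[Figure: the cost function J plotted against x, with “TRUTH” marked.]

The Jacobian (gradient) of the cost function is used in the minimization procedure.

The minimum is at $\partial J / \partial x = 0$.

Issues:
- Is it a global minimum?
- Are we converging rapidly or slowly?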


The Building Blocks of Data Assimilation

[Figure: flowchart of the building blocks: Start, NWP Model, Observation Model, Observations, Observation Model Adjoint, NWP Adjoint, Minimization.]

Control variables are the initial model state variables that are optimized using the new data information as a guide.

They can also include boundary condition information, model parameters for “tuning”, etc.


What Are We Minimizing?

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Minimize the discrepancy between model and observation data over time.

The cost function, J, is the link between the observational data and the model variables.

Observations are either assumed unbiased, or are “debiased” by some adjustment method.
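The cost function can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the model `step` (a fixed rotation), the observation operator `h` (observe the first component), and all numerical values are hypothetical stand-ins, not the lecture's NWP or radiative transfer models.

```python
import numpy as np

def cost_over_window(x0, xb, B_inv, obs, R_inv, step, h):
    """Background term plus observation terms at model steps i = 0..N."""
    dx = x0 - xb
    J = 0.5 * dx @ B_inv @ dx
    x = np.array(x0, dtype=float)
    for i, y in enumerate(obs):
        if i > 0:
            x = step(x)            # x now holds M_{0,i}(x0)
        d = y - h(x)               # observation-minus-model departure
        J += 0.5 * d @ R_inv @ d
    return J

# Hypothetical toy model: a rotation by a fixed angle per step.
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
step = lambda x: A @ x
h = lambda x: x[:1]                # observe the first component only

x_true = np.array([1.0, 0.0])
xb = np.array([0.8, 0.1])
B_inv = np.eye(2)
R_inv = np.array([[4.0]])

# Noise-free synthetic observations generated from the "true" initial state.
obs, x = [], x_true.copy()
for i in range(4):
    if i > 0:
        x = step(x)
    obs.append(h(x))

J_truth = cost_over_window(x_true, xb, B_inv, obs, R_inv, step, h)
J_background = cost_over_window(xb, xb, B_inv, obs, R_inv, step, h)
```

At the true initial state only the background term contributes; at the background state only the observation terms contribute. The minimization balances the two according to B and R.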


How are Data used in Time?

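[Figure: timeline of the assimilation time window. Observations are distributed through the window; the cloud resolving model M advances the state in time, with a model error term G(ε_x), and continues into the forecast. The observation model h links the model state to the observations: $y = h[M_{0,i}(x_0)] + \varepsilon_y$.]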



A “Smoother” Uses All Data Available in the Assimilation Window

(a “Simultaneous” Solution)

[Figure: assimilation timeline (observations, time, forecast, observation model), with all observations in the window used together.]




A “Filter” Sequentially Assimilates Data as They Become Available in Each Cycle

[Figure: assimilation timeline (observations, time, forecast, observation model), with observations assimilated one cycle at a time.]




[Figure: assimilation timeline (observations, time, forecast, observation model), with labels “Cycle” and “Previous Information”.]




Cycle Physics “Barriers”

What Can Overcome the Barrier?
1. Linear physics processes, and
2. Propagated forecast error covariances

[Figure: assimilation timeline (time, forecast, observation model) illustrating the cycle physics “barrier”.]




Who are the Candidates for “Truth”?

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Candidate 1: Background Term
- “x0” is the model state vector at the initial time t0; this is also the “control variable”, the object of the minimization process
- “xb” is the model background state vector
- “B” is the background error covariance of the forecast and model errors


What Do We Trust for “Truth”?

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Candidate 1: Background Term
The default condition for the assimilation when
1. data are not available, or
2. the available data have no significant sensitivity to the model state, or
3. the available data are inaccurate


Model Error Impacts our “Trust”

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Candidate 1: Background Term
- Model error issues are important
- Model error varies as a function of the model time
- Model error “grows” with time
- Therefore the background term should be trusted more at the initial stages of the model run and trusted less at the end of the model run



How to Adjust for Model Error?

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Candidate 1: Background Term
1. Add a model error term to the cost function so that each specific model step is appropriately weighted, or
2. Use other possible adjustments in the methodology, i.e., “make an assumption” about the model error impacts

If model error adjustments or controls are used, the DA system is said to be “weakly constrained”.


What About Model Error Errors?

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Candidate 1: Background Term
- Model error adjustments to the weighting can be “wrong”
- In particular, most assume some type of linearity
- Non-linear physical processes may break these assumptions and be more complexly interrelated

A data assimilation system with no model error control is said to be “strongly constrained” (perfect model assumption).


What About Model Error Errors?

A Strongly Constrained System?

[Cartoon labels: Model; “Little Data People”. Caption: “I just can’t run like I used to.”]

Can Data Over Constrain?

[Cartoon label: DA expert. Caption: “Well… no one’s perfect.”]



Who are the Candidates for “Truth”?

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Candidate 2: Observational Term
- “y” is the observational vector, e.g., the satellite input data (typically radiances), salinity, sounding profiles
- “M0,i(x0)” is the model state at the observation time “i”
- “h” is the observational operator, for example the “forward radiative transfer model”
- “R” is the observational error covariance matrix that specifies the instrumental noise and data representation errors (currently assumed to be diagonal…)


What Do We Trust for “Truth”?

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

Candidate 2: Observational Term
The non-default condition for the assimilation when
1. data are available, and
2. data are sensitive to the model state, and
3. data are precise (not necessarily “accurate”), and
4. data are not thrown away by DA “quality control” methods


What About other DA Errors?

Overlooked Issues?
1. Data debiasing relative to the DA system “reference”. It is not the “Truth”; however, it is self-consistent.
2. DA Methodology Errors?
   1. Assumptions: linearization, Gaussianity, model errors
   2. Representation errors (space and time)
   3. Poorly known background error covariances
   4. Imperfect observational operators
   5. Overly aggressive data “quality control”
   6. Historical emphasis on dynamical impact vs. physical
      Synoptic vs. Mesoscale?


DA Theory is Still Maturing

The Future: Lognormal DA (Fletcher and Zupanski, 2006, 2007)

Gaussian systems typically force lognormal variables to become Gaussian, introducing an avoidable data assimilation system bias.

Many important variables are lognormally distributed, whereas Gaussian data assimilation system variables are “Gaussian”.

Lognormal variables:
- Clouds
- Precipitation
- Water vapor
- Emissivities
- Many other hydrologic fields

[Figure: a lognormal distribution with its Mode and Mean marked, labeled “Add DA Bias Here!”]
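A quick numerical sketch shows why the mode and mean matter here. For a lognormal variable the mode, median, and mean are all different, whereas for a Gaussian they coincide, so treating a lognormal variable as Gaussian conflates them. The formulas are the standard lognormal ones; the parameter values (mu = 0, sigma = 1) are arbitrary choices for illustration.

```python
import math

def lognormal_summary(mu, sigma):
    """Mode, median, and mean of a variable whose logarithm is N(mu, sigma^2)."""
    mode = math.exp(mu - sigma**2)
    median = math.exp(mu)
    mean = math.exp(mu + 0.5 * sigma**2)
    return mode, median, mean

mode, median, mean = lognormal_summary(mu=0.0, sigma=1.0)
print(f"mode={mode:.3f} median={median:.3f} mean={mean:.3f}")
# prints: mode=0.368 median=1.000 mean=1.649
```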


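What “Truth” Do We Have?

Minimize discrepancy between model and observation data over time:

$$ J = \frac{1}{2} \sum_{i\,(\mathrm{time})} \{ y - h[M_{0,i}(x_0)] \}^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} + \frac{1}{2} (x_0 - x_b)^T B^{-1} (x_0 - x_b) $$

[Figure: TRUTH, with DATA CENTRIC and MODEL CENTRIC viewpoints.]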


What Approach Should We Use?


[Figure: TRUTH, with DATA CENTRIC and MODEL CENTRIC viewpoints.]

[Cartoon, model-centric view: “My Precious… We Trust the Model! Data hurts us! Yes…”]
MODEL CENTRIC FOCUS


FOCUS ON
- “B”: background error improvements are needed
- “xb”: associated background states and “cycling” are more heavily emphasized

DA method selection tends toward sequential estimators, “filters”, and improved representation of the forecast model error covariances, e.g., Ensemble Kalman Filters and other ensemble filter systems.

This is not to say that all model-centric improvements are bad…



[Figure: TRUTH, with DATA CENTRIC and MODEL CENTRIC viewpoints.]

[Cartoon, data-centric view: “My Precious… We Trust the Data! Models unfair and hurts us! Yes…”]


DATA CENTRIC FOCUS

FOCUS ON
- “h”: observational operator improvements are needed
- “M0,i(x0)”: model state capabilities and independent experimental validation are more heavily emphasized

DA method selection tends toward “smoothers” (less focus on model cycling), with more emphasis on data quantity, improvements in the data operator, and understanding of data representation errors, e.g., 4DVAR systems.


DUAL-CENTRIC FOCUS

Best of both worlds?

Solution: ensemble-based forecast covariance estimates combined with a 4DVAR smoother for research and a 4DVAR filter for operations?

Several frameworks to combine the two approaches are in various stages of development now…


What Have We Learned?


TRUTH

Your Research Objective is CRITICAL to making the right choices…

1. Operational choices may supersede good research objectives
2. Computational speed is always critical for operational purposes
3. Accuracy is critical for research purposes


DA Theory is Still Maturing

A Brief History of DA
1. Hand interpolation
2. Local polynomial interpolation schemes (e.g., Cressman)
3. Use of a “first guess”, i.e., a background
4. Use of an “analysis cycle” to regenerate a new first guess
5. Empirical schemes, e.g., nudging
6. Least squares methods
   1. Variational DA (VAR)
   2. Sequential DA (KF)
   3. Monte Carlo approximation to sequential DA (EnsKF)


Optimal Interpolation (OI)

$$ x_a = x_b + W [y - h(x_b)] $$

OI merely means finding the “optimal” weights, W:

$$ W = B H^T (H B H^T + R)^{-1} $$

Eliassen (1954), Bengtsson et al. (1981), Gandin (1963)
Became the operational scheme in the early 1980s through the early 1990s

A better name would have been “statistical interpolation”.


Variational Techniques

Finds the maximum likelihood (if Gaussian, etc.); it is actually a minimum variance method.

Comes from setting the gradient of the cost function equal to zero:

$$ x_a = x_b + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} [y - h(x_b)], \qquad H = \frac{\partial h}{\partial x} $$

The control variable is xa.

Major flavors: 1DVAR (Z), 3DVAR (X,Y,Z), 4DVAR (X,Y,Z,T)
Lorenc (1986) and others…

Became the operational scheme from the early 1990s to the present day
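The variational gain, (B^-1 + H^T R^-1 H)^-1 H^T R^-1, and the OI weights, W = B H^T (H B H^T + R)^-1, are algebraically the same matrix when the observation operator is linear. The sketch below checks this numerically. The random B, H, and R are illustrative assumptions only, and explicit inverses are used for readability rather than efficiency.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2                                  # state and observation dimensions

# Made-up symmetric positive-definite B, linear H, diagonal R.
A = rng.normal(size=(n, n))
B = A @ A.T + n * np.eye(n)
H = rng.normal(size=(m, n))
R = np.diag([0.5, 2.0])

# OI weights (observation-space inverse, m x m).
W_oi = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

# Variational gain (state-space inverse, n x n).
B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)
W_var = np.linalg.inv(B_inv + H.T @ R_inv @ H) @ H.T @ R_inv

xb = rng.normal(size=n)
y = rng.normal(size=m)
xa_oi = xb + W_oi @ (y - H @ xb)
xa_var = xb + W_var @ (y - H @ xb)
```

The two forms differ in practice mainly in which matrix must be inverted: one the size of the observation vector, the other the size of the state vector.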


What is a Hessian?

A rank-2 square matrix containing the partial derivatives of the Jacobian:

$$ G(f)_{ij}(x) = D_i D_j f(x) $$

The Hessian is used in some minimization methods, e.g., quasi-Newton…

$$ x_{n+1} = x_n - G^{-1}(f(x_n)) \nabla f(x_n) $$
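The iteration above can be sketched with a small test function. This is a plain Newton iteration using the exact Hessian; quasi-Newton methods instead build an approximation to G (or its inverse) from successive gradients. The function `f` is a made-up convex example, not a DA cost function.

```python
import numpy as np

def f(x):
    u = x[0] - 1.0
    return u**4 + u**2 + 2.0 * (x[1] + 0.5)**2

def grad_f(x):
    u = x[0] - 1.0
    return np.array([4.0 * u**3 + 2.0 * u, 4.0 * (x[1] + 0.5)])

def hessian_f(x):
    u = x[0] - 1.0
    return np.array([[12.0 * u**2 + 2.0, 0.0],
                     [0.0, 4.0]])

x = np.array([3.0, 2.0])                     # arbitrary starting point
for n in range(20):
    # x_{n+1} = x_n - G^{-1} grad f(x_n), written as a linear solve
    x = x - np.linalg.solve(hessian_f(x), grad_f(x))
```

The minimum of this function is at (1, -0.5). Solving the linear system avoids forming the inverse Hessian explicitly.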


The Role of the Adjoint, etc.

Adjoints are used in the cost function minimization procedure.

But first…

Tangent linear models are used to approximate the non-linear model behaviors:

$$ L x' \approx [M(x_2) - M(x_1)] / \lambda $$

- L is the linear operator of the perturbation model
- M is the non-linear forward model
- λ is the perturbation scaling factor
- x2 = x1 + λ x'
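A common way to check a tangent linear model is to compare it with finite differences of the non-linear model as the perturbation scaling factor shrinks. The sketch below does this for a hypothetical two-variable model; `model`, `tangent_linear`, and the step size `DT` are illustrative inventions, with the tangent linear code derived by hand from `model`.

```python
import numpy as np

DT = 0.1  # hypothetical time step

def model(x):
    """Non-linear forward model M (one step of a made-up system)."""
    return np.array([x[0] + DT * x[0] * x[1],
                     x[1] - DT * x[0]**2])

def tangent_linear(x, dx):
    """L dx: the model linearized about the state x, applied to dx."""
    return np.array([dx[0] + DT * (x[1] * dx[0] + x[0] * dx[1]),
                     dx[1] - 2.0 * DT * x[0] * dx[0]])

x1 = np.array([1.0, 2.0])
xp = np.array([0.5, -0.3])                   # perturbation direction x'

errors = []
for lam in (1e-2, 1e-4, 1e-6):
    x2 = x1 + lam * xp
    finite_diff = (model(x2) - model(x1)) / lam
    errors.append(np.linalg.norm(finite_diff - tangent_linear(x1, xp)))
```

The mismatch shrinks roughly in proportion to the scaling factor, which is the expected behavior when the tangent linear code is consistent with the non-linear model.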


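Useful Properties of the Adjoint

$$ \langle L x', L x' \rangle = \langle L^T L x', x' \rangle $$

LT is the adjoint operator of the perturbation model.

Typically the adjoint and the tangent linear operator can be automatically created using automated compilers. For a statement of the form

$$ y = \Phi(x_1, \ldots, x_n, y) $$

the adjoint statements are

$$ {}^*x_i = {}^*x_i + {}^*y \, \partial\Phi / \partial x_i $$

$$ {}^*y = {}^*y \, \partial\Phi / \partial y $$

where *xi and *y are the “adjoint” variables.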
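The inner-product identity is also the basis of the usual “dot-product test” for a hand-coded adjoint. The sketch below applies it to a hypothetical two-variable model; the `tangent_linear` and `adjoint` routines and the step size `DT` are illustrative assumptions, with the adjoint written by accumulating contributions statement by statement in reverse order.

```python
import numpy as np

DT = 0.1  # hypothetical time step

# Made-up forward model being linearized:
#   y0 = x0 + DT * x0 * x1
#   y1 = x1 - DT * x0**2

def tangent_linear(x, dx):
    """L dx for the model above, linearized about x."""
    return np.array([dx[0] + DT * (x[1] * dx[0] + x[0] * dx[1]),
                     dx[1] - 2.0 * DT * x[0] * dx[0]])

def adjoint(x, ady):
    """L^T ady, accumulated statement by statement in reverse order."""
    adx = np.zeros(2)
    # adjoint of y1 = x1 - DT * x0**2
    adx[1] += ady[1]
    adx[0] += -2.0 * DT * x[0] * ady[1]
    # adjoint of y0 = x0 + DT * x0 * x1
    adx[0] += (1.0 + DT * x[1]) * ady[0]
    adx[1] += DT * x[0] * ady[0]
    return adx

x = np.array([1.0, 2.0])
xp = np.array([0.5, -0.3])

Lx = tangent_linear(x, xp)
lhs = Lx @ Lx                      # <L x', L x'>
rhs = adjoint(x, Lx) @ xp          # <L^T L x', x'>
```

If the adjoint is coded correctly the two inner products agree to rounding error.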


Useful Properties of the Adjoint

Of course, automated methods fail for complex variable types (see Jones et al., 2004).

E.g., how can the compiler know when a variable is complex, when codes are commonly decomposed into real and imaginary parts? (It can’t.)


Sequential Techniques

B is no longer static: B => Pf = forecast error covariance.

Pa(ti) is estimated at future times using the model.

$$ x^a(t_i) = x^f(t_i) + P^f(t_i) H_i^T \{ R_i + H_i P^f(t_i) H_i^T \}^{-1} \{ y_i - H[x^f(t_i)] \} $$

$$ P^a(t_i) = ( I - P^f(t_i) H_i^T \{ R_i + H_i P^f(t_i) H_i^T \}^{-1} H_i ) P^f(t_i) $$

K = “Kalman Gain” (in blue boxes on the slide): $K = P^f(t_i) H_i^T \{ R_i + H_i P^f(t_i) H_i^T \}^{-1}$

Extended KF: Pa is found by linearizing the model about the nonlinear trajectory of the model between ti-1 and ti.

Kalman (1960) and many others…
These techniques can evolve the forecast error covariance fields, similar in concept to OI.
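One analysis step and one forecast step of a linear Kalman filter can be sketched as follows. The matrices and values are made up for illustration; the forecast step assumes a linear model matrix `M` and no model error term (a perfect-model assumption made here for brevity, not a statement about the lecture's systems).

```python
import numpy as np

def kf_analysis(xf, Pf, y, H, R):
    """Update the forecast state and covariance with observations y."""
    S = H @ Pf @ H.T + R                     # innovation covariance
    K = Pf @ H.T @ np.linalg.inv(S)          # Kalman gain
    xa = xf + K @ (y - H @ xf)
    Pa = (np.eye(len(xf)) - K @ H) @ Pf
    return xa, Pa, K

def kf_forecast(xa, Pa, M):
    """Propagate state and covariance with a linear model (no model error)."""
    return M @ xa, M @ Pa @ M.T

xf = np.array([1.0, 2.0])
Pf = np.array([[1.0, 0.5], [0.5, 1.0]])
H = np.array([[1.0, 0.0]])                   # observe the first variable only
R = np.array([[0.25]])
y = np.array([2.0])

xa, Pa, K = kf_analysis(xf, Pf, y, H, R)

M = np.array([[1.0, 0.1], [0.0, 1.0]])       # hypothetical linear model
xf_next, Pf_next = kf_forecast(xa, Pa, M)
```

The analysis variances on the diagonal of `Pa` are smaller than those of `Pf`, and the propagated `Pf_next` is what replaces the static B in the next cycle.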



Ensembles can be used in KF-based sequential DA systems.
Ensembles are used to estimate Pf through Gaussian “sampling” theory:

$$ P^f_l \approx \frac{1}{K-2} \sum_{k \ne l}^{K} (x^f_k - x^f_l)(x^f_k - x^f_l)^T $$

- $x^f_k$ is a particular forecast instance
- $x^f_l$ is the reference state forecast
- Pf is estimated at future times using the model
- K model runs are required (Q: How to populate the seed perturbations?)

Sampling allows for use of approximate solutions:
- Eliminates the need to linearize the model (as in the Extended KF)
- No tangent linear or adjoint models are needed
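The idea of estimating Pf from an ensemble can be sketched with the ordinary sample covariance. Note the substitution: this example centers the perturbations on the ensemble mean and normalizes by K - 1, which is the standard textbook estimator, not the reference-state form written on the slide. The “true” covariance and ensemble size are made-up values.

```python
import numpy as np

def ensemble_covariance(ens):
    """Sample covariance from an ensemble array of shape (K members, n variables)."""
    K = ens.shape[0]
    pert = ens - ens.mean(axis=0)            # perturbations about the ensemble mean
    return pert.T @ pert / (K - 1)

rng = np.random.default_rng(42)
P_true = np.array([[1.0, 0.6], [0.6, 2.0]])  # illustrative "true" covariance

# Draw a large ensemble, then compare a small subset with the full set.
ens = rng.multivariate_normal(np.zeros(2), P_true, size=5000)
P_small = ensemble_covariance(ens[:10])
P_large = ensemble_covariance(ens)
```

Small ensembles give noisy covariance estimates, which is one reason many ensemble samples are required in practice.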



Notes on EnsKF-based sequential DA systems
1. EnsKFs are an approximation
2. Underlying theory is the KF (circa 1960)
3. Assumes Gaussian distributions
4. Many ensemble samples are required
5. Can significantly improve Pf
6. Where does H fit in? Is it fully “resolved”?
7. What about the “Filter” aspects?

Future Directions
Research using Hybrid EnsKF-Var techniques



Zupanski (2005): Maximum Likelihood Ensemble Filter (MLEF)
Structure function version of ensemble-based DA
(Note: does not use sampling theory, and is more similar to a variational DA scheme using principal component analysis (PCA))

$$ P_f^{1/2} = [\, p_1^f \;\; p_2^f \;\; \cdots \;\; p_{N_E}^f \,], \qquad p_i^f = ( p_{1,i}^f, \; p_{2,i}^f, \; \ldots, \; p_{S,i}^f )^T $$

- NE is the number of ensemble members
- S is the state-space dimension
- Each ensemble member is carefully selected to represent the degrees of freedom of the system
- A square-root filter is built in to the algorithm assumptions


Where is “M” in all of this?

3DDA techniques have no explicit model time tendency information; it is all done implicitly with cycling techniques, typically focusing only on the Pf term (no M used).

4DDA uses M explicitly via the model sensitivities, L, and model adjoints, LT, as a function of time (M used).

Kalman smoothers (e.g., also 4DEnsKS) would likewise need to estimate L and LT.


4DVAR Revisited
(for an example see Poster NPOESS P1.16 by Jones et al.)

$$ \frac{\partial J}{\partial x_0} = B^{-1} (x_0 - x_b) - \sum_{i=0}^{N} L^T(t_i, t_0) \, H_i^T R^{-1} \{ y - h[M_{0,i}(x_0)] \} $$

$$ H_i = \frac{\partial h}{\partial x_i}, \qquad L(t_0, t_i) = \frac{\partial M_{0,i}}{\partial x_0} $$

LT is the adjoint, which is integrated from ti to t0.

Adjoints are NOT the “model running in reverse”, but merely the model sensitivities being integrated in reverse order; thus all adjoints appear to function backwards. Think of it as accumulating the “impacts” back toward the initial control variables.

Automatically propagates the Pf within the cycle; however, it cannot save the result for the next analysis cycle (memory of “B” info becomes lost in the next cycle) (Thepaut et al., 1993).
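The backward accumulation of “impacts” can be sketched for a linear toy problem, where the tangent linear model is just a matrix A and its adjoint is the transpose. The model matrix, observation operator, and all values below are hypothetical; the adjoint-based gradient is checked against centered finite differences of the cost function.

```python
import numpy as np

def cost(x0, xb, B_inv, obs, R_inv, A, H):
    """Background term plus observation terms at steps i = 0..N."""
    dx = x0 - xb
    J = 0.5 * dx @ B_inv @ dx
    x = x0
    for i, y in enumerate(obs):
        if i > 0:
            x = A @ x                         # x = M_{0,i}(x0)
        d = y - H @ x
        J += 0.5 * d @ R_inv @ d
    return J

def gradient_by_adjoint(x0, xb, B_inv, obs, R_inv, A, H):
    """Forward sweep stores the trajectory; backward sweep applies A^T."""
    traj = [x0]
    for _ in range(len(obs) - 1):
        traj.append(A @ traj[-1])
    adx = np.zeros_like(x0)
    for i in range(len(obs) - 1, -1, -1):
        adx += -H.T @ R_inv @ (obs[i] - H @ traj[i])   # forcing at time i
        if i > 0:
            adx = A.T @ adx                   # carry sensitivity back one step
    return B_inv @ (x0 - xb) + adx

theta = 0.3
A = 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
H = np.array([[1.0, 0.0]])
B_inv, R_inv = np.eye(2), np.array([[4.0]])
xb = np.array([0.8, 0.1])

# Noise-free synthetic observations from a made-up "true" initial state.
obs, x = [], np.array([1.0, 0.0])
for i in range(4):
    if i > 0:
        x = A @ x
    obs.append(H @ x)

x0 = np.array([0.5, -0.2])
g_adj = gradient_by_adjoint(x0, xb, B_inv, obs, R_inv, A, H)
eps = 1e-5
g_fd = np.array([(cost(x0 + eps * e, xb, B_inv, obs, R_inv, A, H)
                  - cost(x0 - eps * e, xb, B_inv, obs, R_inv, A, H)) / (2 * eps)
                 for e in np.eye(2)])
```

One forward and one backward sweep give the full gradient, whereas finite differences need two cost evaluations per control variable.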


Ensembles: Flow-dependent forecast error covariance and spread of information from observations

[Figure: a grid-point on an x versus time diagram at times t0, t1, t2, with observations obs1 and obs2; flow-dependent covariances are contrasted with isotropic correlations. From M. Zupanski.]

Geographically distant observations can bring more information than close-by observations, if in a dynamically significant region.


Preconditioning the Space

“Preconditioners” transform the variable space so that fewer iterations are required while minimizing the cost function.

Result: faster convergence

[Figure: contours of J = const. in physical space and in preconditioning space, with the descent path from x0 to xmin along the negative gradient directions (-g, -gx). From M. Zupanski.]
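The effect of preconditioning can be sketched on a small quadratic cost function. The change of variable used here, increment = B^(1/2) ζ, is one common choice and is an assumption of this example, as are the matrices and the simple steepest-descent minimizer with exact line search; operational systems use more sophisticated minimizers.

```python
import numpy as np

def steepest_descent(A, b, tol=1e-10, max_iter=100000):
    """Minimize 0.5 z^T A z - b^T z with exact line search; return (z, iterations)."""
    z = np.zeros_like(b)
    for k in range(max_iter):
        g = A @ z - b                         # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            return z, k
        alpha = (g @ g) / (g @ A @ g)         # exact step length along -g
        z = z - alpha * g
    return z, max_iter

# Made-up background covariance with very different variances along rotated axes.
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = Q @ np.diag([10.0, 0.1]) @ Q.T
H = np.array([[1.0, 0.0]])
R_inv = np.array([[0.1]])
d = np.array([1.0])                           # innovation y - h(xb)

# Physical space: minimize over the increment dx directly.
A_x = np.linalg.inv(B) + H.T @ R_inv @ H
b_x = H.T @ R_inv @ d
dx_x, it_x = steepest_descent(A_x, b_x)

# Preconditioning space: dx = B^(1/2) zeta.
w, V = np.linalg.eigh(B)
B_half = V @ np.diag(np.sqrt(w)) @ V.T
A_z = np.eye(2) + B_half @ H.T @ R_inv @ H @ B_half
b_z = B_half @ H.T @ R_inv @ d
zeta, it_z = steepest_descent(A_z, b_z)
dx_z = B_half @ zeta

print(it_x, it_z)
```

Both minimizations reach the same increment, but the preconditioned problem has nearly circular cost contours and needs far fewer iterations.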


Incremental VAR
Courtier et al. (1994)

Most common 4D framework in operational use

The incremental form performs a linear minimization within a lower-dimensional space (the inner loop minimization).

The outer loop minimization is at the full model resolution (non-linear physics are added back in at this stage).

Benefits:
- Smooths the cost function and assures better minimization behaviors
- Reduces the need for explicit preconditioning

Issues:
- Extra linearizations occur
- It is an approximate form of VAR DA


Types of DA Solution Spaces
1. Model space (x)
2. Physical space (y)
3. Ensemble sub-space, e.g., Maximum Likelihood Ensemble Filter (MLEF)

Types of Ensemble Kalman Filters
1. Perturbed observations (or stochastic)
2. Square root filters (i.e., analysis perturbations are obtained from the square root of the Kalman filter analysis covariance)


Data Assimilation Conclusions

- Broad, Dynamic, Evolving, Foundational Science Field!
- Flexible unified frameworks, standards, and funding will improve training and education
- Continued need for advanced DA systems for research purposes (non-OPS)
- Can share OPS framework components, e.g., JCSDA CRTM

Thanks! ([email protected])

For more information…
- Great NWP DA review paper (by Mike Navon): http://people.scs.fsu.edu/~navon/pubs/JCP1229.pdf
- ECMWF DA training materials: http://www.ecmwf.int/newsevents/training/rcourse_notes/
- JCSDA DA workshop: http://www.weatherchaos.umd.edu/workshop/
