3D-Var Revisited and Quality Control of Surface Temperature Data Xiaolei Zou Department of...
3D-Var Revisited and Quality Control of Surface Temperature Data
Xiaolei Zou
Department of Meteorology
Florida State University
[email protected]@met.fsu.edu
June 11, 2009
Outline
Part I:
• 3D-Var Formulation
• Statistical Formulation
• Analysis
• Practical Applications
Part II:
• Motivation
• EOF analysis
• QC for Ts
Part I
3D-Var Revisited
Facts
1) All background fields, observation operators and observations have errors.
2) There is no truth. Errors in the background, the observation operator and the observations can only be estimated approximately.
The Goal
Produce the best analysis by combining all available information.
Questions
1) What is the measure of the best analysis?
2) How to combine all available information?
Variational Formulation
A scalar cost function is defined:
$$J(x_0) = \frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \frac{1}{2}\left(H(x_0) - y^{obs}\right)^T (O + F)^{-1}\left(H(x_0) - y^{obs}\right)$$
where
$x_0$ ← analysis of the atmospheric state
$y^{obs}$ ← observations
$H$ ← observation operator
$x_b$ ← background
$B$ ← background error covariance matrix
$O$ ← observation error covariance matrix
$F$ ← forward model error covariance matrix
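As a rough illustration of how this cost function is evaluated, here is a minimal NumPy sketch. The function name `cost_function`, the two-variable state, the single observation and all numerical values are assumptions made for the example; they do not come from the slides.

```python
import numpy as np

def cost_function(x0, xb, y_obs, B, O, F, H):
    """Evaluate J(x0) for a (possibly nonlinear) observation operator H."""
    dx = x0 - xb                                  # departure from the background
    dy = H(x0) - y_obs                            # departure from the observations
    Jb = 0.5 * dx @ np.linalg.solve(B, dx)        # background term
    Jo = 0.5 * dy @ np.linalg.solve(O + F, dy)    # observation term, weighted by (O + F)^-1
    return Jb + Jo

# Toy setup: every number below is illustrative, not taken from the talk
xb = np.array([1.0, 2.0])
y_obs = np.array([1.5])
B = np.eye(2)
O = np.array([[0.5]])
F = np.array([[0.5]])
H = lambda x: np.array([x[0] + x[1]])             # hypothetical observation operator

J_at_background = cost_function(xb, xb, y_obs, B, O, F, H)
```

At the background itself the first term vanishes, so the value of J there measures only the misfit between the background and the observations.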
Statistical Formulation
Available information: $y^{obs}$, $x_b$, $H(x_0)$
Write the PDFs for all three sources of information as:
$$p_{obs}(y \mid y^{obs}), \qquad p_b(x_0 \mid x_b), \qquad p_H(y \mid H(x_0))$$
Joint PDF (the PDF of the a posteriori state of information):
$$\sigma(x_0, y) = p_{obs}\, p_b\, p_H$$
The Bayes Theorem
The marginal PDF of the a posteriori state of information:
$$\sigma(x_0) = \int \sigma(x_0, y)\,dy = p_b(x_0 \mid x_b) \int p_{obs}(y \mid y^{obs})\, p_H(y \mid H(x_0))\,dy$$
is the PDF of the a posteriori state of information in model space.
Application of Bayes Theorem to Data Assimilation
Data assimilation derives some features of the PDF $\sigma(x_0)$, which is the a posteriori state of information in model space.
• The maximum likelihood estimate $x_0^a$ ~ analysis
$$\sigma(x_0^a) = \max_{x_0} \sigma(x_0)$$
• The covariance matrix of this estimate ~ analysis error covariance $A$
$$A = \int \sigma(x_0)\,(x_0 - x_0^a)(x_0 - x_0^a)^T\,dx_0$$
Assuming All Errors Are Gaussian,
The PDF for $y^{obs}$:
$$p_{obs}(y \mid y^{obs}) = C_1 \exp\left(-\frac{1}{2}(y - y^{obs})^T O^{-1}(y - y^{obs})\right)$$
The PDF for $x_b$:
$$p_b(x_0 \mid x_b) = C_2 \exp\left(-\frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b)\right)$$
The PDF for $H(x_0)$:
$$p_H(y \mid H(x_0)) = C_3 \exp\left(-\frac{1}{2}(y - H(x_0))^T F^{-1}(y - H(x_0))\right)$$
Bayes Estimate Under Gaussian Assumptions
$$p_{obs}(y \mid y^{obs}) = C_1 \exp\left(-\frac{1}{2}(y - y^{obs})^T O^{-1}(y - y^{obs})\right)$$
$$p_b(x_0 \mid x_b) = C_2 \exp\left(-\frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b)\right)$$
$$p_H(y \mid H(x_0)) = C_3 \exp\left(-\frac{1}{2}(y - H(x_0))^T F^{-1}(y - H(x_0))\right)$$
$$\sigma(x_0) = p_b(x_0 \mid x_b) \int p_{obs}(y \mid y^{obs})\, p_H(y \mid H(x_0))\,dy$$
$$\sigma(x_0) = C \exp\left(-\frac{1}{2}\left[(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \left(H(x_0) - y^{obs}\right)^T (O + F)^{-1}\left(H(x_0) - y^{obs}\right)\right]\right)$$
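The step from the integral over y to the combined weight (O + F)^-1 can be checked in the scalar case. The scalar variances $o$ and $f$ below are symbols introduced here for illustration, standing in for $O$ and $F$; this is a sketch of the one-dimensional case, not the general matrix proof.

$$\int_{-\infty}^{\infty} \exp\left(-\frac{(y - y^{obs})^2}{2o}\right) \exp\left(-\frac{(y - H(x_0))^2}{2f}\right) dy = \sqrt{\frac{2\pi\, o f}{o + f}}\; \exp\left(-\frac{\left(H(x_0) - y^{obs}\right)^2}{2(o + f)}\right)$$

The prefactor does not depend on $x_0$ and is absorbed into $C$, while the exponent shows the two error variances adding.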
Maximum Likelihood Estimate
The PDF of the a posteriori state of information in model space:
$$\sigma(x_0) = C \exp\left(-\frac{1}{2}\left[(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \left(H(x_0) - y^{obs}\right)^T (O + F)^{-1}\left(H(x_0) - y^{obs}\right)\right]\right) = C \exp\left(-J(x_0)\right)$$
where $J(x_0)$ is the cost function of the variational formulation, including its factors of 1/2.
Maximizing $\sigma(x_0)$ ⇔ Minimizing $J(x_0)$
Statistical Estimate ⇔ Variational Calculus
Gaussian and Non-Gaussian Signals
The signals are sampled at 10000 points. PDFs are constructed at an interval of $(y_{max} - y_{min})/100$.
$$y = \mathrm{rand}(x)$$
$$y = 4\sin\left(\frac{\pi x}{1000}\right)$$
Gaussian and Non-Gaussian Signals (cont.)
$$y = 4\sin\left(\frac{\pi x}{1000}\right) + \mathrm{rand}(x)$$
$$y = 0.4\sin\left(\frac{\pi x}{1000}\right) + \mathrm{rand}(x)$$
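The four signals and their empirical PDFs might be reproduced along the following lines. This is only a sketch: it assumes that rand(x) denotes zero-mean, unit-variance Gaussian noise (the slides do not say), and the helper names `empirical_pdf` and `excess_kurtosis` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
x = np.arange(n)
noise = rng.standard_normal(n)      # assumption: rand(x) is standard Gaussian noise

signals = {
    "noise": noise,
    "4 sin": 4.0 * np.sin(np.pi * x / 1000.0),
    "4 sin + noise": 4.0 * np.sin(np.pi * x / 1000.0) + noise,
    "0.4 sin + noise": 0.4 * np.sin(np.pi * x / 1000.0) + noise,
}

def empirical_pdf(y, nbins=100):
    """Histogram with bin width (ymax - ymin)/nbins, normalized to unit area."""
    density, edges = np.histogram(y, bins=nbins, range=(y.min(), y.max()), density=True)
    return density, edges

def excess_kurtosis(y):
    """Fourth standardized moment minus 3; zero for a Gaussian."""
    z = (y - y.mean()) / y.std()
    return np.mean(z**4) - 3.0

kurt = {name: excess_kurtosis(y) for name, y in signals.items()}
density, edges = empirical_pdf(signals["noise"])
```

A strongly negative excess kurtosis, as for the pure sine wave, is one simple indication that a signal is far from Gaussian.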
3D-Var & 3D-Var Analysis
The 3D-Var data assimilation solves a general inverse problem using the maximum likelihood estimate under the assumptions that all errors are Gaussian.
The 3D-Var analysis is the maximum likelihood estimate if all errors are Gaussian.
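Zero Gradient: A Necessary Condition
$$J(x_0) = \frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \frac{1}{2}\left(H(x_0) - y^{obs}\right)^T (O + F)^{-1}\left(H(x_0) - y^{obs}\right)$$
$$J(x_0 + \Delta x_0) - J(x_0) = \left(\nabla J(x_0)\right)^T \Delta x_0 \quad \text{(to first order in } \Delta x_0\text{)}$$
$$\nabla J(x_0) = B^{-1}(x_0 - x_b) + \mathbf{H}^T (O + F)^{-1}\left(H(x_0) - y^{obs}\right)$$
$$\nabla J(x_0) = 0: \qquad B^{-1}(x_0 - x_b) + \mathbf{H}^T (O + F)^{-1}\left(H(x_0) - y^{obs}\right) = 0$$
Here $H$ is a nonlinear operator and $\mathbf{H}$ is a linear operator (the linearization of $H$).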
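The gradient expression can be sanity-checked against finite differences. The sketch below assumes a linear observation operator stored as the matrix `Hmat` and uses `R` as shorthand for O + F; the dimensions and random values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
Hmat = rng.standard_normal((m, n))      # hypothetical linear observation operator
B = np.diag([1.0, 2.0, 0.5, 1.5])       # illustrative background error covariance
R = 0.3 * np.eye(m)                     # stands for O + F
xb = rng.standard_normal(n)
y_obs = rng.standard_normal(m)

def J(x0):
    dx, dy = x0 - xb, Hmat @ x0 - y_obs
    return 0.5 * dx @ np.linalg.solve(B, dx) + 0.5 * dy @ np.linalg.solve(R, dy)

def grad_J(x0):
    # B^-1 (x0 - xb) + H^T (O + F)^-1 (H x0 - y_obs)
    return np.linalg.solve(B, x0 - xb) + Hmat.T @ np.linalg.solve(R, Hmat @ x0 - y_obs)

x0 = rng.standard_normal(n)
eps = 1e-6
# Central differences along each coordinate direction
fd = np.array([(J(x0 + eps * e) - J(x0 - eps * e)) / (2 * eps) for e in np.eye(n)])
max_err = np.max(np.abs(fd - grad_J(x0)))
```

A check of this kind is a common first test before handing J and its gradient to a minimization routine.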
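Analytical Expression of Solution with a Linear Model
$$B^{-1}(x_0^* - x_b) + \mathbf{H}^T (O + F)^{-1}\left(H(x_0^*) - y^{obs}\right) = 0$$
$H$ is linear: $H(x_0) = \mathbf{H}x_0$
$$x_0^* - x_b + B\mathbf{H}^T (O + F)^{-1}\left(\mathbf{H}x_0^* - y^{obs}\right) = 0$$
Writing $\mathbf{H}x_0^* - y^{obs} = \mathbf{H}(x_0^* - x_b) - (y^{obs} - \mathbf{H}x_b)$ and solving for $x_0^*$:
$$x_0^* = x_b + \left(\mathbf{H}^T (O + F)^{-1}\mathbf{H} + B^{-1}\right)^{-1}\mathbf{H}^T (O + F)^{-1}\left(y^{obs} - \mathbf{H}x_b\right)$$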
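Analytical Expression of Solution with an Approximate Linear Model
$$B^{-1}(x_0^* - x_b) + \mathbf{H}^T (O + F)^{-1}\left(H(x_0^*) - y^{obs}\right) = 0$$
$$H(x_0^*) \approx H(x_b) + \mathbf{H}(x_0^* - x_b)$$
$$B^{-1}(x_0^* - x_b) + \mathbf{H}^T (O + F)^{-1}\left(\mathbf{H}(x_0^* - x_b) - \left(y^{obs} - H(x_b)\right)\right) = 0$$
$$x_0^* = x_b + \left(\mathbf{H}^T (O + F)^{-1}\mathbf{H} + B^{-1}\right)^{-1}\mathbf{H}^T (O + F)^{-1}\left(y^{obs} - H(x_b)\right)$$
$$\phantom{x_0^*} = x_b + B\mathbf{H}^T\left(\mathbf{H}B\mathbf{H}^T + O + F\right)^{-1}\left(y^{obs} - H(x_b)\right)$$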
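The equivalence of the two closed-form expressions (one inverting a matrix in model space, the other in observation space) can be checked numerically. The sketch below assumes a linear operator `Hmat` and writes `R` for O + F; all matrices are randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 3
Hmat = rng.standard_normal((m, n))          # hypothetical linear(ized) operator
G = rng.standard_normal((n, n))
B = G @ G.T + n * np.eye(n)                 # symmetric positive definite background covariance
R = np.diag([0.5, 1.0, 2.0])                # stands for O + F
xb = rng.standard_normal(n)
y_obs = rng.standard_normal(m)
d = y_obs - Hmat @ xb                       # innovation y_obs - H(xb)

Rinv = np.linalg.inv(R)
# Model-space form: (H^T R^-1 H + B^-1)^-1 H^T R^-1 d
x_model = xb + np.linalg.solve(Hmat.T @ Rinv @ Hmat + np.linalg.inv(B), Hmat.T @ Rinv @ d)
# Observation-space form: B H^T (H B H^T + R)^-1 d
x_obs = xb + B @ Hmat.T @ np.linalg.solve(Hmat @ B @ Hmat.T + R, d)

# The gradient of J should vanish at the analysis
grad = np.linalg.solve(B, x_model - xb) + Hmat.T @ Rinv @ (Hmat @ x_model - y_obs)
```

The observation-space form only requires solving an m-by-m system, which is the smaller one whenever there are fewer observations than state variables.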
Analysis Error
When the linear approximation is valid, the a posteriori PDF is approximately Gaussian, with the analysis as its mean and the following covariance matrix:
$$A = \left(\mathbf{H}^T R^{-1}\mathbf{H} + B^{-1}\right)^{-1} = B - B\mathbf{H}^T\left(\mathbf{H}B\mathbf{H}^T + R\right)^{-1}\mathbf{H}B$$
where $R = O + F$.
3D-Var Analysis
$$A^{-1} = \mathbf{H}^T (O + F)^{-1}\mathbf{H} + B^{-1}$$
$A^{-1}$ is referred to as an information content matrix. When the analysis error is small, the value of $\|A^{-1}\|$ is large and the information content is large.
$$A = \left(\mathbf{H}^T R^{-1}\mathbf{H} + B^{-1}\right)^{-1}, \qquad R = O + F$$
$$A^{-1} \ge B^{-1}, \qquad A^{-1} \ge \mathbf{H}^T (O + F)^{-1}\mathbf{H}$$
The information content of the 3D-Var analysis is greater than the information content in either the background field or the observations that were assimilated.
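A small numerical sketch can illustrate both forms of the analysis error covariance and the information-content inequality. The matrices are illustrative, `R` stands for O + F, and the inequality is read here in the sense that the difference of the two matrices has no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
Hmat = rng.standard_normal((m, n))      # hypothetical linear operator
B = np.diag([1.0, 0.5, 2.0, 1.5])       # illustrative background error covariance
R = np.diag([0.4, 0.8])                 # stands for O + F

A_inv = Hmat.T @ np.linalg.inv(R) @ Hmat + np.linalg.inv(B)   # information content matrix
A1 = np.linalg.inv(A_inv)
A2 = B - B @ Hmat.T @ np.linalg.solve(Hmat @ B @ Hmat.T + R, Hmat @ B)

# A^-1 - B^-1 = H^T R^-1 H should have no negative eigenvalues
eig_gain = np.linalg.eigvalsh(A_inv - np.linalg.inv(B))
# Equivalently, B - A should have no negative eigenvalues
eig_reduction = np.linalg.eigvalsh(B - A1)
```

In this example only two directions of model space are observed, so `eig_gain` has two positive eigenvalues and the rest are zero up to rounding.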
3D-Var Practice
What does 3D-Var data assimilation involve?
• Develop System
  - Decision on variables and resolutions
  - Estimate of background error covariance
• Assimilate Data
  - Decision on observations to be assimilated
  - Understanding of the observations
  - Estimate of observation errors
  - Comparison between observations and background
  - Development of the observation operator
  - Estimate of model errors
• Obtain Solution
  - Minimization (preconditioning, scaling)
  - Advanced computing (parallelization, data intensive computing platforms)
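$$J(x_0) = \frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \frac{1}{2}\left(H(x_0) - y^{obs}\right)^T (O + F)^{-1}\left(H(x_0) - y^{obs}\right)$$
[Diagram: each term of the cost function annotated with the question it raises; labels: Model Space, Observed Space]
• $x_0$ (3D-Var analysis): Choice of analysis variable
• $y^{obs}$: What data to assimilate?
• $H$: Which model to use?
• $x_b$: What background to start with?
• $B$: How to estimate elements in B?
• $O$: Where to find their values?
• $F$: How to quantify it?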
What needs to be done before and after conducting 3D-Var experiments?
[Diagram: Input Data (Quality Control) → 3D-Var → Output Analysis (Diagnosis of Analysis)]
• Quality Control
  - Knowing the data
  - Knowing the major difference between data and background field
  - Remove erroneous data
  - Eliminate data that render errors non-Gaussian
• Diagnosis of 3D-Var analyses
  - Check the convergence
  - Examine the analysis increments
  - Estimate analysis errors
  - Assess forecast impact
  - Provide physical and dynamical explanations for the numerical results one obtains
When Working with Real Data, the Key Things Are
• Knowing the data before inputting them into a 3D-Var system, by careful QC!
• Knowing the system after a 3D-Var experiment, by careful analysis of the 3D-Var results!
Examining 3D-Var Results
[Figure: Analysis - obs, one-week average results; panels: q, p, u, v]
Differences between model and obs. before and after a 3D-Var experiment
[Figure: pb - pobs and pa - pobs]
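Calculated: $\sigma_{ob}^2$ and $\sigma_{oa}^2$
Inferred from the calculated values and $\sigma_o^2$:
$$\sigma_b^2 = \sigma_{ob}^2 - \sigma_o^2 \qquad \text{and} \qquad \sigma_a^2 = \sigma_{oa}^2 - \sigma_o^2$$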
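The first relation can be illustrated with synthetic data. The sketch assumes that σ_ob² denotes the variance of observation-minus-background differences and that observation and background errors are unbiased and mutually uncorrelated; neither assumption is stated on the slide, and the error magnitudes are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
sigma_o, sigma_b = 1.0, 2.0                         # illustrative error standard deviations

truth = rng.standard_normal(n)                      # synthetic "truth", unknown in practice
obs = truth + sigma_o * rng.standard_normal(n)      # observations with error variance sigma_o^2
bkg = truth + sigma_b * rng.standard_normal(n)      # background with error variance sigma_b^2

var_ob = np.var(obs - bkg)                          # calculated from data alone
var_b_inferred = var_ob - sigma_o**2                # inferred background error variance
```

With correlated observation and background errors a cross-covariance term would appear and the simple subtraction would no longer recover σ_b².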
Part II
Quality Control of Surface Temperature Data
Motivations
• Surface data are abundant
• Very little surface data are assimilated in operational systems
• Surface data are important to thunderstorm prediction
Challenges
• Existing data assimilation systems have short or no memory of surface data
• The diurnal cycle dominates the variability of surface variables and is not described with sufficient accuracy in the large-scale analysis that is used as background in mesoscale forecasts
• Background errors are non-Gaussian
A Total of 3197 Surface Stations
[Figure: station map] The number of missing data at each station in January 2008 is indicated by the color bar.
Improving Surface Data Assimilation
Key steps:
1) Inclusion of more surface data
2) Improved QC
3) Vertical interpolation based on the atmospheric structures within the boundary layer
   - Surface layer
   - Mixed layer
4) Incorporation of dynamic constraint
EOF Modes for Ts Constructed from Station Observations
[Figure panels: First, Second, Third, Fourth, Fifth, Sixth]
EOF Modes for Ts Constructed from Station Observations (cont.)
[Figure panels: Seventh, Eighth, Ninth, Tenth]
Explained Variances
[Figure: Surface Data (blue), NCEP analysis (red)]
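One common way to obtain EOF modes, principal components and explained variances from a time-by-station data matrix is a singular value decomposition of the anomalies. The sketch below is such an SVD-based version under that assumption; the slides do not say how the EOFs were computed, and the function name `eof_analysis` and the synthetic data are hypothetical.

```python
import numpy as np

def eof_analysis(data):
    """data: (n_time, n_station) array.
    Returns EOFs (rows), PCs (columns) and explained variance fractions."""
    anomalies = data - data.mean(axis=0)          # remove each station's time mean
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = Vt                                     # spatial patterns, one per row
    pcs = U * s                                   # time series, one per column
    explained = s**2 / np.sum(s**2)               # fraction of variance per mode
    return eofs, pcs, explained

# Synthetic example: one diurnal pattern shared by 50 stations, plus weak noise
rng = np.random.default_rng(5)
hours = np.arange(24 * 31)                        # hourly samples over 31 days
pattern = rng.standard_normal(50)
data = 5.0 * np.outer(np.sin(2 * np.pi * hours / 24.0), pattern)
data += 0.1 * rng.standard_normal((hours.size, 50))

eofs, pcs, explained = eof_analysis(data)
```

Here `pcs @ eofs` reconstructs the anomalies, so the contribution of any subset of modes can be obtained by restricting the product to those columns and rows.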
Principal Components (PCs)
[Figure: PC time series for the First through Sixth modes; x-axis: Time (unit: day), days 1 to 29]
Principal Components (PCs) (cont.)
[Figure: PC time series for the Seventh through Tenth modes; x-axis: Time (unit: day), days 1 to 29]
Dominant Oscillations in January 2008
[Figure: dominant period versus EOF mode for Obs. and NCEP; panels: Longer-period oscillation (Period, unit: day), Diurnal oscillation (Period, unit: hour), Shorter-period oscillation (Period, unit: hour)]
Diurnal Oscillation
[Figure: time series for the Second, Third, Fourth, Fifth, Sixth, Ninth and Tenth modes; x-axis: Time (unit: day), days 1 to 7]
Longer-Period Oscillations
[Figure: time series for the Sixth through Tenth modes; x-axis: Time (unit: day), days 1 to 29]
Diurnal Oscillation and Longer-Period Oscillations
[Figure annotations: Phase difference, Amplitude difference]
PC Differences between Surface Data and NCEP Analysis
[Figure panels: Second, Third, Fourth, Fifth, Sixth; x-axis: Time (unit: day); blue line: First Week, red line: Last Week]
Frequency Distributions of Diurnal Cycle Modes
[Figure panels: Second, Third, Fourth, Fifth and Sixth modes for the First Week and Last Week of January 2008; x-axis: Tobs-TNCEP (unit: K); y-axis: Frequency]
Frequency Distributions (modes 2-6)
[Figure panels: First Week, Last Week, Entire Month, for the sum of modes 2-6; x-axis: Tobs-TNCEP (unit: K); y-axis: Frequency]
Statistical Measures
[Figure panels: Mean, Variance, Kurtosis, Skewness]
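The four measures can be computed from a sample of observation-minus-analysis differences as in the sketch below. It assumes the population (1/N) definitions and a kurtosis that equals 3 for a Gaussian; the slides do not state which conventions were used, and the function name `moments` is hypothetical.

```python
import numpy as np

def moments(d):
    """Return mean, variance, skewness and kurtosis of a 1-D sample."""
    d = np.asarray(d, dtype=float)
    mean = d.mean()
    var = d.var()                       # population variance (divides by N)
    z = (d - mean) / np.sqrt(var)       # standardized sample
    skew = np.mean(z**3)                # 0 for a symmetric distribution
    kurt = np.mean(z**4)                # 3 for a Gaussian distribution
    return mean, var, skew, kurt

mean, var, skew, kurt = moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

Skewness far from 0 or kurtosis far from 3 would suggest that the differences are not Gaussian.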
QC Procedure
Step 1:
1) Historical extremum check: ΔT exceeds the average of the NCEP analysis at each station plus (minus) 15 times its variance
2) Temporal consistency check: ΔT > 50 °C in a 24-hour interval
3) Bi-weight check: Z-score > 3
4) Spatial consistency check: T exceeds the average of the linear fit to highly correlated stations plus (minus) 4 times its variance
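The bi-weight check can be sketched with the standard biweight mean and biweight standard deviation, which are resistant to outliers. The censoring constant c = 7.5 is a commonly used value and is an assumption here, as is the function name `biweight_zscore`; the slides give only the threshold of 3.

```python
import numpy as np

def biweight_zscore(x, c=7.5):
    """Z-scores based on the biweight mean and biweight standard deviation.
    Assumes the median absolute deviation of x is nonzero."""
    x = np.asarray(x, dtype=float)
    M = np.median(x)
    mad = np.median(np.abs(x - M))                 # median absolute deviation
    w = np.clip((x - M) / (c * mad), -1.0, 1.0)    # values with |w| >= 1 get zero weight
    k = 1.0 - w**2
    bmean = M + np.sum((x - M) * k**2) / np.sum(k**2)
    bstd = np.sqrt(x.size * np.sum((x - M)**2 * k**4)) / np.abs(np.sum(k * (1.0 - 5.0 * w**2)))
    return (x - bmean) / bstd

# Nine well-behaved values and one gross error (illustrative numbers)
sample = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 30.0])
z = biweight_zscore(sample)
flagged = np.abs(z) > 3.0                          # Step 1 bi-weight criterion
```

Because the outlier receives zero weight, it does not inflate the estimated spread, which an ordinary standard deviation would allow it to do.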
QC Procedure (cont.)
Step 2:
The Z-score of the difference between the station observation and the background field must be less than 4
Step 3:
The Z-score of the difference between the station observation and the background field, excluding the contribution from the diurnal cycle, must be less than 2
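Steps 2 and 3 might be sketched as follows. This is an illustration only: it assumes an ordinary Z-score computed over all data at once, takes EOF modes 2-6 as the diurnal-cycle contribution (as the frequency-distribution slides suggest), and computes the EOFs by SVD. The function names and the synthetic differences are hypothetical.

```python
import numpy as np

def zscore(d):
    """Ordinary Z-score over all elements (assumption: not the biweight version)."""
    return (d - np.mean(d)) / np.std(d)

def remove_modes(d, modes):
    """Subtract the contribution of the given EOF modes (0-based) from d (time x station)."""
    U, s, Vt = np.linalg.svd(d - d.mean(axis=0), full_matrices=False)
    idx = list(modes)
    contribution = (U[:, idx] * s[idx]) @ Vt[idx, :]
    return d - contribution

def qc_steps_2_and_3(omb, diurnal_modes=range(1, 6), z2=4.0, z3=2.0):
    """omb: observation-minus-background differences, shape (n_time, n_station).
    Returns boolean masks of data passing Step 2 and passing Steps 2 and 3."""
    pass_step2 = np.abs(zscore(omb)) < z2
    residual = remove_modes(omb, diurnal_modes)    # differences without modes 2-6
    pass_step3 = np.abs(zscore(residual)) < z3
    return pass_step2, pass_step2 & pass_step3

# Synthetic differences with one gross error (illustrative numbers)
rng = np.random.default_rng(6)
omb = rng.standard_normal((200, 20))
omb[10, 3] = 50.0
pass2, pass23 = qc_steps_2_and_3(omb)
```

A fuller implementation would presumably recompute the statistics after each rejection step and handle missing data, which this sketch ignores.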
[Figure: Obs. versus Background; panels: (a) Step 2, (b) Step 3]
Frequency Distribution before and after QC
[Figure panels: First Week, Last Week, Entire Month; x-axis: Tobs-TNCEP (unit: K); y-axis: Frequency]
Frequency Distribution with and without Contribution from Modes 2-6
[Figure panels: First Week, Last Week, Entire Month; x-axis: Tobs-TNCEP (unit: K); y-axis: Frequency]
Correlations and RMS Differences of the PCs before and after QC
Data Number Removed at Each Station
[Figure panels: Step One, Step Two, Step Three, All Three QC Steps]
Percentage of Data Removed by QC
[Figure panels: Step 1, Steps 1-2, Steps 1-3]
Percentage of Data Removed by QC (cont.)
[Figure: x-axis: Time (day)]
Variation of the Statistical Measures with QC Steps
[Figure panels: Mean (unit: K), Std. (unit: K), Skewness, Kurtosis; x-axis categories: Ori., Step 1, Step 2, Step 3, No DC]
Time Evolution of Standard Deviation before and after QC
[Figure: y-axis: Std. (K); x-axis: Time (unit: day)]
Time-Zone Dependence of Diurnal Oscillation
[Figure: Surface Obs. and NCEP Analysis; y-axis: Temperature; x-axis: Time (unit: hour)]
Weekly mean Ts at seven surface stations selected within different time zones:
Zone 1: 55.03E, 36.42N
Zone 2: 65.68E, 40.55N
Zone 3: 82.78E, 41.23N
Zone 4: 98.9E, 40.0N
Zone 5: 110.05, 41.03
Zone 6: 128.15E, 40.89N
Zone 7: 141.17E, 39.7N
Average Time at Which Ts Reached the Maximum in the First Week of January 2008
[Figure: y-axis: Time (UTC)]
Global Diurnal Cycle
[Figure panels: NCEP, ECMWF (ERA-Interim), Surface Observations; x-axis: Time (UTC); y-axis: Temperature (K); January 1-7, 2008]
Summary
• The diurnal cycle dominates the temporal variability of surface data
• Large-scale analysis contains a significant phase error (~10-85 degrees) of the diurnal cycle
• A three-step QC procedure is developed to identify outliers in surface-station temperature data which have a non-Gaussian frequency distribution
More details can be found in
Qin, Z.-K., X. Zou, G. Li and X.-L. Ma, 2009: Quality control of surface temperature data with non-Gaussian background errors. Quart. J. Roy. Meteor. Soc., submitted.
Zou, X. and Z.-K. Qin, 2009: Diurnal cycle in global analysis. J. Geo. Letter., to be submitted.