Statistics in WR: Session 20

Statistics in WR: Session 20 Introduction to Spatial Statistics Ernest To


Page 1: Statistics in WR: Session 20

Statistics in WR: Session 20

Introduction to Spatial Statistics

Ernest To

Page 2: Statistics in WR: Session 20

Outline

1. Basics of spatial statistics

2. Kriging

3. Application of spatial-temporal statistics (Gravity currents in CCBay)

Ernest To 20090408


Page 3: Statistics in WR: Session 20

Basics

Page 4: Statistics in WR: Session 20

Consider the following scenario

• Two river stations, A and B, measure dissolved oxygen (DO).
• At Station A
  – mean DO = µA = 5 mg/L
  – std dev = σA = 2 mg/L
• At Station B
  – mean DO = µB = 5 mg/L
  – std dev = σB = 2 mg/L
• Correlation between measurements at Stations A and B = ρAB = 0.5.

[Figure: river reach with Stations A and B]

Page 5: Statistics in WR: Session 20

New data!

• We collected a DO measurement of xA = 2 mg/L at Station A.
• What are the updated mean (µB|XA) and standard deviation (σB|XA) at Station B?
  – (Assume that the DO distributions are normal.)

[Figure: Station A with µA = 5 mg/L, σA = 2 mg/L and new sample xA = 2 mg/L; Station B with prior µB = 5 mg/L, σB = 2 mg/L and unknown µB|XA, σB|XA]

Page 6: Statistics in WR: Session 20

Let's sketch out the distributions

• Distributions at A and B (assume normal): µA = 5 mg/L, σA = 2 mg/L; µB = 5 mg/L, σB = 2 mg/L
• Joint distribution at A and B

[Figure: marginal pdfs f(xA) and f(xB), and the joint pdf f(xA,xB) over the XA-XB plane]

Page 7: Statistics in WR: Session 20

Marginal and joint distributions

[Figure: joint pdf f(xA,xB) over the XA-XB plane, with the marginal distributions of XA and XB]

Page 8: Statistics in WR: Session 20

How does ρAB affect the shape of the joint distribution?

[Figure: scatter plots of XA vs XB, and joint distributions f(xA,xB) of XB and XA, for ρAB = 0.5, ρAB = 0.99, ρAB = 0 and ρAB = -0.99]

Page 9: Statistics in WR: Session 20

Bayesian conditioning

PRIOR STAGE
The prior pdf is the joint distribution of XA and XB:

$f_{X_A,X_B}(x_A, x_B)$

CONDITIONALIZATION STAGE
Observed data (xA = 2 mg/L) is used to update the distribution.

POSTERIOR STAGE
A conditional pdf for XB is generated:

$f_{X_B|X_A}(x_B \mid x_A = 2\ \text{mg/L}) = \dfrac{f_{X_A,X_B}(2, x_B)}{f_{X_A}(2)}$

[Figure: prior (joint) pdf over the XA-XB plane, sliced at xA = 2 mg/L to give the conditional pdf of XB]

Page 10: Statistics in WR: Session 20

Conditional pdf

If the prior pdf is binormal, the conditional pdf is also normal with:

Mean: $\mu_{X_B|X_A} = \mu_{X_B} + \rho_{AB}\,\dfrac{\sigma_{X_B}}{\sigma_{X_A}}\,(x_A - \mu_{X_A})$

Variance: $\sigma^2_{X_B|X_A} = \sigma^2_{X_B}\,(1 - \rho_{AB}^2)$

• The expected value of the conditional pdf is a linear function of the conditioning data.
• The variance is independent of xA and xB (homoscedasticity).

[Figure: prior pdf over the XA-XB plane with the slice at xA = 2 mg/L, and the resulting conditional pdf $f_{X_B|X_A}$ of XB|XA]

Page 11: Statistics in WR: Session 20

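Back to the problem

Updated mean and std. dev at Station B

Mean:
$\mu_{X_B|X_A} = \mu_{X_B} + \rho_{AB}\,\dfrac{\sigma_{X_B}}{\sigma_{X_A}}\,(x_A - \mu_{X_A}) = 5 + 0.5 \times \dfrac{2}{2} \times (2 - 5) = 3.5\ \text{mg/L}$

Std. dev:
$\sigma_{X_B|X_A} = \sqrt{\sigma^2_{X_B}\,(1 - \rho_{AB}^2)} = \sqrt{2^2\,(1 - 0.5^2)} \approx 1.7\ \text{mg/L}$

[Figure: Station A with µA = 5 mg/L, σA = 2 mg/L and new sample xA = 2 mg/L; Station B with prior µB = 5 mg/L, σB = 2 mg/L and updated µB|XA = 3.5 mg/L, σB|XA = 1.7 mg/L]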
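The two-station update can be checked numerically. The following is a minimal Python sketch, assuming a bivariate normal prior as stated on the slides; the function name `condition_bivariate_normal` is illustrative and does not come from the original deck.

```python
import math


def condition_bivariate_normal(mu_a, sigma_a, mu_b, sigma_b, rho_ab, x_a):
    """Return (mean, std dev) of X_B given X_A = x_a for a bivariate normal prior."""
    # Conditional mean: linear in the conditioning value x_a.
    cond_mean = mu_b + rho_ab * (sigma_b / sigma_a) * (x_a - mu_a)
    # Conditional std dev: does not depend on x_a (homoscedasticity).
    cond_std = sigma_b * math.sqrt(1.0 - rho_ab ** 2)
    return cond_mean, cond_std


# Values from the DO example: both stations have mean 5 and std dev 2 mg/L,
# correlation 0.5, and a new sample of 2 mg/L at Station A.
mean_b, std_b = condition_bivariate_normal(5.0, 2.0, 5.0, 2.0, 0.5, 2.0)
print(f"{mean_b:.1f} mg/L, {std_b:.1f} mg/L")  # 3.5 mg/L, 1.7 mg/L
```

Setting `rho_ab` to 0 leaves the prior at Station B unchanged, while values near 1 pull the mean toward the observation and shrink the standard deviation toward zero.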

Page 12: Statistics in WR: Session 20

Can we do the same for any two points on the river?

Yes we can, but only under the following conditions:

1. Normality
2. 2nd order stationarity:
   – Mean does not change with location
   – Variance does not change with location
3. The mean and variance are known.
4. We have a function that determines the correlation between two locations.

[Figure: river reach with Stations A and B; µ = 5 mg/L and σ = 2 mg/L at every location]

Page 13: Statistics in WR: Session 20

Modeling correlation

In spatial statistics, correlation is modeled as a function of the separation distance between two points:

$\rho_{AB} = f(h)$

where h = separation distance (aka lag).

Most of the time, correlation decreases with distance. (Things that are closer together tend to be more correlated with each other.)

Page 14: Statistics in WR: Session 20

Estimating correlation model from data

Imagine the case where we have a smattering of data along an axis.

Any given pair of data points, i and j, will have two properties:

1. The semivariance, γij = 0.5 (Zi - Zj)²
2. The separation distance, hij

[Figure: data point i (measured value Zi) and data point j (measured value Zj), separated by distance hij]

Page 15: Statistics in WR: Session 20

Estimating correlation model from data

We can plot the semivariance, γ, of all possible pairs against the lag, h. This gives us a semivariogram (often simply called a variogram).
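The pairwise quantities described on these slides can be computed directly. Below is a minimal Python sketch, assuming data along a single axis; the function name `variogram_cloud` and the sample positions and values are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def variogram_cloud(positions, values):
    """Return a (lag, semivariance) tuple for every pair of data points."""
    cloud = []
    for i, j in combinations(range(len(values)), 2):
        lag = abs(positions[i] - positions[j])            # separation distance h_ij
        semivariance = 0.5 * (values[i] - values[j]) ** 2  # 0.5 * (Z_i - Z_j)^2
        cloud.append((lag, semivariance))
    return cloud


# Hypothetical positions and measured values, for illustration only.
positions = [0.0, 1.0, 3.0]
values = [5.0, 4.0, 7.0]
for lag, gamma in variogram_cloud(positions, values):
    print(lag, gamma)
```

Plotting these pairs with the lag on the horizontal axis gives the scatter through which a variogram model is later fitted.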

Page 16: Statistics in WR: Session 20

Estimating correlation model from data

We can fit a curve through the semivariogram to model the semivariance as a function of the lag. This is the variogram model:

$\gamma = f(h)$

Page 17: Statistics in WR: Session 20

Estimating correlation model from data

Two features of the fitted variogram model, $\gamma = f(h)$, are marked on the plot: the sill (the level at which the semivariance flattens out) and the range (the lag at which the sill is reached).

Page 18: Statistics in WR: Session 20

Estimating correlation model from data

Assuming that mean and variance do not change with location (assumption of stationarity), the variogram model is related to the covariance model, C(h), by the equation:

$C(h) = \sigma^2 - \gamma(h)$

where σ² is the variance.

Page 19: Statistics in WR: Session 20

Estimating correlation model from data

Assuming that variance does not change with location (assumption of stationarity), the correlation model, ρ(h), is related to the covariance model by the equation:

$\rho(h) = C(h) / \sigma^2$

[Figure: plot of ρ(h) against the lag, h, with the ρ axis running from 0 to 1]
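The chain from semivariance to covariance to correlation is two lines of arithmetic. A minimal Python sketch follows, assuming stationarity as stated on the slides; the function names and the semivariance value of 1.0 are hypothetical, while the variance of 4 corresponds to σ = 2 mg/L from the DO example.

```python
def covariance_from_semivariance(gamma_h, variance):
    """C(h) = sigma^2 - gamma(h), valid under second-order stationarity."""
    return variance - gamma_h


def correlation_from_covariance(cov_h, variance):
    """rho(h) = C(h) / sigma^2."""
    return cov_h / variance


variance = 4.0  # sigma = 2 mg/L, as in the DO example
gamma_h = 1.0   # hypothetical semivariance read off a variogram model at some lag h
cov_h = covariance_from_semivariance(gamma_h, variance)
rho_h = correlation_from_covariance(cov_h, variance)
print(cov_h, rho_h)  # 3.0 0.75
```

At lags beyond the range, the semivariance reaches the sill (equal to the variance when there is no trend), so the covariance and the correlation both drop to zero.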

Page 20: Statistics in WR: Session 20

How does the correlation model affect the estimation?

[Figure: for ρAB = 0.99, ρAB = 0.5 and ρAB = 0 (increasing h): scatter plots of XA vs XB, joint distributions f(xA,xB) of XA and XB, and conditional distributions $f_{X_B|X_A}$ of XB|XA]

Page 21: Statistics in WR: Session 20

Kriging

Page 22: Statistics in WR: Session 20

Multivariable case

What if we have more than one location that provides conditioning data? (Assume distributions are STILL normal at all locations.)

• At Stations A1, A2, A3, A4
  – µA1 = µA2 = µA3 = µA4 = 5 mg/L
  – σA1 = σA2 = σA3 = σA4 = 2 mg/L
• At Station B
  – mean DO = µB = 5 mg/L
  – std dev = σB = 2 mg/L
• ρ = f(h) = 0.0125h² - 0.225h + 1

[Figure: river reach with Stations A1, A2, A3, A4 and B]

Page 23: Statistics in WR: Session 20

Modeling correlation

ρ = f(h) = 0.0125h² - 0.225h + 1

[Figure: plot of ρ = f(h) against separation distance, h, for h from 0 to 10 and ρ from 0 to 1; Stations A1, A2, A3, A4 and B are spaced 2 units apart along the river]

Distance matrix (distance along river, in hundred meters)

      A1   A2   A3   A4   B
A1    0    2    4    6    8
A2    2    0    2    4    6
A3    4    2    0    2    4
A4    6    4    2    0    2
B     8    6    4    2    0

From correlation model:
ρA1B = 0.0, ρA2B = 0.1, ρA3B = 0.3, ρA4B = 0.6;
ρA1A2 = 0.6, ρA1A3 = 0.3, ρA1A4 = 0.1, ρA2A3 = 0.6, ρA2A4 = 0.3, ρA3A4 = 0.6

Correlation matrix

      A1    A2    A3    A4    B
A1    1     0.6   0.3   0.1   0
A2    0.6   1     0.6   0.3   0.1
A3    0.3   0.6   1     0.6   0.3
A4    0.1   0.3   0.6   1     0.6
B     0     0.1   0.3   0.6   1
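The distance and correlation matrices follow mechanically from the station positions and the correlation model. Here is a minimal Python sketch; the positions 0, 2, 4, 6, 8 (hundred meters) are an assumption inferred from the distance matrix, and the variable names are illustrative.

```python
def rho(h):
    """Correlation model from the slides: rho = 0.0125 h^2 - 0.225 h + 1."""
    return 0.0125 * h ** 2 - 0.225 * h + 1.0


stations = ["A1", "A2", "A3", "A4", "B"]
positions = [0, 2, 4, 6, 8]  # assumed positions along the river, in hundred meters

# Pairwise separation distances between all stations.
distance = [[abs(p - q) for q in positions] for p in positions]

# Apply the correlation model to every distance; round to one decimal as on the
# slides (adding 0.0 turns a possible -0.0 into 0.0).
correlation = [[round(rho(h), 1) + 0.0 for h in row] for row in distance]

for name, row in zip(stations, correlation):
    print(name, row)
```

Note that this quadratic model is only meaningful over the lags used here (0 to 8); it is not a general-purpose correlation function.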

Page 24: Statistics in WR: Session 20

Dealing with multiple variables

Divide locations into two groups:

1. The vector, $\mathbf{X}_A$, representing the set of random variables at the locations contributing the conditioning data.
2. The variable, $X_B$, representing the random variable at the point of estimation.

[Figure: river reach with Stations A1 to A4 grouped as $\mathbf{X}_A$ and Station B as $X_B$]

Page 25: Statistics in WR: Session 20

Concept

1. If individual distributions are normal, the joint pdf is multi-normal:
   $f_{X_{A1},X_{A2},X_{A3},X_{A4},X_B}(x_{A1}, x_{A2}, x_{A3}, x_{A4}, x_B)$
2. Group variables into two: one for points with data, one for the point of estimation. The prior pdf becomes $f_{\mathbf{X}_A,X_B}(\mathbf{x}_A, x_B)$.
3. Intersect the prior pdf with the conditioning data, $\mathbf{X}_A = \mathbf{x}_A$, to get the conditional pdf.

[Figure: prior pdf over the $\mathbf{X}_A$-$X_B$ plane, sliced at $\mathbf{X}_A = \mathbf{x}_A$ to give the conditional pdf]

Page 26: Statistics in WR: Session 20

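Dealing with multiple variables

The updated mean and variance of the distribution at Station B are given by:

Mean: $\mu_{X_B|\mathbf{X}_A} = \mu_{X_B} + \mathbf{C}_{X_B\mathbf{X}_A}\,\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_{\mathbf{X}_A})$

Variance: $\sigma^2_{X_B|\mathbf{X}_A} = \sigma^2_{X_B} - \mathbf{C}_{X_B\mathbf{X}_A}\,\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}^{-1}\,\mathbf{C}_{\mathbf{X}_AX_B}$

Where:

$\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A} = \begin{bmatrix} C_{A_1A_1} & \cdots & C_{A_1A_4} \\ \vdots & \ddots & \vdots \\ C_{A_4A_1} & \cdots & C_{A_4A_4} \end{bmatrix}, \quad \mathbf{C}_{\mathbf{X}_AX_B} = \begin{bmatrix} C_{A_1B} \\ \vdots \\ C_{A_4B} \end{bmatrix}$

[Figure: river reach with Stations A1, A2, A3, A4 and B]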

Page 27: Statistics in WR: Session 20

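Conditional pdf

Equations in the multivariable case are more generalized.

Recall the two-variable case:

$\mu_{X_B|X_A} = \mu_{X_B} + \rho_{AB}\,\dfrac{\sigma_{X_B}}{\sigma_{X_A}}\,(x_A - \mu_{X_A})$

$\sigma^2_{X_B|X_A} = \sigma^2_{X_B}\,(1 - \rho_{AB}^2)$

Multivariable case:

$\mu_{X_B|\mathbf{X}_A} = \mu_{X_B} + \mathbf{C}_{X_B\mathbf{X}_A}\,\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_{\mathbf{X}_A})$

$\sigma^2_{X_B|\mathbf{X}_A} = \sigma^2_{X_B} - \mathbf{C}_{X_B\mathbf{X}_A}\,\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}^{-1}\,\mathbf{C}_{\mathbf{X}_AX_B}$

The multivariable case takes into account:
1. Correlation between data locations and the estimated location ($\mathbf{C}_{X_B\mathbf{X}_A}$).
2. Correlation among data locations ($\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}$).

This is the most fundamental form of kriging, i.e. Simple Kriging.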

Page 28: Statistics in WR: Session 20

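Plug and Chug

• Recall that Cov(A,B) = ρAB σA σB
• Compute the data-to-data covariance:

$\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A} = \begin{bmatrix} C_{A_1A_1} & \cdots & C_{A_1A_4} \\ \vdots & \ddots & \vdots \\ C_{A_4A_1} & \cdots & C_{A_4A_4} \end{bmatrix} = \begin{bmatrix} 1(2)(2) & 0.6(2)(2) & 0.3(2)(2) & 0.1(2)(2) \\ 0.6(2)(2) & 1(2)(2) & 0.6(2)(2) & 0.3(2)(2) \\ 0.3(2)(2) & 0.6(2)(2) & 1(2)(2) & 0.6(2)(2) \\ 0.1(2)(2) & 0.3(2)(2) & 0.6(2)(2) & 1(2)(2) \end{bmatrix} = \begin{bmatrix} 4 & 2.4 & 1.2 & 0.4 \\ 2.4 & 4 & 2.4 & 1.2 \\ 1.2 & 2.4 & 4 & 2.4 \\ 0.4 & 1.2 & 2.4 & 4 \end{bmatrix}$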

Page 29: Statistics in WR: Session 20

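Plug and Chug

• Compute the data-to-estimation-point covariance:

$\mathbf{C}_{\mathbf{X}_AX_B} = \begin{bmatrix} C_{A_1B} \\ \vdots \\ C_{A_4B} \end{bmatrix} = \begin{bmatrix} 0(2)(2) \\ 0.1(2)(2) \\ 0.3(2)(2) \\ 0.6(2)(2) \end{bmatrix} = \begin{bmatrix} 0 \\ 0.4 \\ 1.2 \\ 2.4 \end{bmatrix}$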

Page 30: Statistics in WR: Session 20

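Plug and Chug

$\mu_{X_B|\mathbf{X}_A} = \mu_{X_B} + \mathbf{C}_{X_B\mathbf{X}_A}\,\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_{\mathbf{X}_A})$

$= 5 + \begin{bmatrix} 0 & 0.4 & 1.2 & 2.4 \end{bmatrix} \begin{bmatrix} 4 & 2.4 & 1.2 & 0.4 \\ 2.4 & 4 & 2.4 & 1.2 \\ 1.2 & 2.4 & 4 & 2.4 \\ 0.4 & 1.2 & 2.4 & 4 \end{bmatrix}^{-1} \begin{bmatrix} x_{A_1} - 5 \\ x_{A_2} - 5 \\ x_{A_3} - 5 \\ x_{A_4} - 5 \end{bmatrix}$

$= 5 + \underbrace{\begin{bmatrix} -0.018 & -0.053 & -0.053 & 0.65 \end{bmatrix}}_{\text{weights}} \begin{bmatrix} x_{A_1} - 5 \\ x_{A_2} - 5 \\ x_{A_3} - 5 \\ x_{A_4} - 5 \end{bmatrix}$

Note: The weights attributed to each station are determined by the prior (joint distribution) among them.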

Page 31: Statistics in WR: Session 20

Plug and Chug

The row vector that multiplies the data residuals holds the kriging weights:

$\mathbf{C}_{X_B\mathbf{X}_A}\,\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}^{-1} = [\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n]$

Note: The weights attributed to each station are determined by the prior (joint distribution) among them.

Page 32: Statistics in WR: Session 20

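Plug and Chug

Given the following conditioning data:

xA1 = 2, xA2 = 3, xA3 = 5, xA4 = 2 (mg/L)

$\mu_{X_B|\mathbf{X}_A} = 5 + \begin{bmatrix} -0.018 & -0.053 & -0.053 & 0.65 \end{bmatrix} \begin{bmatrix} 2 - 5 \\ 3 - 5 \\ 5 - 5 \\ 2 - 5 \end{bmatrix} = 3.2\ \text{mg/L}$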

Page 33: Statistics in WR: Session 20

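Plug and Chug

$\sigma^2_{X_B|\mathbf{X}_A} = \sigma^2_{X_B} - \mathbf{C}_{X_B\mathbf{X}_A}\,\mathbf{C}_{\mathbf{X}_A\mathbf{X}_A}^{-1}\,\mathbf{C}_{\mathbf{X}_AX_B}$

$= 2^2 - \begin{bmatrix} 0 & 0.4 & 1.2 & 2.4 \end{bmatrix} \begin{bmatrix} 4 & 2.4 & 1.2 & 0.4 \\ 2.4 & 4 & 2.4 & 1.2 \\ 1.2 & 2.4 & 4 & 2.4 \\ 0.4 & 1.2 & 2.4 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 0.4 \\ 1.2 \\ 2.4 \end{bmatrix} \approx 4 - 1.47 = 2.53\ (\text{mg/L})^2$

$\sigma_{X_B|\mathbf{X}_A} \approx \sqrt{2.53} \approx 1.6\ \text{mg/L}$

The conditional variance cannot exceed the prior variance of 4 (mg/L)², because the term subtracted from it is never negative.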

Page 34: Statistics in WR: Session 20

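Results from Simple Kriging

The updated mean and standard deviation of the distribution at Station B are:

Mean: $\mu_{X_B|\mathbf{X}_A} \approx 3.2\ \text{mg/L}$

Standard deviation: $\sigma_{X_B|\mathbf{X}_A} \approx 1.6\ \text{mg/L}$

(Both values follow from the covariance matrix of the Plug and Chug slides and the conditioning data xA1 = 2, xA2 = 3, xA3 = 5, xA4 = 2 mg/L.)

[Figure: river reach with Stations A1, A2, A3, A4 and B]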
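The whole simple kriging calculation fits in a short script. The following is a minimal pure-Python sketch, assuming the covariance values of the Plug and Chug slides (ρ σ σ with σ = 2 mg/L) and the conditioning data 2, 3, 5, 2 mg/L; the helper names `solve` and `simple_kriging` are illustrative, and a linear-algebra library would normally replace the hand-written solver.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def simple_kriging(cov_aa, cov_ab, mu, var_b, data):
    """Return (weights, conditional mean, conditional std dev) at the target."""
    # Weights solve C_AA w = C_AB; C_AA is symmetric, so w^T = C_BA C_AA^-1.
    weights = solve(cov_aa, cov_ab)
    mean = mu + sum(w * (x - mu) for w, x in zip(weights, data))
    variance = var_b - sum(w * c for w, c in zip(weights, cov_ab))
    return weights, mean, math.sqrt(variance)


cov_aa = [
    [4.0, 2.4, 1.2, 0.4],
    [2.4, 4.0, 2.4, 1.2],
    [1.2, 2.4, 4.0, 2.4],
    [0.4, 1.2, 2.4, 4.0],
]
cov_ab = [0.0, 0.4, 1.2, 2.4]
data = [2.0, 3.0, 5.0, 2.0]  # DO measured at A1..A4, mg/L

weights, mean_b, std_b = simple_kriging(cov_aa, cov_ab, 5.0, 4.0, data)
print([round(w, 3) for w in weights], round(mean_b, 2), round(std_b, 2))
```

Under these inputs the sketch evaluates to weights of roughly -0.018, -0.053, -0.053 and 0.649, a conditional mean of about 3.21 mg/L and a conditional standard deviation of about 1.59 mg/L. The nearest station, A4, carries almost all of the weight.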

Page 35: Statistics in WR: Session 20

Other forms of kriging

• Ordinary kriging (OK)
  – Does not require the mean to be known
  – Assumes that the mean is constant and is somewhere in the range of the conditioning data
• Universal kriging (UK)
  – Does not require the mean to be known, nor require it to be constant
  – User specifies a model for the trend in the mean. UK will then fit the model to the data.
• Indicator kriging (IK)
  – Handles binary variables (0 or 1)
  – Has the ability to take care of non-normality in data through iterative application
• Co-kriging (CK)
  – Takes into account a related secondary variable to help estimate the primary variable

Page 36: Statistics in WR: Session 20

Extension to 2D, 3D

• The lag can be represented by the Euclidean distance between 2 points:

  $h = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$

• So the covariance model of the form, C = f(h), can still be used.
• Variables may be more correlated in one direction than the other (anisotropy).
  – A linear transformation can be performed to transform the distances so the correlation distance is the same in all directions (isotropy):

  $h' = \sqrt{\left(\dfrac{x_2 - x_1}{a}\right)^2 + \left(\dfrac{y_2 - y_1}{b}\right)^2 + \left(\dfrac{z_2 - z_1}{c}\right)^2}$
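The isotropic and the rescaled (anisotropic) lag differ only in the per-axis scale factors. A minimal Python sketch follows, assuming the scaling enters as a division of each coordinate difference by a, b or c; the function name `scaled_lag` and the sample points are hypothetical.

```python
import math


def scaled_lag(p1, p2, scales=(1.0, 1.0, 1.0)):
    """Euclidean lag between two 3D points after dividing each axis by its scale.

    With scales of 1 this is the ordinary Euclidean distance h; with scales
    (a, b, c) it is the transformed distance h' used to remove anisotropy.
    """
    return math.sqrt(sum(((q - p) / s) ** 2 for p, q, s in zip(p1, p2, scales)))


print(scaled_lag((0, 0, 0), (3, 4, 0)))                          # 5.0
print(scaled_lag((0, 0, 0), (6, 4, 0), scales=(2.0, 1.0, 1.0)))  # 5.0
```

In the second call the x axis is assumed to have twice the correlation distance of the other axes, so a separation of 6 along x counts the same as a separation of 3 along y or z.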

Page 37: Statistics in WR: Session 20

Extension to space-time

• For space and time, there is no standard space-time metric.
• The form:

  $h = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 + (t_2 - t_1)^2}$

  – is not always correct because the temporal and spatial axes are not always orthogonal to each other.
  – Processes that happen in time usually have some dependency on processes that happen in space. (They are not independent.)
• A separate temporal lag term is usually used:

  $\tau = t_2 - t_1$

• The covariance function takes on the form:

  $C = f(h, \tau)$

Page 38: Statistics in WR: Session 20

Application (Gravity currents in Corpus Christi Bay)

Page 39: Statistics in WR: Session 20

Sensors in Corpus Christi Bay

[Figure: aerial photo from Google Earth of Corpus Christi Bay, Oso Bay, Laguna Madre and the Gulf of Mexico, showing HRI stations, TCOON stations, USGS gages, TCEQ stations and SERF stations]

Page 40: Statistics in WR: Session 20

[Figure: cross-section from Laguna Madre to Corpus Christi Bay showing the wind, the gravity current along the bay bottom and dissolved oxygen (DO) levels. Credit: Ernest Sin Chit To, CRWR]

1. Occurrence: Gravity currents emerge when wind and tide conditions are conducive.
2. Path: Gravity dominates the movement of the current. The current travels down-slope along the bay bottom.
3. Wind conditions:
   • Mixing energy from the wind is transmitted down the water column.
   • The fluid at the top of the gravity current is entrained into the ambient fluid.
   • Thickness of the bottom layer is reduced.
4. Oxygen consumption: Dissolved oxygen in the pulse is depleted by benthic demand, sometimes to hypoxic levels.

Page 41: Statistics in WR: Session 20

[Flowchart: sequence of conditions leading to hypoxia at a point of interest]

• Occurrence: Gravity current emerges from Laguna Madre / Gravity current does not emerge
• Travel: Point of interest is located within path of gravity current / Point of interest is outside path of gravity current
• Wind conditions: Gravity current reaches point of interest / Gravity current is broken up by wind before it reaches point of interest
• O2 consumption: Dissolved oxygen has been depleted below 2 mg/L / Dissolved oxygen is NOT depleted below 2 mg/L
• Result: Hypoxia / No Hypoxia

Page 42: Statistics in WR: Session 20

Selecting a study area

[Figure: bathymetry map showing Oso Bay, East Laguna Madre and West Laguna Madre, with a channel, depressions and ridges labeled and candidate current paths marked "?"]

Legend (m above Mean High Water Level): -5.0, -4.5, -4.0, -3.5, -2.5, -2.0, -1.5, -1.0

Page 43: Statistics in WR: Session 20

Downstream of East Laguna Madre

Plume tracking survey
July 14 to 17, 2006 (while gravity current was on the move)
Ben Hodges, University of Texas at Austin

Water quality data
July 12 and 18, 2006 (at birth and demise of gravity current)
Paul Montagna, Texas A&M University, Corpus Christi

Page 44: Statistics in WR: Session 20

Synthesis of data

Salinity profiles collected at various locations and times are synthesized into a time history of the gravity current along the direction of flow.

[Figure: salinity-versus-depth profiles at several locations along the direction of flow, for t = 0, t = 1, t = 2 and t = 3]

Page 45: Statistics in WR: Session 20

Data Preparation

1. Salinity data from HRI are acquired using HydroGet (a GIS web service client) and combined with plume tracking data.
2. Data locations are projected onto a reference line following the general direction of flow.
3. Space-time kriging is performed in 3 dimensions:
   X = Longitudinal measure (meters from origin point)
   Y = Time (days since 7/12/2006)
   Z = Elevation (meters from water surface)

[Figure: HydroGet interface; acquired data in ArcHydro II Time Series Table; map of HRI stations with the reference line and its origin at x = 0 m]

Page 46: Statistics in WR: Session 20

Variogram along direction of flow

$\gamma = f(h) = C_0 + (C_1 - C_0)\left(1 - e^{-3h^2/a^2}\right)$

where h = lag distance along direction of flow
C0 = nugget = 2 psu²
C1 = sill = 3.6 psu²
a = range = 6000 m

(Gaussian variogram model)

Page 47: Statistics in WR: Session 20

Variogram along direction of flow

[Figure: the Gaussian variogram model along the direction of flow (nugget = 2 psu², sill = 3.6 psu², range = 6000 m), with the nugget, sill and range marked on the plot]

Page 48: Statistics in WR: Session 20

Variogram along depth

$\gamma = f(h) = C_0 + (C_1 - C_0)\left(1 - e^{-3h^2/a^2}\right)$

where h = lag distance along depth
C0 = nugget = 0 psu²
C1 = sill = 3.6 psu²
a = range = 1.7 m

(Gaussian variogram model)

Page 49: Statistics in WR: Session 20

Variogram along time axis

$\gamma = f(h) = C_0 + (C_1 - C_0)\left(\dfrac{3h}{2a} - \dfrac{h^3}{2a^3}\right)$ for h ≤ a

where h = lag along the time axis
C0 = nugget = 0 psu²
C1 = sill = 3 psu²
a = range = 1 day

(Spherical variogram model)
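The Gaussian and spherical variogram models named on these slides can be written as small functions. This is a minimal Python sketch, assuming the common "practical range" form of the Gaussian model (factor 3 in the exponent) and treating C1 as the total sill, as the slide parameters suggest; the function names are illustrative.

```python
import math


def gaussian_variogram(h, nugget, sill, a):
    """Gaussian model: nugget + (sill - nugget) * (1 - exp(-3 h^2 / a^2))."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h ** 2 / a ** 2))


def spherical_variogram(h, nugget, sill, a):
    """Spherical model: reaches the sill exactly at h = a and stays there."""
    if h >= a:
        return sill
    return nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)


# Parameters quoted on the slides: along-flow (Gaussian) and time (spherical).
print(gaussian_variogram(6000.0, nugget=2.0, sill=3.6, a=6000.0))  # close to the sill
print(spherical_variogram(0.5, nugget=0.0, sill=3.0, a=1.0))       # 2.0625
```

With the factor of 3, the Gaussian model has covered about 95% of the distance from nugget to sill at h = a, which is why a is often called the practical range.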

Page 50: Statistics in WR: Session 20

Interpolation results

[Figure: interpolated salinity in x (distance to origin point), y (time) and z (elevation), with longitudinal profiles on 7/12/2006 18:00 and 7/13/2006 18:00 and north arrows]

LEGEND: 37 – 40 psu, 40 – 42 psu, 42 – 43 psu, 42 – 44 psu, 44 – 46 psu

Page 51: Statistics in WR: Session 20

Longitudinal Profiles


Page 52: Statistics in WR: Session 20

Bottom salinities


Page 53: Statistics in WR: Session 20

Cross validation

• A common method to evaluate variogram models.
• Aka the "fictitious point" method (Delhomme, 1978).
• Remove one data point at a time from the data set, then use the remaining n-1 points to estimate the removed point.
• Estimated and actual values are then compared with each other.

[Figure: scatter plot of estimated salinity (psu) against measured salinity (psu), both axes from 35 to 47 psu; R² = 0.85]
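The leave-one-out procedure described above is independent of the estimator being tested. Below is a minimal Python sketch; to stay short it uses a nearest-neighbor estimator as a stand-in for kriging, which is an assumption of this example and not the method used in the study, and the sample points and salinities are hypothetical.

```python
def leave_one_out(points, values, estimator):
    """Estimate each point from the remaining n-1 points; return the estimates."""
    estimates = []
    for i in range(len(values)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        estimates.append(estimator(rest_points, rest_values, points[i]))
    return estimates


def r_squared(measured, estimated):
    """Squared Pearson correlation between measured and estimated values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov ** 2 / (var_m * var_e)


def nearest_neighbor(points, values, target):
    """Stand-in estimator (NOT kriging): value at the closest remaining point."""
    nearest = min(range(len(points)), key=lambda k: abs(points[k] - target))
    return values[nearest]


# Hypothetical 1D locations and salinities (psu), for illustration only.
points = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [38.0, 39.0, 41.0, 42.0, 45.0]
estimates = leave_one_out(points, values, nearest_neighbor)
print(estimates, r_squared(values, estimates))
```

Swapping `nearest_neighbor` for a kriging estimator built on a candidate variogram model turns this into the comparison of estimated against measured values that the slide summarizes with R².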

Page 54: Statistics in WR: Session 20

Conclusions

We’ve covered:

1. Basics of spatial statistics

2. Kriging

3. Application of spatial-temporal statistics (Gravity currents in CCBay)

Spatial statistics is fun!


Page 55: Statistics in WR: Session 20

Geostatistical tools

• ArcGIS Geostatistical Analyst
  – Easiest to use
• GSLIB
  – Library of Fortran programs
• DeCesare's version of GSLIB
  – Modification of GSLIB to do space-time kriging
• BMELIB
  – Library of MATLAB programs