Frequency Analysis of Floods – A Nonparametric Approach

Dr Santhosh Dronamraju

Future Floods: An Exploration of A Cross-Disciplinary Approach to Flood Risk Forecasting

26-27 February 2015, Research Division Seminar Room, Faculty of Arts and Social Sciences, NUS Kent Ridge Campus, Singapore


Contents

Section 1 Introduction to FFA

Section 2 Kernel density estimators

Section 3 Performance assessment using synthetic and real-world data sets

Section 4 Current and future work in Impact Forecasting


Introduction

Effective estimation of quantiles of hydrometeorological events (such as precipitation, droughts and floods) is of great scientific interest, as it forms the basis for planning, design and management of water-resources systems.

Estimates of flood quantiles have wide applications:

– Design and risk assessment of water control structures

– Design of critical features of landfill covers and erosion protection for hazardous wastes

– Economic evaluation of flood protection projects, flood insurance assessment

– Land use planning and management, and operation of irrigation projects


Flood Frequency Analysis (FFA)

In FFA, a unique relation between flood magnitude and the corresponding recurrence interval (return period) is sought.

The objective of frequency analysis in a hydrologic context is to infer, from observed data, the probability that an event of a certain magnitude will be exceeded.

Two basic problems exist for most hydrologic applications:

– First, the sample is usually small by statistical standards, resulting in uncertainty as to the true probability.

– Second, a single theoretical frequency distribution does not always fit a particular data type.
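The link between magnitude, exceedance probability and recurrence interval described above can be illustrated with a minimal sketch. It assumes the Weibull plotting position P = m/(n + 1), a common choice that the slides do not specify, and a hypothetical series of annual maxima.

```python
# Hypothetical annual maximum discharges (m^3/s), for illustration only.
ANNUAL_MAXIMA = [412.0, 388.0, 605.0, 290.0, 512.0, 701.0, 455.0, 349.0, 530.0, 620.0]


def empirical_return_periods(sample):
    """Rank the sample in decreasing order and attach to each value its
    Weibull plotting-position exceedance probability P = m / (n + 1) and
    the corresponding return period T = 1 / P (years, for annual maxima)."""
    n = len(sample)
    rows = []
    for rank, value in enumerate(sorted(sample, reverse=True), start=1):
        p_exceed = rank / (n + 1)
        rows.append((value, p_exceed, 1.0 / p_exceed))
    return rows


for value, p_exceed, period in empirical_return_periods(ANNUAL_MAXIMA):
    print(f"{value:7.1f}  P(exceed) = {p_exceed:.3f}  T = {period:5.2f} years")
```

With ten annual maxima the largest value is assigned T = 11 years, so empirical estimates alone cannot reach the rarer events needed for design; this is the small-sample problem noted above.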


Univariate FFA

Parametric

In conventional methods of flood frequency analysis, the marginal distribution functions of peak flow, volume and duration are assumed to follow some specific family of parametric distribution functions,

for example (2p and 3p denote two- and three-parameter distributions):

Normal (2p), e.g. Slack et al. (1975)

Log-normal (2p), e.g. Chow (1959)

Log-normal (3p), e.g. Hoshi et al. (1989)

Gamma (2p), e.g. Kite (1977)

Pearson (3p), e.g. Bobée (1973)

Log-Pearson (3p), e.g. Pilon and Adamowski (1993)

Generalized Extreme Value (3p), e.g. Lu and Stedinger (1992)

Generalized Pareto (3p), e.g. Wang (1991)

Generalized Logistic (3p), e.g. Ahmed et al. (1987)
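As a minimal sketch of the parametric route, the two-parameter log-normal distribution from the list can be fitted from the mean and standard deviation of the log-transformed data, and the T-year flood read off the fitted quantile function at non-exceedance probability 1 - 1/T. The data are hypothetical and this estimator is chosen for brevity; the cited studies may use other fitting methods.

```python
import math
from statistics import NormalDist, fmean, stdev

# Hypothetical annual maximum discharges (m^3/s), for illustration only.
ANNUAL_MAXIMA = [412.0, 388.0, 605.0, 290.0, 512.0, 701.0, 455.0, 349.0, 530.0, 620.0]


def fit_lognormal_2p(sample):
    """Two-parameter log-normal fit: mean and standard deviation of ln(x)."""
    logs = [math.log(x) for x in sample]
    return fmean(logs), stdev(logs)


def lognormal_quantile(mu, sigma, return_period):
    """Flood magnitude with non-exceedance probability F = 1 - 1/T."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    return math.exp(mu + sigma * z)


mu, sigma = fit_lognormal_2p(ANNUAL_MAXIMA)
for t in (10, 100, 1000):
    print(f"T = {t:4d} years: Q_T = {lognormal_quantile(mu, sigma, t):.1f} m^3/s")
```

Swapping in another family from the list changes the quantile function, and hence the design flood, for the same data, which is the selection uncertainty listed under the disadvantages below.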


Disadvantages

• Uncertainty in selecting the frequency distribution

• Uncertainty in the method of estimating parameters (method of moments, maximum likelihood, probability weighted moments)

• Assumptions associated with the parametric approach can result in strongly biased estimates of the high quantiles when the variable of interest has a bimodal PDF


Common Nonparametric Density Estimation Methods

Nearest neighbor method or balloon density

– May not lead to a valid PDF

– Suitable if the density is needed only at a single point

Maximum penalized likelihood estimators

– Difficult to apply for discrete data

Orthogonal series estimators (Karmakar and Simonovic, 2009)

– May not be a bona fide density

– Data must be independent

Kernel density estimators
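Why the nearest neighbour (balloon) estimate "may not lead to a valid PDF" can be checked numerically. In one dimension it is f(x) = k / (2 n d_k(x)), with d_k(x) the distance to the k-th nearest sample point; far from the data d_k(x) grows only like |x|, so the tails decay like 1/|x| and the integral diverges. The sample and the geometric integration grid below are hypothetical choices for illustration.

```python
def knn_density(x, data, k):
    """Balloon (k-nearest-neighbour) estimate in 1-D: f(x) = k / (2 n d_k(x)),
    where d_k(x) is the distance from x to its k-th nearest sample point."""
    d_k = sorted(abs(x - xi) for xi in data)[k - 1]
    return k / (2.0 * len(data) * d_k)


def integrate_symmetric(f, centre, half_width, r_min=1e-3, ratio=1.001):
    """Trapezoidal integral of f over [centre - half_width, centre + half_width],
    using grid spacing that grows geometrically away from the centre."""
    offsets = [0.0]
    r = r_min
    while r < half_width:
        offsets.append(r)
        r *= ratio
    offsets.append(half_width)
    xs = [centre - o for o in reversed(offsets[1:])] + [centre + o for o in offsets]
    return sum((b - a) * (f(a) + f(b)) / 2.0 for a, b in zip(xs, xs[1:]))


# Hypothetical sample; k = 2 keeps d_k(x) > 0 for distinct data points.
data = [0.0, 1.0, 2.0, 3.0, 4.0]
for half_width in (10, 100, 1000, 10000):
    area = integrate_symmetric(lambda x: knn_density(x, data, k=2), 2.0, half_width)
    print(f"integral over [2 - {half_width}, 2 + {half_width}] = {area:.3f}")
```

The printed areas already exceed one on the narrowest range and keep growing as the range widens, whereas a kernel estimate built from a kernel that integrates to one always has unit area.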


Kernel density estimators (KDE)

– KDE belong to a class of estimators called non-parametric density estimators

– KDE have no fixed structure and depend upon all the data points to make an estimate

– Kernel estimators centre a kernel function at each data point

– A smooth kernel function can be chosen as the building block to obtain a smooth density estimate

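Basic form of KDE

$$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}\,K\!\left(\frac{x - x_i}{h_i}\right)$$

where n is the sample size, K is the kernel function and h_i is the bandwidth associated with data point x_i.

Characteristics

– Effective in multi-modal data representation

– Can consider noise in observed data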
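The basic form above can be sketched directly, assuming a Gaussian kernel. How the per-point bandwidths h_i are chosen is the crux of an adaptive estimator; the helper `abramson_bandwidths` below uses Abramson's square-root law purely as a stand-in, and it is not the diffusion-based rule used by D-kde.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / SQRT_2PI


def kde(x, data, bandwidths, kernel=gaussian_kernel):
    """f_hat(x) = (1/n) * sum_i (1/h_i) * K((x - x_i) / h_i)."""
    total = sum(kernel((x - xi) / hi) / hi for xi, hi in zip(data, bandwidths))
    return total / len(data)


def abramson_bandwidths(data, h0):
    """Square-root law: h_i = h0 * (pilot(x_i) / g) ** -0.5, where pilot is a
    fixed-bandwidth KDE and g is the geometric mean of the pilot values."""
    fixed = [h0] * len(data)
    pilot = [kde(xi, data, fixed) for xi in data]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h0 * (p / g) ** -0.5 for p in pilot]


# Hypothetical sample with one isolated large value.
sample = [1.0, 1.2, 1.3, 1.5, 2.0, 6.0]
bandwidths = abramson_bandwidths(sample, h0=0.5)
for xi, hi in zip(sample, bandwidths):
    print(f"x_i = {xi:.1f}  h_i = {hi:.3f}")
print(f"f_hat(1.3) = {kde(1.3, sample, bandwidths):.4f}")
```

Points in sparse regions (here the isolated 6.0) receive wider kernels, which smooths the tail without over-smoothing the dense part of the sample.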


Components of KDEs

Choice of:

– Shape of kernel

– Bandwidth

Typical kernels

Uniform

Triangular

Epanechnikov (quadratic)

Biweight

Triweight

Tricube

Gaussian

Cosine

KDE : Selection of kernel

A kernel is a non-negative real-valued integrable function K satisfying the following two requirements:

$$\int_{-\infty}^{\infty} K(u)\,du = 1$$

$$K(-u) = K(u) \quad \text{for all values of } u$$

The first requirement ensures that the method of kernel density estimation results in a probability density function.

The second requirement (symmetry) ensures that the average of the corresponding distribution is equal to that of the sample used.

The performance of a KDE is relatively insensitive to the choice of kernel; the choice of bandwidth matters much more.
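A minimal sketch, assuming the standard textbook forms of the kernels listed earlier (written in the scaled variable u = (x - x_i)/h), that checks both requirements numerically: unit area, and symmetry, which makes the first moment of each kernel vanish.

```python
import math

# Standard kernel forms; the compact kernels are zero for |u| > 1.
KERNELS = {
    "uniform": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "triangular": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "biweight": lambda u: 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "triweight": lambda u: 35 / 32 * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "tricube": lambda u: 70 / 81 * (1 - abs(u) ** 3) ** 3 if abs(u) <= 1 else 0.0,
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "cosine": lambda u: math.pi / 4 * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,
}


def midpoint_integral(f, a=-8.0, b=8.0, steps=32_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


for name, k in KERNELS.items():
    area = midpoint_integral(k)                    # requirement 1: should be 1
    mean = midpoint_integral(lambda u: u * k(u))   # symmetry implies a zero mean
    symmetric = all(abs(k(u) - k(-u)) < 1e-12 for u in (0.1, 0.5, 0.9, 2.0))
    print(f"{name:12s} area = {area:.4f}  mean = {mean:+.1e}  symmetric = {symmetric}")
```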


KDE : Estimation of bandwidth

The selection of bandwidth is an important step in the kernel estimation method. A change in bandwidth may dramatically modify the shape of the estimated PDF (Adamowski, 1996; Efromovich, 1999).

Methods for optimum bandwidth selection

– MISE: Mean Integrated Squared Error

– AMISE: Asymptotic Mean Integrated Squared Error

Plug-in estimates

– The optimal choice of bandwidth is obtained by minimizing the mean integrated squared error (MISE), an overall measure of the effectiveness of the estimated PDF, and is described by the following equation (Bowman and Azzalini, 1997; Kim et al., 2003):

$$h_{opt} = 1.587\,\min\left\{S,\ \frac{IQR}{1.349}\right\}\,n^{-1/3}$$

where S = sample standard deviation, IQR = interquartile range and n = sample size.

[Figure: estimated PDF; axes x vs. Probability]
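The plug-in formula above translates directly into code. For comparison, Silverman's rule of thumb (ROT, one of the bandwidth estimators compared later) is included in its common form 0.9 min{S, IQR/1.34} n^(-1/5); that form is an assumption, since the slides do not state it. The IQR uses Python's default quantile method, which may differ slightly from the one used in the study.

```python
from statistics import quantiles, stdev


def interquartile_range(sample):
    q1, _, q3 = quantiles(sample, n=4)  # default 'exclusive' method
    return q3 - q1


def h_opt_slide(sample):
    """h_opt = 1.587 * min{S, IQR / 1.349} * n^(-1/3), as given above."""
    n = len(sample)
    spread = min(stdev(sample), interquartile_range(sample) / 1.349)
    return 1.587 * spread * n ** (-1.0 / 3.0)


def h_silverman(sample):
    """Silverman's rule of thumb: 0.9 * min{S, IQR / 1.34} * n^(-1/5)."""
    n = len(sample)
    spread = min(stdev(sample), interquartile_range(sample) / 1.34)
    return 0.9 * spread * n ** (-0.2)


# Hypothetical annual maxima (m^3/s), for illustration only.
maxima = [412.0, 388.0, 605.0, 290.0, 512.0, 701.0, 455.0, 349.0, 530.0, 620.0]
print(f"h_opt = {h_opt_slide(maxima):.1f}, h_ROT = {h_silverman(maxima):.1f}")
```

Both rules scale a robust spread estimate by a power of n that is derived for normally distributed data, which is the "normal reference rule" issue raised on the next slide.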


KDE : issues

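Issues

– Boundary leakage: the estimate assigns probability outside the physical support of the data (for example, to negative discharges)

– Normal reference rule: bandwidth formulas derived by assuming the data are normally distributed

Solution proposed by Botev et al. (2010), based on diffusion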

[Figure: estimated PDF; axes x vs. Probability]
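Boundary leakage can be quantified for a fixed-bandwidth Gaussian KDE: the mass it places below a lower bound L is (1/n) Σ Φ((L - x_i)/h). The sketch below computes this for a hypothetical non-negative sample and, as one simple remedy, applies the reflection method, which folds the mass below L back above it. Reflection is a swapped-in illustration, not the diffusion-based solution of Botev et al. (2010).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def leaked_mass(data, h, lower=0.0):
    """Probability mass that a Gaussian KDE with bandwidth h places below `lower`."""
    return sum(STD_NORMAL.cdf((lower - xi) / h) for xi in data) / len(data)


def gaussian_kde(x, data, h):
    """Fixed-bandwidth Gaussian KDE."""
    return sum(STD_NORMAL.pdf((x - xi) / h) for xi in data) / (len(data) * h)


def reflected_kde(x, data, h, lower=0.0):
    """Reflection method: zero below `lower`, leaked mass folded back above it."""
    if x < lower:
        return 0.0
    return gaussian_kde(x, data, h) + gaussian_kde(2.0 * lower - x, data, h)


# Hypothetical non-negative flows close to the physical bound of zero.
flows = [0.5, 1.0, 2.0, 3.5, 5.0]
print(f"mass below zero, Gaussian KDE: {leaked_mass(flows, h=1.0):.3f}")
print(f"reflected density at zero: {reflected_kde(0.0, flows, h=1.0):.3f}")
```

For this sample roughly 10% of the probability mass of the plain Gaussian KDE lies below zero, which would bias any exceedance probability computed from it.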


Performance Assessment based on synthetic samples

The performance of D-kde, the diffusion-based kernel density estimator, was assessed using two sets of synthetic datasets:

– Monte-Carlo experiments with unimodal populations

– Monte-Carlo experiments with bimodal populations


Unimodal Populations considered for performance assessment

Populations (DIST)

– Generalized extreme value (GEV)

– Generalized logistic (GLO)

– Generalized normal (LN3)

– Generalized Pareto (GPA)

Samples of size n = 50, 75, 100 and 200

L-moments based approach with pairs [(0.2, 0.1), (0.3, 0.2), (0.4, 0.3) and (0.5, 0.4)] (Viglione et al., 2007)

Combinations of (L-moment pair, DIST, n): 4 × 4 × 4 = 64
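A minimal sketch of this Monte-Carlo design, assuming 1000 replicate samples per combination (the number stated in the figure captions) and inverse-transform sampling from the GEV quantile function in Hosking's parameterisation. The GEV parameters are hypothetical; mapping each L-moment pair to distribution parameters is not shown.

```python
import math
import random
from itertools import product

DISTS = ("GEV", "GLO", "LN3", "GPA")
SAMPLE_SIZES = (50, 75, 100, 200)
LMOMENT_PAIRS = ((0.2, 0.1), (0.3, 0.2), (0.4, 0.3), (0.5, 0.4))
N_REPLICATES = 1000

COMBINATIONS = list(product(LMOMENT_PAIRS, DISTS, SAMPLE_SIZES))


def gev_quantile(f, xi, alpha, kappa):
    """GEV quantile function, Hosking's parameterisation."""
    y = -math.log(f)
    if kappa == 0.0:
        return xi - alpha * math.log(y)
    return xi + alpha / kappa * (1.0 - y ** kappa)


def draw_gev_sample(n, xi, alpha, kappa, rng):
    """Inverse-transform sampling: apply the quantile function to U(0, 1) draws."""
    sample = []
    while len(sample) < n:
        u = rng.random()
        if u > 0.0:  # random() lies in [0, 1); exclude 0 so that log(u) is defined
            sample.append(gev_quantile(u, xi, alpha, kappa))
    return sample


rng = random.Random(42)
# Hypothetical GEV parameters (not derived from the L-moment pairs above).
replicates = [draw_gev_sample(50, 100.0, 30.0, -0.1, rng) for _ in range(N_REPLICATES)]
print(len(COMBINATIONS), len(replicates), len(replicates[0]))  # 64 1000 50
```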


Comparison of D-kde with other nonparametric methods

Kernels:

– Classical Gaussian kernel estimator (G)

– Boundary Epanechnikov kernel (M)

– Gaussian kernel estimator with boundary correction (B)

– Generalized Birnbaum–Saunders kernel density estimator (K)

Bandwidth estimators:

– Botev-Grotowski-Kroese estimator (BGKE), used in D-kde

– Silverman's rule of thumb (ROT)

– Altman and Leger estimator (ALE)

– Bowman estimator (BE)

– Polansky and Baker estimator (PBE)

– Sheather and Jones estimator (SJE)

– Scott and Terrell biased estimator (STBE)

– Scott and Terrell unbiased estimator (STUE)

4 kernels × 8 bandwidth estimators = 32 KDE and bandwidth estimator combinations


[Figure: NMSE (0 to 1) for D-kde (D) and the kernels K, B, M and G, each combined with the bandwidth estimators BGKE, ROT, ALE, BE, PBE, SJE, STBE and STUE; four panels, the first labelled "Population: GEV, n = 50"]


Variation in one-thousand NMSE values, each resulting from comparison of theoretical (population) PDF with PDF constructed for each of the one-thousand samples drawn from unimodal Generalized extreme value (GEV) population. Along abscissa abbreviations are shown for D-kde (D), and each of the four kernels (K, B, M and G) that are considered in conjunction with eight bandwidth estimators (BGKE, ROT, ALE, BE, PBE, SJE, STBE and STUE) for construction of PDF for a sample.
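The slides do not define NMSE. A minimal sketch, assuming NMSE is the sum of squared differences between the estimated and population PDFs on a grid, normalised by the sum of squared population PDF values. A normal population and a fixed-bandwidth Gaussian KDE stand in for the study's populations and estimators.

```python
from statistics import NormalDist, quantiles, stdev

STD_NORMAL = NormalDist()


def nmse(estimated, reference):
    """Assumed NMSE: sum of squared errors divided by the sum of squared reference values."""
    num = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    den = sum(r ** 2 for r in reference)
    return num / den


def gaussian_kde(x, data, h):
    """Fixed-bandwidth Gaussian KDE."""
    return sum(STD_NORMAL.pdf((x - xi) / h) for xi in data) / (len(data) * h)


population = NormalDist(mu=100.0, sigma=30.0)  # stand-in for a known population
sample = population.samples(200, seed=1)
q1, _, q3 = quantiles(sample, n=4)
h = 0.9 * min(stdev(sample), (q3 - q1) / 1.34) * len(sample) ** -0.2  # Silverman's rule
grid = [float(x) for x in range(0, 201)]
reference = [population.pdf(x) for x in grid]
estimate = [gaussian_kde(x, sample, h) for x in grid]
print(f"NMSE = {nmse(estimate, reference):.4f}")
```

Repeating this for one thousand samples and plotting the spread of the NMSE values gives one box of the figure above.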


Bimodal Populations considered for performance assessment

Populations (DIST)

– Generalized extreme value (GEV)

– Generalized logistic (GLO)

– Generalized normal (LN3)

– Generalized Pareto (GPA)

Bimodal populations

– Mixtures of two unimodal populations:

$$f(x) = \alpha\, f_1(x) + (1 - \alpha)\, f_2(x)$$

where f_1 and f_2 are the PDFs of DIST-1 and DIST-2 and α is the mixing proportion.

Samples of size n = 50, 75, 100 and 200

L-moments based approach with pairs [(0.2, 0.1), (0.3, 0.2), (0.4, 0.3) and (0.5, 0.4)] (Viglione et al., 2007)

Combinations of (α, L-moment pair, DIST-1, DIST-2, n): 288
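Sampling from a two-component mixture f = α f_1 + (1 - α) f_2 can be sketched by first choosing a component with probability α and then inverting that component's CDF. The component parameters are hypothetical, and Hosking's parameterisations of the GEV and generalized Pareto quantile functions are assumed.

```python
import math
import random


def gev_quantile(f, loc, scale, shape):
    """GEV quantile function, Hosking's parameterisation (shape = kappa)."""
    y = -math.log(f)
    if shape == 0.0:
        return loc - scale * math.log(y)
    return loc + scale / shape * (1.0 - y ** shape)


def gpa_quantile(f, loc, scale, shape):
    """Generalized Pareto quantile function, Hosking's parameterisation."""
    if shape == 0.0:
        return loc - scale * math.log(1.0 - f)
    return loc + scale / shape * (1.0 - (1.0 - f) ** shape)


def sample_mixture(alpha, quantile_1, quantile_2, n, rng):
    """Draw n values from alpha * f1 + (1 - alpha) * f2: pick the component with
    probability alpha, then invert that component's CDF at a U(0, 1) draw."""
    values, labels = [], []
    while len(values) < n:
        u = rng.random()
        if u == 0.0:  # keep u strictly inside (0, 1)
            continue
        if rng.random() < alpha:
            values.append(quantile_1(u))
            labels.append(1)
        else:
            values.append(quantile_2(u))
            labels.append(2)
    return values, labels


# Hypothetical component parameters, loosely mimicking a [0.3, GPA, GEV] population.
rng = random.Random(7)
values, labels = sample_mixture(
    0.3,
    lambda f: gpa_quantile(f, 50.0, 20.0, 0.1),
    lambda f: gev_quantile(f, 200.0, 40.0, -0.1),
    n=1000,
    rng=rng,
)
print(f"share from component 1: {labels.count(1) / len(labels):.3f}")
```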


Performance Assessment based on real world data

The performance and applicability of D-kde were assessed using four real-world datasets from:

– India

– USA

– United Kingdom

– Canada


Study area - INDIA


Study area - USA


Tests for stationarity and independence in the annual maximum series at each site

The following tests were performed at each site for annual maximum discharge:

Stationarity tests

– KPSS test for trend and level stationarity (KPSS and KPSS_level)

– Spearman's rho test for trend stationarity (S-rho)

– Mann-Kendall test for trend stationarity (Mken)

– Augmented Dickey-Fuller test for trend stationarity (ADS)

Independent and identically distributed (IID) test

– BDS test (BDS) up to 5 dimensions
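Of the tests listed, the Mann-Kendall trend test is short enough to sketch. This minimal version computes the S statistic, its variance n(n - 1)(2n + 5)/18 under the null hypothesis of no trend, the continuity-corrected Z score and a two-sided p-value. It assumes no tied values (the tie correction to the variance is omitted), and the series is hypothetical.

```python
import math
from statistics import NormalDist


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.
    Returns (S, Z, two-sided p-value)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p_value


# Hypothetical annual maximum series (m^3/s), for illustration only.
series = [412.0, 388.0, 605.0, 290.0, 512.0, 701.0, 455.0, 349.0, 530.0, 620.0]
s, z, p = mann_kendall(series)
print(f"S = {s}, Z = {z:.3f}, p = {p:.3f}")
```

A large p-value means no monotonic trend is detected at the chosen significance level, which supports treating the series as stationary for frequency analysis.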


PDFs constructed for POT (peaks-over-threshold) streamflows at Tay, UK

Methods shown: D-kde (D), Generalized logistic (LO), Generalized Normal (NO), Generalized extreme value (EV) and each of the four kernels (K, B, M and G).


Leave one out cross validation: error measures

[Figure: R-RMSE, NSE and R-bias for the US, UK and India data sets]
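The slide does not define the error measures or how the leave-one-out pairs of observed and estimated values are formed. Assuming common definitions (relative bias as the mean of (estimate - observed)/observed, relative RMSE as the root mean square of the same relative errors, and the Nash-Sutcliffe efficiency, NSE), they can be sketched as follows.

```python
import math
from statistics import fmean


def relative_errors(observed, estimated):
    return [(e - o) / o for o, e in zip(observed, estimated)]


def r_bias(observed, estimated):
    """Assumed definition: mean relative error."""
    return fmean(relative_errors(observed, estimated))


def r_rmse(observed, estimated):
    """Assumed definition: root mean square relative error."""
    return math.sqrt(fmean(r * r for r in relative_errors(observed, estimated)))


def nse(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of the observations."""
    mean_obs = fmean(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical observed and leave-one-out estimated values.
obs = [290.0, 349.0, 412.0, 512.0, 701.0]
est = [301.0, 340.0, 420.0, 498.0, 730.0]
print(f"R-bias = {r_bias(obs, est):.3f}, R-RMSE = {r_rmse(obs, est):.3f}, NSE = {nse(obs, est):.3f}")
```

Under these definitions a perfect estimator gives R-bias = R-RMSE = 0 and NSE = 1.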


[Figure: PDFs of POT data estimated with D-kde and the classical Gaussian kernel estimator (G), shown with the POT data; axes POT vs. Probability; annotated "Boundary leakage"]


Current and future work in Impact Forecasting (IF)

– Stochastic simulation

– Selection of parametric distributions

– Ease of expansion into the multivariate domain (flood peak and flood duration)

– Better representation of marginals for some multivariate models (copulas)
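For stochastic simulation, a kernel density estimate can be sampled without computing its CDF: pick a data point at random and add kernel noise scaled by its bandwidth (a smoothed bootstrap). This sketch assumes a Gaussian kernel; the optional reflection at a lower bound reproduces a reflected estimator and keeps simulated flows non-negative. It is an illustration only, not the Impact Forecasting implementation.

```python
import random
from statistics import fmean


def simulate_from_kde(data, bandwidths, size, seed=None, lower=None):
    """Draw `size` values from a Gaussian KDE with per-point bandwidths
    (smoothed bootstrap); values below `lower` are reflected back above it."""
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        i = rng.randrange(len(data))
        value = data[i] + bandwidths[i] * rng.gauss(0.0, 1.0)
        if lower is not None and value < lower:
            value = 2.0 * lower - value
        out.append(value)
    return out


# Hypothetical peak flows and a fixed bandwidth, for illustration only.
peaks = [2.0, 4.0, 6.0, 8.0]
sims = simulate_from_kde(peaks, [0.5] * len(peaks), size=10_000, seed=3, lower=0.0)
print(f"simulated mean = {fmean(sims):.2f} (sample mean = {fmean(peaks):.2f})")
```

Without reflection, the simulated values have the sample mean as their expectation and a variance equal to the sample variance plus the mean squared bandwidth.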


Summary and Conclusions

An adaptive kernel density estimator based on a linear diffusion process (D-kde), which avoids the boundary leakage problem, is applied to frequency analysis of floods, and its potential is demonstrated by application to synthetic and real-world data sets.

The bandwidth is computed by a new plug-in bandwidth selection strategy that avoids the normal reference rule, and its performance is compared with that of various bandwidth estimators.

The performance of D-kde was found to be better than that of conventional methods, irrespective of the nature of the population and the sample size. Its performance improved with increasing sample size.


Contacts

Santhosh Dronamraju

Impact Forecasting

+91 80 3091 8144

[email protected]



Legal Disclaimer

© Aon UK Limited trading as Aon Benfield (for itself and on behalf of each subsidiary company of Aon Plc) (“Aon Benfield”) reserves all rights to the content of this report or document (“Report”). This Report is for distribution to Aon Benfield and the organisation to which it was originally delivered by Aon Benfield only (the “Recipient”). Copies may be made by that organisation for its own internal purposes but this Report may not be distributed in whole or in part to any third party without both (i) the prior written consent of Aon Benfield and (ii) the third party having first signed a “recipient of report” letter in a form acceptable to Aon Benfield. This Report is provided as a courtesy to the recipient and for general information and marketing purposes only. The Report should not be construed as giving opinions, assessment of risks or advice of any kind (including but not limited to actuarial, re/insurance, tax, regulatory or legal advice). The content of this Report is made available without warranty of any kind and without any other assurance whatsoever as to its completeness or accuracy.

Aon Benfield does not accept any liability to any Recipient or third party as a result of any reliance placed by such party on this Report. Any decision to rely on the contents of this Report is entirely the responsibility of the Recipient. The Recipient acknowledges that this Report does not replace the need for the Recipient to undertake its own assessment or seek independent and/or specialist risk assessment and/or other relevant advice.

The contents of this Report are based on publicly available information and/or third party sources (the “Data”) in respect of which Aon Benfield has no control and such information has not been verified by Aon Benfield. This Data may have been subjected to mathematical and/or empirical analysis and modelling in producing the Report. The Recipient acknowledges that any form of mathematical and/or empirical analysis and modelling (including that used in the preparation of this Report) may produce results which differ from actual events or losses.

Limitations of Catastrophe Models

This report includes information that is output from catastrophe models of Impact Forecasting, LLC (IF). The information from the models is provided by Aon Benfield Services, Inc. (Aon Benfield) under the terms of its license agreements with IF. The results in this report from IF are the products of the exposures modelled, the financial assumptions made concerning deductibles and limits, and the risk models that project the pounds of damage that may be caused by defined catastrophe perils. Aon Benfield recommends that the results from these models in this report not be relied upon in isolation when making decisions that may affect the underwriting appetite, rate adequacy or solvency of the company. The IF models are based on scientific data, mathematical and empirical models, and the experience of engineering, geological and meteorological experts. Calibration of the models using actual loss experience is based on very sparse data, and material inaccuracies in these models are possible. The loss probabilities generated by the models are not predictive of future hurricanes, other windstorms, or earthquakes or other natural catastrophes, but provide estimates of the magnitude of losses that may occur in the event of such natural catastrophes. Aon Benfield makes no warranty about the accuracy of the IF models and has made no attempt to independently verify them. Aon Benfield will not be liable for any special, indirect or consequential damages, including, without limitation, losses or damages arising from or related to any use of or decisions based upon data developed using the models of IF.

Additional Limitations of Impact Forecasting, LLC

The results listed in this report are based on engineering / scientific analysis and data, information provided by the client, and mathematical and empirical models. The accuracy of the results depends on the uncertainty associated with each of these areas. In particular, as with any model, actual losses may differ from the results of simulations. It is only possible to provide plausible results based on complete and accurate information provided by the client and other reputable data sources. Furthermore, this information may only be used for the business application specified by Impact Forecasting, LLC and for no other purpose. It may not be used to support development of or calibration of a product or service offering that competes with Impact Forecasting, LLC. The information in this report may not be used as a part of or as a source for any insurance rate filing documentation.

THIS INFORMATION IS PROVIDED “AS IS” AND IMPACT FORECASTING, LLC HAS NOT MADE AND DOES NOT MAKE ANY WARRANTY OF ANY KIND WHATSOEVER, EXPRESS OR IMPLIED, WITH RESPECT TO THIS REPORT; AND ALL WARRANTIES INCLUDING WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE HEREBY DISCLAIMED BY IMPACT FORECASTING, LLC. IMPACT FORECASTING, LLC WILL NOT BE LIABLE TO ANYONE WITH RESPECT TO ANY DAMAGES, LOSS OR CLAIM WHATSOEVER, NO MATTER HOW OCCASIONED, IN CONNECTION WITH THE PREPARATION OR USE OF THIS REPORT.


Comparison of D-kde with other nonparametric methods based on quantiles

Quantiles corresponding to eight return periods (T = 10, 25, 50, 75, 100, 200, 500 and 1000 years)

Kernels:

– Classical Gaussian kernel estimator (G)

– Boundary Epanechnikov kernel (M)

– Gaussian kernel estimator with boundary correction (B)

– Generalized Birnbaum–Saunders kernel density estimator (K)

– Local polynomial-based estimator (L)

Bandwidth estimators:

– Botev-Grotowski-Kroese estimator (BGKE), used in D-kde

– Silverman's rule of thumb (ROT)

– Altman and Leger estimator (ALE)

– Bowman estimator (BE)

– Polansky and Baker estimator (PBE)

– Sheather and Jones estimator (SJE)

– Scott and Terrell biased estimator (STBE)

– Scott and Terrell unbiased estimator (STUE)

33 KDE and bandwidth estimator combinations
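Quantiles for a return period T follow from solving F̂(x_T) = 1 - 1/T. For a Gaussian KDE with a fixed bandwidth, F̂(x) = (1/n) Σ Φ((x - x_i)/h), which is monotone and can be inverted by bisection. The sketch assumes a fixed-bandwidth Gaussian kernel and hypothetical data and bandwidth; it is not how D-kde constructs its CDF.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def kde_cdf(x, data, h):
    """CDF of a fixed-bandwidth Gaussian KDE."""
    return sum(STD_NORMAL.cdf((x - xi) / h) for xi in data) / len(data)


def kde_quantile(return_period, data, h, tol=1e-8, max_iter=200):
    """Solve kde_cdf(x) = 1 - 1/T by bisection."""
    p = 1.0 - 1.0 / return_period
    lo = min(data) - 10.0 * h  # kde_cdf(lo) is essentially 0
    hi = max(data) + 10.0 * h  # kde_cdf(hi) is essentially 1
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol:
            break
    return 0.5 * (lo + hi)


RETURN_PERIODS = (10, 25, 50, 75, 100, 200, 500, 1000)
# Hypothetical annual maxima (m^3/s) and bandwidth, for illustration only.
maxima = [412.0, 388.0, 605.0, 290.0, 512.0, 701.0, 455.0, 349.0, 530.0, 620.0]
for t in RETURN_PERIODS:
    print(f"T = {t:4d} years: x_T = {kde_quantile(t, maxima, h=60.0):.1f}")
```

Beyond the largest observation, the estimated quantiles are governed entirely by the kernel tails and the bandwidth, which is why quantile comparisons at long return periods are a demanding test for any KDE.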


[Figure: NMSE of quantile estimates (0 to 5) for D-kde (D), the local polynomial-based estimator (L) and the kernels K, B, M and G with each bandwidth estimator; four panels]

Variation in one-thousand NMSE values, each resulting from comparison of quantiles estimated based on population CDF with those estimated from CDF corresponding to each of the one-thousand samples drawn from unimodal GEV population. Along abscissa abbreviations are shown for D-kde (D), Local polynomial-based estimator (L), and each of the four kernels (K, B, M and G) that are considered in conjunction with eight bandwidth estimators (BGKE, ROT, ALE, BE, PBE, SJE, STBE and STUE) for construction of CDF for a sample.


[Figure: KS statistic (0 to 0.5) for EV, LO, PA, N and D at n = 50, 75, 100 and 200; six panels titled [0.3, GPA, GEV], [0.3, GPA, GLO], [0.3, GPA, LN3], [0.3, GEV, GLO], [0.3, GEV, LN3] and [0.3, GLO, LN3]]

Variation in one-thousand KS test statistic values, each resulting from application of the KS goodness-of-fit test to compare the empirical CDF (corresponding to the known bimodal population) with the CDF constructed for each of the one-thousand samples drawn from the population. The method used to construct the CDF is represented by EV (Generalized extreme value), LO (Generalized logistic), PA (Generalized Pareto), N (Generalized Normal) and D (D-kde). The title of each sub-plot indicates [α, distribution-1, distribution-2] for the corresponding population.
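The KS statistic in the caption is the largest absolute difference between two CDFs. In its standard one-sample form, comparing the empirical CDF of a sample with a continuous CDF, it can be evaluated exactly at the order statistics, as in this minimal sketch. The normal population in the usage lines is a hypothetical stand-in; the pairing of CDFs used in the study is as described in the caption.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)|,
    evaluated at the order statistics, where the empirical CDF jumps."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical population and sample, for illustration only.
population = NormalDist(mu=100.0, sigma=30.0)
sample = population.samples(200, seed=5)
print(f"D_n = {ks_statistic(sample, population.cdf):.3f}")
```

Smaller values indicate a closer match; repeating the computation over one thousand samples yields the spread shown in each box of the figure.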
