Frequency Analysis of Floods – A Nonparametric Approach Dr Santhosh Dronamraju Future Floods: An...
-
Upload
deirdre-baldwin -
Category
Documents
-
view
227 -
download
1
Transcript of Frequency Analysis of Floods – A Nonparametric Approach Dr Santhosh Dronamraju Future Floods: An...
Frequency Analysis of Floods – A Nonparametric ApproachDr Santhosh Dronamraju
Future Floods: An Exploration of A Cross-Disciplinary Approach to Flood Risk Forecasting26-27 February 2015, Research Division Seminar room, Faculty of Arts and Social Sciences,NUS Kent Ridge Campus, Singapore
2Proprietary & Confidential
Contents
Section 1 Introduction to FFA Section 2 Kernel density estimators Section 3 Performance assessment using synthetic and
real world data sets. Section 4 Current and future work in Impact Forecasting
3Proprietary & Confidential
Introduction
Effective estimation of quantiles of hydrometeorological events (such as precipitation, droughts and floods) is of great scientific interest, as it forms basis for planning, design and management of water-resources systems.
Estimates of flood quantiles have wide applications
– Design and risk assessment of water control structures
– Design of critical features of land fill covers and erosion protection for hazardous wastes
– Economic evaluation of flood protection projects, flood insurance assessment
– Land use planning and management, and operation of irrigation projects.
4Proprietary & Confidential
Flood Frequency Analysis (FFA)
In FFA, a unique relation between flood magnitude and the corresponding recurrence interval is sought
The objective of frequency analysis in a hydrologic context is to infer (from observed data) the probability that event of certain magnitude will be exceeded
Two basic problems exist for most hydrologic applications.
– First the sample is usually small, by statistical standards, resulting in uncertainty as to the true probability
– A single theoretical frequency distribution does not always fit a particular data-type
5Proprietary & Confidential
Univariate FFA
Parametric
In conventional methods of flood frequency analysis, the marginal distribution functions of peak flow, volume and duration are assumed to follow some specific family of parametric distribution functions
for example
Normal(2p) eg: Slack et al. (1975)
Log-normal(2p) eg: Chow (1959)
Log-normal(3p) eg: Hoshi et al. (1989)
Gamma(2p) eg: Kite (1977)
Pearson(3p) eg: Bobeé (1973)
Log-Pearson(3p) eg: Pilon and Adamowski (1993)
Generalized Extreme Value(3p) eg: Lu and Stedinger (1992)
Generalized Pareto(3p) eg: Wang (1991)
Generalized Logistic(3p) eg: Ahmed et al. (1987)
6Proprietary & Confidential
Disadvantages
• Uncertainty in selecting frequency distribution
• Uncertainty in method of estimating parameters (method of moments, maximum likelihood, probability weighted moments)
• Assumptions associated with parametric approach sometimes result in strongly biased estimates of the high quantiles when the variable of interest has a bimodal PDF
7Proprietary & Confidential
Common Nonparametric Density Estimation Methods
Nearest neighbor method or balloon density
– May not lead to valid PDF
– Suitable if we are to find probability at single point
Maximum penalized likelihood estimators
– Difficult to apply for discrete data
Orthogonal series estimators (Karmakar and Simonovic, 2009)
– May not be a bonafide density
– Data must be independent
Kernel density estimators
8Proprietary & Confidential
Kernel density estimators (KDE)
Kernel density estimators– KDE belong to a class of estimators called non-parametric density
estimators – KDE have no fixed structure and depend upon all the data points to make
an estimate – kernel estimators centre a kernel function at each data point– Smooth kernel function can be chosen as building block, to have a smooth
density estimate
Basic form of KDE
Characteristics– Effective in multi-modal data representation– Can consider noise in observed data
1
1 1( )
ni
i i i
x xf x K
n h h
9Proprietary & Confidential
Choice
Shape of kernel
Bandwidth
bandwidth
Typical kernels
Quadratic
Triangular
Components of KDEs
10Proprietary & Confidential
Uniform
Triangular
Epanechnikov
Biweight
Triweight (tricube)
Gaussian
Cosine
KDE : Selection of kernel
A kernel is a non-negative real-valued integrable function K satisfying the following two requirements:
The first requirement ensures that the method of kernel density estimation results in a probability density function
The second requirement ensures that the average of the corresponding distribution is equal to true PDF of the sample usedThe performance of KDE is
insensitive to the choice of kernel
11Proprietary & Confidential
KDE : Estimation of bandwidth
The selection of bandwidth is an important step in kernel estimation method. A change in bandwidth may dramatically modify the shape of the estimated PDF (Adamowski, 1996; Efromovich, 1999)
Methods for optimum bandwidth selection– MISE : Mean Integrated Squared Error – AMISE: Asymptotic Mean Integrated Squared
Error
Plug-in estimates– The optimal choice for bandwidth, an overall measure of the effectiveness of
PDF, is provided by the mean integrated squared error (MISE), described by the following equation (Bowman and Azzalini, 1997; Kim et al., 2003):
Where S = sample standard deviationIQR = inter quartile rangen = sample size
h𝑜𝑝𝑡= (1.587 )∗𝑚𝑖𝑛{𝑆 ,( 𝐼𝑄𝑅1.349 )}∗𝑛− 1/3
x
Probabil
ity
12Proprietary & Confidential
KDE : issues
Issues
– Boundary leakage problems
– Normal reference rule
Solution by Botev et al. (2010) based on diffusion
x
Probabili
ty
13Proprietary & Confidential
Performance Assessment based on synthetic samples
The performance of D-kde was assessed using two sets of synthetic datasets – Monte-Carlo experiments with unimodal populations– Monte-Carlo experiments with bimodal populations
14Proprietary & Confidential
Unimodal Populations considered for performance assessment
Populations (DIST)
– Generalized extreme value (GEV)
– Generalized logistic (GLO)
– Generalized normal (LN3)
– Generalized pareto (GPA)
Samples each of size n (=50, 75, 100 and 200)
L-moments based approach with pairs [(0.2, 0.1), (0.3, 0.2), (0.4, 0.3) and (0.5, 0.4)] (Viglione et al. ,2007)
3, , ,DIST n 64 combinations
15Proprietary & Confidential
Comparison of D-kde with other nonparametric methods
Classical Gaussian kernel estimator (G)
Boundary Epanechnikov kernel (M), Gaussian kernel estimator with
boundary correction (B), Generalized Birnbaum–Saunders
kernel density estimator (K)
Botev-Grotowski-Kroese estimator (BGKE) used in D-kde
Silverman's rule of thumb (ROT) Altman and Leger estimator
(ALE) Bowman estimator (BE) Polansky and Baker estimator
(PBE) Sheather and Jones estimator
(SJE) Scott and Terrell biased estimator
(STBE), and Scott and Terrell unbiased
estimator (STUE)
32kde and bandwidth estimator
combinations
16Proprietary & Confidential
D K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 50
D K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 75
D K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 100
D K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 200
Variation in one-thousand NMSE values, each resulting from comparison of theoretical (population) PDF with PDF constructed for each of the one-thousand samples drawn from unimodal Generalized extreme value (GEV) population. Along abscissa abbreviations are shown for D-kde (D), and each of the four kernels (K, B, M and G) that are considered in conjunction with eight bandwidth estimators (BGKE, ROT, ALE, BE, PBE, SJE, STBE and STUE) for construction of PDF for a sample.
17Proprietary & Confidential
Bimodal Populations considered for performance assessment
Populations (DIST)
– Generalized extreme value (GEV).
– Generalized logistic (GLO) .
– Generalized normal (LN3).
– Generalized pareto (GPA).
Bimodal populations
– Mixture of Unimodal populations.
, .
Samples each of size n (=50, 75, 100 and 200).
L-moments based approach with pairs [(0.2, 0.1), (0.3, 0.2), (0.4, 0.3) and (0.5, 0.4)]. (Viglione et al. ,2007)
288 combinations
3-1, -2, , , ,DIST DIST n
18Proprietary & Confidential
Performance Assessment based on real world data
The performance and applicability of D-kde was assessed using four real world datasets from:– India– USA– United Kingdom– Canada
19Proprietary & Confidential
Study area - INDIA
20Proprietary & Confidential
Study area - USA
21Proprietary & Confidential
Tests for stationarity and independence in Annual maximum series at each site
The following tests were performed at each site for annual maximum discharge
Stationarity tests– KPSS test for trend and level stationarity (KPSS and KPSS_level)– Spearman’s-rho test for trend stationarity (S-rho)– Mann-Kendall test for trend stationarity (Mken)– Augmented Dickey Fuller test for trend stationarity (ADS)
Independent and identically distributed (IID) test– BDS test (BDS) upto 5 dimensions
22Proprietary & Confidential
D-kde (D), Generalized logistic (LO), Generalized Normal (NO) and Generalized extreme value (EV), and each of the four kernels (K, B, M and G) respectively.
PDFs constructed for POT streamflows at Tay, UK
23Proprietary & Confidential
US UK INDIA
0.02
0.025
0.03
0.035
0.04
0.045
0.05
0.055
0.06
0.065
0.07
R-R
MS
E
US UK INDIA
0.955
0.96
0.965
0.97
0.975
0.98
0.985
0.99
0.995
1
NS
US UK INDIA0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
R-b
ias
Leave one out cross validation : error measures
24Proprietary & Confidential
-10 0 10 20 30 40 500
0.01
0.02
0.03
0.04
0.05
0.06
POT
Pro
babi
lity
D-kde
G
POT data
Boundary leakage
25Proprietary & Confidential
Current and future work in IF
Stochastic simulation Selection of parametric distributions Ease of expansion into multivariate domain ( flood peak and flood
duration Better representation of marginals for some multivariate models
(copulas)
26Proprietary & Confidential
Summary and Conclusions
A linear diffusion process based adaptive kernel (D-kde) density
estimator which avoids boundary leakage problem is applied for
frequency analysis of floods and its potential is demonstrated by
application to synthetic and real world data sets.
The bandwidth is computed by a new plug-in bandwidth selection
strategy, which avoids normal reference rule and its performance is
compared with various bandwidth estimators.
The performance of D-kde was found to be better than conventional
methods, irrespective of the nature of population and sample size. The
performance improved with increase in sample size.
27Proprietary & Confidential
Contacts
Santhosh Dronamraju
Impact Forecasting
+91 80 3091 8144
28Proprietary & Confidential
DisclaimerLegal Disclaimer
© Aon UK Limited trading as Aon Benfield (for itself and on behalf of each subsidiary company of Aon Plc) (“Aon Benfield”) reserves all rights to the content of this report or document (“Report”). This Report is for distribution to Aon Benfield and the organisation to which it was originally delivered by Aon Benfield only (the “Recipient”). Copies may be made by that organisation for its own internal purposes but this Report may not be distributed in whole or in part to any third party without both (i) the prior written consent of Aon Benfield and (ii) the third party having first signed a “recipient of report” letter in a form acceptable to Aon Benfield. This Report is provided as a courtesy to the recipient and for general information and marketing purposes only. The Report should not be construed as giving opinions, assessment of risks or advice of any kind (including but not limited to actuarial, re/insurance, tax, regulatory or legal advice). The content of this Report is made available without warranty of any kind and without any other assurance whatsoever as to its completeness or accuracy.
Aon Benfield does not accept any liability to any Recipient or third party as a result of any reliance placed by such party on this Report. Any decision to rely on the contents of this Report is entirely the responsibility of the Recipient. The Recipient acknowledges that this Report does not replace the need for the Recipient to undertake its own assessment or seek independent and/or specialist risk assessment and/or other relevant advice.
The contents of this Report are based on publically available information and/or third party sources (the “Data”) in respect of which Aon Benfield has no control and such information has not been verified by Aon Benfield. This Data may have been subjected to mathematical and/or empirical analysis and modelling in producing the Report. The Recipient acknowledges that any form of mathematical and/or empirical analysis and modelling (including that used in the preparation of this Report) may produce results which differ from actual events or losses.
Limitations of Catastrophe Models
This report includes information that is output from catastrophe models of Impact Forecasting, LLC (IF). The information from the models is provided by Aon Benfield Services, Inc. (Aon Benfield) under the terms of its license agreements with IF. The results in this report from IF are the products of the exposures modelled, the financial assumptions made concerning deductibles and limits, and the risk models that project the pounds of damage that may be caused by defined catastrophe perils. Aon Benfield recommends that the results from these models in this report not be relied upon in isolation when making decisions that may affect the underwriting appetite, rate adequacy or solvency of the company. The IF models are based on scientific data, mathematical and empirical models, and the experience of engineering, geological and meteorological experts. Calibration of the models using actual loss experience is based on very sparse data, and material inaccuracies in these models are possible. The loss probabilities generated by the models are not predictive of future hurricanes, other windstorms, or earthquakes or other natural catastrophes, but provide estimates of the magnitude of losses that may occur in the event of such natural catastrophes. Aon Benfield makes no warranty about the accuracy of the IF models and has made no attempt to independently verify them. Aon Benfield will not be liable for any special, indirect or consequential damages, including, without limitation, losses or damages arising from or related to any use of or decisions based upon data developed using the models of IF.
Additional Limitations of Impact Forecasting, LLC
The results listed in this report are based on engineering / scientific analysis and data, information provided by the client, and mathematical and empirical models. The accuracy of the results depends on the uncertainty associated with each of these areas. In particular, as with any model, actual losses may differ from the results of simulations. It is only possible to provide plausible results based on complete and accurate information provided by the client and other reputable data sources. Furthermore, this information may only be used for the business application specified by Impact Forecasting, LLC and for no other purpose. It may not be used to support development of or calibration of a product or service offering that competes with Impact Forecasting, LLC. The information in this report may not be used as a part of or as a source for any insurance rate filing documentation.
THIS INFORMATION IS PROVIDED “AS IS” AND IMPACT FORECASTING, LLC HAS NOT MADE AND DOES NOT MAKE ANY WARRANTY OF ANY KIND WHATSOEVER, EXPRESS OR IMPLIED, WITH RESPECT TO THIS REPORT; AND ALL WARRANTIES INCLUDING WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE HEREBY DISCLAIMED BY IMPACT FORECASTING, LLC. IMPACT FORECASTING, LLC WILL NOT BE LIABLE TO ANYONE WITH RESPECT TO ANY DAMAGES, LOSS OR CLAIM WHATSOEVER, NO MATTER HOW OCCASIONED, IN CONNECTION WITH THE PREPARATION OR USE OF THIS REPORT.
29Proprietary & Confidential
Comparison of D-kde with other nonparametric methods based on qauntiles
Quantiles corresponding to eight return periods (T = 10, 25, 50, 75, 100, 200, 500 and 1000 years)
Classical Gaussian kernel estimator (G),
Boundary Epanechnikov kernel (M), Gaussian kernel estimator with
boundary correction (B), Generalized Birnbaum–Saunders
kernel density estimator (K) Local polynomial–based estimator (L)
Botev-Grotowski-Kroese estimator (BGKE) used in D-kde
Silverman's rule of thumb (ROT) Altman and Leger estimator
(ALE) Bowman estimator (BE) Polansky and Baker estimator
(PBE) Sheather and Jones estimator
(SJE) Scott and Terrell biased estimator
(STBE), and Scott and Terrell unbiased
estimator (STUE)33kde and bandwidth estimator
combinations
30Proprietary & Confidential
D L K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 50
D L K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 75
D L K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 100
D L K B M G K B M G K B M G K B M G K B M G K B M G K B M G K B M G0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
BGKE ROT ALE BE PBE SJE STBE STUE
NM
SE
Population: GEV, n = 200
Variation in one-thousand NMSE values, each resulting from comparison of quantiles estimated based on population CDF with those estimated from CDF corresponding to each of the one-thousand samples drawn from unimodal GEV population. Along abscissa abbreviations are shown for D-kde (D), Local polynomial-based estimator (L), and each of the four kernels (K, B, M and G) that are considered in conjunction with eight bandwidth estimators (BGKE, ROT, ALE, BE, PBE, SJE, STBE and STUE) for construction of CDF for a sample.
31Proprietary & Confidential
EV LO PA N D EV LO PA N D EV LO PA N D EV LO PA N D0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
n=50 n=75 n=100 n=200
KS
sta
tistic
[ 0.3, GPA, GEV]
EV LO PA N D EV LO PA N D EV LO PA N D EV LO PA N D0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
n=50 n=75 n=100 n=200
KS
sta
tistic
[ 0.3, GPA, GLO]
EV LO PA N D EV LO PA N D EV LO PA N D EV LO PA N D0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
n=50 n=75 n=100 n=200
KS
sta
tistic
[ 0.3, GPA, LN3]
EV LO PA N D EV LO PA N D EV LO PA N D EV LO PA N D0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
n=50 n=75 n=100 n=200
KS
sta
tistic
[ 0.3, GEV, GLO]
EV LO PA N D EV LO PA N D EV LO PA N D EV LO PA N D0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
n=50 n=75 n=100 n=200
KS
sta
tistic
[ 0.3, GEV, LN3]
EV LO PA N D EV LO PA N D EV LO PA N D EV LO PA N D0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
n=50 n=75 n=100 n=200
KS
sta
tistic
[ 0.3, GLO, LN3]
Variation in one-thousand KS test statistic values, each resulting from application of KS goodness-of-fit test for comparison of empirical CDF (corresponding to known bimodal population) with CDF constructed for each of the one-thousand samples drawn from the population. The method considered for construction of CDF is represented by EV (Generalized extreme value), LO (Generalized logistic), PA (Generalized Pareto), N (Generalized Normal) and D (D-kde). Title of each sub-plot indicates [α , distribution-1, distribution-2] corresponding to each population.