HW#1: Extreme Rainfall Deficit in Singapore#
Objectives
This homework will help you gain a better understanding in terms of the ways how to:
Fit Generalized Extreme Value (GEV) distribution
Estimate the return level of extreme rainfall deficit
Happy coding!
Submission Guide
Deadline: Sunday 11:59 pm, 5th November 2023 (Note: Late submissions will not be accepted).
Please upload your solutions to Canvas in a Jupyter Notebook format with the name “Homework1_StudentID.ipynb”. Make sure to write down your student ID and full name in the cell below.
For any questions, feel free to contact Prof. Xiaogang HE (hexg@nus.edu.sg), or Zhixiao NIU (niu.zhixiao@u.nus.edu).
## Fill your student ID and full name below.
# Student ID:
# Full name:
Data: You will need to use the historical (1981-2020) daily total rainfall at Singapore’s Changi station for this homework.
You can create a DataFrame using Pandas by reading file “../../assets/data/Changi_daily_rainfall.csv”.
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from scipy.optimize import fsolve
from scipy.stats import genextreme as gev
Q1: Calculate extreme rainfall statisitics#
First, divide the data into the following seasons: DJF (Dec-Jan-Feb), MAM (Mar-Apr-May), JJA (Jun-Jul-Aug), and SON (Sep-Oct-Nov) using Pandas’ filtering
methods. Calculate the following statistics of daily rainfall for each season: (i) mean; (ii) variance; (iii) skewness; and (iv) kurtosis. Based on the results, identify the driest season. (Details on the filtering method can be found in the Pandas tutorial). (Hint: The driest season can refer to the season with the lowest mean of daily rainfall) (10 marks)
rain = pd.read_csv('../../assets/data/Changi_daily_rainfall.csv', index_col = 0, header = 0, parse_dates = True)
display(rain.head(10))
Daily Rainfall Total (mm) | |
---|---|
Date | |
1981-01-01 | 0.0 |
1981-01-02 | 0.0 |
1981-01-03 | 0.0 |
1981-01-04 | 0.0 |
1981-01-05 | 0.0 |
1981-01-06 | 0.3 |
1981-01-07 | 51.4 |
1981-01-08 | 0.0 |
1981-01-09 | 4.4 |
1981-01-10 | 0.0 |
# divide the data into 4 seasons
MAM = rain.loc[(rain.index.month > 2) & (rain.index.month < 6)]
JJA = rain.loc[(rain.index.month > 5) & (rain.index.month < 9)]
SON = rain.loc[(rain.index.month > 8) & (rain.index.month < 12)]
DJF = rain.loc[(rain.index.month < 3) | (rain.index.month > 11)]
stat = pd.DataFrame([[DJF.mean().values[0],DJF.std().values[0]**2,DJF.skew().values[0],DJF.kurtosis().values[0]],
[MAM.mean().values[0],MAM.std().values[0]**2,MAM.skew().values[0],MAM.kurtosis().values[0]],
[JJA.mean().values[0],JJA.std().values[0]**2,JJA.skew().values[0],JJA.kurtosis().values[0]],
[SON.mean().values[0],SON.std().values[0]**2,SON.skew().values[0],SON.kurtosis().values[0]]
],
index = ['DJF', 'MAM', 'JJA', 'SON'],
columns = ['Mean', 'Variance', 'Skewness', 'Kurtosis']
)
display(stat)
print('The driest season is JJA (Jun-Jul-Aug).')
Mean | Variance | Skewness | Kurtosis | |
---|---|---|---|---|
DJF | 7.159723 | 336.331229 | 4.907595 | 33.528282 |
MAM | 5.117065 | 141.739578 | 3.980410 | 22.942881 |
JJA | 4.575788 | 129.392105 | 5.052307 | 41.294758 |
SON | 6.065027 | 197.342220 | 4.930091 | 38.885941 |
The driest season is JJA (Jun-Jul-Aug).
Q2: Fit the GEV distribution#
For the driest season you identified in Q1, find the seasonal maximum rainfall deficit based on the 30-day moving average rainfall deficit (please use centered moving average). This will result in a data set of 40 values, one value for each year. Fit two GEV distributions of seasonal maximum rainfall deficit using data from the first 20 years (1981-2000) and the last 20 years (2001-2020) separately. To do this, estimate the GEV parameters using (i) Maximum Likelihood, and (ii) L-Moments, respectively. (Details on fitting a GEV distribution can be found in the Scipy tutorial). (Hint: The rainfall deficit can be obtained by subtracting the 30-day moving average rainfall from the mean rainfall calculated in Q1) (40 marks)
# add 'Year' column to denote the year of the days
JJA.insert(0, 'Year', JJA.index.year, allow_duplicates = False)
# calculate the 30-day moving average for each year
JJA_ma = JJA.groupby(['Year']).rolling(30, center=True).mean()
# calculate the 30-day moving average rainfall
JJA_ma[JJA_ma['Daily Rainfall Total (mm)'].notnull()]
JJA_ma.columns = ['30-day Moving Average Rainfall (mm)']
# calculate the 30-day moving average rainfall deficit
JJA_deficit = JJA['Daily Rainfall Total (mm)'].mean() - JJA_ma[JJA_ma['30-day Moving Average Rainfall (mm)'].notnull()]
JJA_deficit.columns = ['30-day Moving Average Rainfall Deficit (mm)']
display(JJA_deficit)
30-day Moving Average Rainfall Deficit (mm) | ||
---|---|---|
Year | Date | |
1981 | 1981-06-16 | 4.222455 |
1981-06-17 | 4.222455 | |
1981-06-18 | 4.222455 | |
1981-06-19 | 4.222455 | |
1981-06-20 | 4.119121 | |
... | ... | ... |
2020 | 2020-08-13 | 0.709121 |
2020-08-14 | 1.269121 | |
2020-08-15 | 1.069121 | |
2020-08-16 | 1.129121 | |
2020-08-17 | 1.129121 |
2520 rows × 1 columns
# calculate the maximum deficit for each year
JJA_annual_max = pd.DataFrame(JJA_deficit['30-day Moving Average Rainfall Deficit (mm)'].groupby('Year').max())
JJA_annual_max.columns = ['Annual Maximum Rainfall Deficit (mm)']
JJA_annual_max.head()
Annual Maximum Rainfall Deficit (mm) | |
---|---|
Year | |
1981 | 4.222455 |
1982 | 2.772455 |
1983 | 1.909121 |
1984 | 3.179121 |
1985 | 3.549121 |
# fit the distribution using MLE
cMLE_1, locMLE_1, scaleMLE_1 = gev.fit(JJA_annual_max.head(20), method = "MLE")
cMLE_2, locMLE_2, scaleMLE_2 = gev.fit(JJA_annual_max.tail(20), method = "MLE")
MLEGEV_1 = gev(cMLE_1, loc = locMLE_1, scale = scaleMLE_1)
MLEGEV_2 = gev(cMLE_2, loc = locMLE_2, scale = scaleMLE_2)
# calculate L-moments based on samples
def samlmom3(sample):
"""
samlmom3 returns the first three L-moments of samples
sample is the 1-d array
n is the total number of the samples, j is the j_th sample
"""
n = len(sample)
sample = np.sort(sample.reshape(n))[::-1]
b0 = np.mean(sample)
b1 = np.array([(n - j - 1) * sample[j] / n / (n - 1)
for j in range(n)]).sum()
b2 = np.array([(n - j - 1) * (n - j - 2) * sample[j] / n / (n - 1) / (n - 2)
for j in range(n - 1)]).sum()
lmom1 = b0
lmom2 = 2 * b1 - b0
lmom3 = 6 * (b2 - b1) + b0
return lmom1, lmom2, lmom3
def pargev_fsolve(lmom):
"""
pargev_fsolve estimates the parameters of the Generalized Extreme Value
distribution given the L-moments of samples
"""
lmom_ratios = [lmom[0], lmom[1], lmom[2] / lmom[1]]
f = lambda x, t: 2 * (1 - 3**(-x)) / (1 - 2**(-x)) - 3 - t
G = fsolve(f, 0.01, lmom_ratios[2])[0]
para3 = G
GAM = math.gamma(1 + G)
para2 = lmom_ratios[1] * G / (GAM * (1 - 2**-G))
para1 = lmom_ratios[0] - para2 * (1 - GAM) / G
return para1, para2, para3
# fit the distribution using LMM
LMM_1 = samlmom3(JJA_annual_max.head(20).values)
LMM_2 = samlmom3(JJA_annual_max.tail(20).values)
locLMM_1, scaleLMM_1, cLMM_1 = pargev_fsolve(LMM_1)
locLMM_2, scaleLMM_2, cLMM_2 = pargev_fsolve(LMM_2)
LMMGEV_1 = gev(cLMM_1, loc=locLMM_1, scale=scaleLMM_1)
LMMGEV_2 = gev(cLMM_2, loc=locLMM_2, scale=scaleLMM_2)
print('Factors fitted by Maximum Likelihood (1981-2000): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
.format(cMLE_1, locMLE_1, scaleMLE_1,))
print('Factors fitted by Maximum Likelihood (2001-2020): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
.format(cMLE_2, locMLE_2, scaleMLE_2))
print('Factors fitted by L-Moments (1981-2000): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
.format(cLMM_1, locLMM_1, scaleLMM_1))
print('Factors fitted by L-Moments (2001-2020): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
.format(cLMM_2, locLMM_2, scaleLMM_2))
Factors fitted by Maximum Likelihood (1981-2000): shape=0.740, loc=2.428, scale=1.435
Factors fitted by Maximum Likelihood (2001-2020): shape=0.380, loc=2.349, scale=0.954
Factors fitted by L-Moments (1981-2000): shape=0.611, loc=2.338, scale=1.382
Factors fitted by L-Moments (2001-2020): shape=0.442, loc=2.373, scale=0.973
Q3: Compare differences in GEV distributions#
Based on the estimated GEV parameters using L-Moments in Q2, discuss, for the driest season, whether there are statistical differences between the two distributions estimated from the first and last 20 years. (Hint: You can use the KS test to compare the underlying distributions of the two samples) (30 marks)
KS_1 = stats.kstest(JJA_annual_max['Annual Maximum Rainfall Deficit (mm)'].head(20), LMMGEV_2.cdf)
KS_2 = stats.kstest(JJA_annual_max['Annual Maximum Rainfall Deficit (mm)'].tail(20), LMMGEV_1.cdf)
print('From the 1st KS test, statisic is {:.3f}, pvalue is {:.3f}'.format(KS_1.statistic, KS_1.pvalue))
print('From the 2nd KS test, statisic is {:.3f}, pvalue is {:.3f}'.format(KS_2.statistic, KS_2.pvalue))
From the 1st KS test, statisic is 0.134, pvalue is 0.817
From the 2nd KS test, statisic is 0.220, pvalue is 0.250
print('The pvalues are greater than 0.05. \
These two samples are not statistically significantly different from each other. \
Hence there is no significant change of the maximum rainfall deficit distribution with time shift.')
The pvalues are greater than 0.05. These two samples are not statistically significantly different from each other. Hence there is no significant change of the maximum rainfall deficit distribution with time shift.
Q4: Estimate the return level of the extreme events#
Based on the estimated GEV parameters using L-Moments in Q2, estimate the rainfall deficit for events with return periods of 50 years, 100 years, and 200 years. (Hint: You can use the GEV distribution fitted from the data of the last 20 years.) (20 marks)
T = np.arange(2, 201)
LFTlmm = LMMGEV_2.ppf(1 - 1.0 / T)
LFTlmm = pd.DataFrame(LFTlmm, index = T,
columns = ["Annual Maximum 30-day Moving Average Rainfall Deficit in JJA (mm)"])
LFTlmm.index.name = "Return period"
display(LFTlmm.loc[[50, 100, 200]])
Annual Maximum 30-day Moving Average Rainfall Deficit in JJA (mm) | |
---|---|
Return period | |
50 | 4.182137 |
100 | 4.286263 |
200 | 4.362494 |