HW#1: Extreme Rainfall Deficit in Singapore#

Objectives

This homework will help you gain a better understanding in terms of the ways how to:

  • Fit Generalized Extreme Value (GEV) distribution

  • Estimate the return level of extreme rainfall deficit

Happy coding!

Submission Guide

Deadline: Sunday 11:59 pm, 5th November 2023 (Note: Late submissions will not be accepted).

Please upload your solutions to Canvas in a Jupyter Notebook format with the name “Homework1_StudentID.ipynb”. Make sure to write down your student ID and full name in the cell below.

For any questions, feel free to contact Prof. Xiaogang HE (hexg@nus.edu.sg), or Zhixiao NIU (niu.zhixiao@u.nus.edu).

## Fill your student ID and full name below.

# Student ID:
# Full name:

Data: You will need to use the historical (1981-2020) daily total rainfall at Singapore’s Changi station for this homework.

You can create a DataFrame using Pandas by reading file “../../assets/data/Changi_daily_rainfall.csv”.

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy import stats
from scipy.optimize import fsolve
from scipy.stats import genextreme as gev

Q1: Calculate extreme rainfall statisitics#

First, divide the data into the following seasons: DJF (Dec-Jan-Feb), MAM (Mar-Apr-May), JJA (Jun-Jul-Aug), and SON (Sep-Oct-Nov) using Pandas’ filtering methods. Calculate the following statistics of daily rainfall for each season: (i) mean; (ii) variance; (iii) skewness; and (iv) kurtosis. Based on the results, identify the driest season. (Details on the filtering method can be found in the Pandas tutorial). (Hint: The driest season can refer to the season with the lowest mean of daily rainfall) (10 marks)

rain = pd.read_csv('../../assets/data/Changi_daily_rainfall.csv', index_col = 0, header = 0, parse_dates = True)
display(rain.head(10))
Daily Rainfall Total (mm)
Date
1981-01-01 0.0
1981-01-02 0.0
1981-01-03 0.0
1981-01-04 0.0
1981-01-05 0.0
1981-01-06 0.3
1981-01-07 51.4
1981-01-08 0.0
1981-01-09 4.4
1981-01-10 0.0
# divide the data into 4 seasons
MAM = rain.loc[(rain.index.month > 2) & (rain.index.month < 6)]
JJA = rain.loc[(rain.index.month > 5) & (rain.index.month < 9)]
SON = rain.loc[(rain.index.month > 8) & (rain.index.month < 12)]
DJF = rain.loc[(rain.index.month < 3) | (rain.index.month > 11)]


stat = pd.DataFrame([[DJF.mean().values[0],DJF.std().values[0]**2,DJF.skew().values[0],DJF.kurtosis().values[0]],
                     [MAM.mean().values[0],MAM.std().values[0]**2,MAM.skew().values[0],MAM.kurtosis().values[0]],
                     [JJA.mean().values[0],JJA.std().values[0]**2,JJA.skew().values[0],JJA.kurtosis().values[0]],
                     [SON.mean().values[0],SON.std().values[0]**2,SON.skew().values[0],SON.kurtosis().values[0]]
                    ], 
                    index = ['DJF', 'MAM', 'JJA', 'SON'], 
                    columns = ['Mean', 'Variance', 'Skewness', 'Kurtosis']
                   )

display(stat)
print('The driest season is JJA (Jun-Jul-Aug).')
Mean Variance Skewness Kurtosis
DJF 7.159723 336.331229 4.907595 33.528282
MAM 5.117065 141.739578 3.980410 22.942881
JJA 4.575788 129.392105 5.052307 41.294758
SON 6.065027 197.342220 4.930091 38.885941
The driest season is JJA (Jun-Jul-Aug).

Q2: Fit the GEV distribution#

For the driest season you identified in Q1, find the seasonal maximum rainfall deficit based on the 30-day moving average rainfall deficit (please use centered moving average). This will result in a data set of 40 values, one value for each year. Fit two GEV distributions of seasonal maximum rainfall deficit using data from the first 20 years (1981-2000) and the last 20 years (2001-2020) separately. To do this, estimate the GEV parameters using (i) Maximum Likelihood, and (ii) L-Moments, respectively. (Details on fitting a GEV distribution can be found in the Scipy tutorial). (Hint: The rainfall deficit can be obtained by subtracting the 30-day moving average rainfall from the mean rainfall calculated in Q1) (40 marks)

# add 'Year' column to denote the year of the days
JJA.insert(0, 'Year', JJA.index.year, allow_duplicates = False)

# calculate the 30-day moving average for each year
JJA_ma = JJA.groupby(['Year']).rolling(30, center=True).mean()

# calculate the 30-day moving average rainfall
JJA_ma[JJA_ma['Daily Rainfall Total (mm)'].notnull()]
JJA_ma.columns = ['30-day Moving Average Rainfall (mm)']

# calculate the 30-day moving average rainfall deficit
JJA_deficit = JJA['Daily Rainfall Total (mm)'].mean() - JJA_ma[JJA_ma['30-day Moving Average Rainfall (mm)'].notnull()]
JJA_deficit.columns = ['30-day Moving Average Rainfall Deficit (mm)']
display(JJA_deficit)
30-day Moving Average Rainfall Deficit (mm)
Year Date
1981 1981-06-16 4.222455
1981-06-17 4.222455
1981-06-18 4.222455
1981-06-19 4.222455
1981-06-20 4.119121
... ... ...
2020 2020-08-13 0.709121
2020-08-14 1.269121
2020-08-15 1.069121
2020-08-16 1.129121
2020-08-17 1.129121

2520 rows × 1 columns

# calculate the maximum deficit for each year
JJA_annual_max = pd.DataFrame(JJA_deficit['30-day Moving Average Rainfall Deficit (mm)'].groupby('Year').max())
JJA_annual_max.columns = ['Annual Maximum Rainfall Deficit (mm)']
JJA_annual_max.head()
Annual Maximum Rainfall Deficit (mm)
Year
1981 4.222455
1982 2.772455
1983 1.909121
1984 3.179121
1985 3.549121
# fit the distribution using MLE
cMLE_1, locMLE_1, scaleMLE_1 = gev.fit(JJA_annual_max.head(20), method = "MLE")
cMLE_2, locMLE_2, scaleMLE_2 = gev.fit(JJA_annual_max.tail(20), method = "MLE")
MLEGEV_1 = gev(cMLE_1, loc = locMLE_1, scale = scaleMLE_1) 
MLEGEV_2 = gev(cMLE_2, loc = locMLE_2, scale = scaleMLE_2) 
# calculate L-moments based on samples
def samlmom3(sample):
    """
    samlmom3 returns the first three L-moments of samples
    sample is the 1-d array
    n is the total number of the samples, j is the j_th sample
    """
    n = len(sample)
    sample = np.sort(sample.reshape(n))[::-1]
    b0 = np.mean(sample)
    b1 = np.array([(n - j - 1) * sample[j] / n / (n - 1)
                   for j in range(n)]).sum()
    b2 = np.array([(n - j - 1) * (n - j - 2) * sample[j] / n / (n - 1) / (n - 2)
                   for j in range(n - 1)]).sum()
    lmom1 = b0
    lmom2 = 2 * b1 - b0
    lmom3 = 6 * (b2 - b1) + b0

    return lmom1, lmom2, lmom3

def pargev_fsolve(lmom):
    """
    pargev_fsolve estimates the parameters of the Generalized Extreme Value 
    distribution given the L-moments of samples
    """
    lmom_ratios = [lmom[0], lmom[1], lmom[2] / lmom[1]]
    f = lambda x, t: 2 * (1 - 3**(-x)) / (1 - 2**(-x)) - 3 - t
    G = fsolve(f, 0.01, lmom_ratios[2])[0]
    para3 = G
    GAM = math.gamma(1 + G)
    para2 = lmom_ratios[1] * G / (GAM * (1 - 2**-G))
    para1 = lmom_ratios[0] - para2 * (1 - GAM) / G
    return para1, para2, para3
# fit the distribution using LMM
LMM_1 = samlmom3(JJA_annual_max.head(20).values)
LMM_2 = samlmom3(JJA_annual_max.tail(20).values)

locLMM_1, scaleLMM_1, cLMM_1 = pargev_fsolve(LMM_1)
locLMM_2, scaleLMM_2, cLMM_2 = pargev_fsolve(LMM_2)

LMMGEV_1 = gev(cLMM_1, loc=locLMM_1, scale=scaleLMM_1) 
LMMGEV_2 = gev(cLMM_2, loc=locLMM_2, scale=scaleLMM_2) 
print('Factors fitted by Maximum Likelihood (1981-2000): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
      .format(cMLE_1, locMLE_1, scaleMLE_1,))
print('Factors fitted by Maximum Likelihood (2001-2020): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
      .format(cMLE_2, locMLE_2, scaleMLE_2))
print('Factors fitted by L-Moments (1981-2000): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
      .format(cLMM_1, locLMM_1, scaleLMM_1))
print('Factors fitted by L-Moments (2001-2020): shape={:.3f}, loc={:.3f}, scale={:.3f}\n'
      .format(cLMM_2, locLMM_2, scaleLMM_2))
Factors fitted by Maximum Likelihood (1981-2000): shape=0.740, loc=2.428, scale=1.435

Factors fitted by Maximum Likelihood (2001-2020): shape=0.380, loc=2.349, scale=0.954

Factors fitted by L-Moments (1981-2000): shape=0.611, loc=2.338, scale=1.382

Factors fitted by L-Moments (2001-2020): shape=0.442, loc=2.373, scale=0.973

Q3: Compare differences in GEV distributions#

Based on the estimated GEV parameters using L-Moments in Q2, discuss, for the driest season, whether there are statistical differences between the two distributions estimated from the first and last 20 years. (Hint: You can use the KS test to compare the underlying distributions of the two samples) (30 marks)

KS_1 = stats.kstest(JJA_annual_max['Annual Maximum Rainfall Deficit (mm)'].head(20), LMMGEV_2.cdf)
KS_2 = stats.kstest(JJA_annual_max['Annual Maximum Rainfall Deficit (mm)'].tail(20), LMMGEV_1.cdf)
print('From the 1st KS test, statisic is {:.3f}, pvalue is {:.3f}'.format(KS_1.statistic, KS_1.pvalue))
print('From the 2nd KS test, statisic is {:.3f}, pvalue is {:.3f}'.format(KS_2.statistic, KS_2.pvalue))
From the 1st KS test, statisic is 0.134, pvalue is 0.817
From the 2nd KS test, statisic is 0.220, pvalue is 0.250
print('The pvalues are greater than 0.05. \
       These two samples are not statistically significantly different from each other. \
       Hence there is no significant change of the maximum rainfall deficit distribution with time shift.')
The pvalues are greater than 0.05.        These two samples are not statistically significantly different from each other.        Hence there is no significant change of the maximum rainfall deficit distribution with time shift.

Q4: Estimate the return level of the extreme events#

Based on the estimated GEV parameters using L-Moments in Q2, estimate the rainfall deficit for events with return periods of 50 years, 100 years, and 200 years. (Hint: You can use the GEV distribution fitted from the data of the last 20 years.) (20 marks)

T = np.arange(2, 201)
LFTlmm = LMMGEV_2.ppf(1 - 1.0 / T)
LFTlmm = pd.DataFrame(LFTlmm, index = T, 
                      columns = ["Annual Maximum 30-day Moving Average Rainfall Deficit in JJA (mm)"])
LFTlmm.index.name = "Return period"

display(LFTlmm.loc[[50, 100, 200]])
Annual Maximum 30-day Moving Average Rainfall Deficit in JJA (mm)
Return period
50 4.182137
100 4.286263
200 4.362494