3.3. Pandas Exercise#

Create a DataFrame by reading the file ../../assets/data/Changi_daily_rainfall.csv. Perform the following two tasks with this DataFrame.

3.3.1. Task 1#

  1. Determine whether the dataframe contains NaN values. If so, fill the NaN values with the previous day’s value.

  2. Show the first 5 rows and the last 3 rows. Check the data length.

  3. For each year, calculate the frequency (i.e., number of days) of rainfall events whose magnitude exceeds 0.1 mm/day.

  4. Find the three months with the highest/lowest long-term averaged monthly rainfall. Are these months within your expection (e.g., Singapore’s monsoon seasons)?

Note: If you want to do this task on your local computer (not recommended), please modify the file path accordingly.

# 1
import pandas as pd
df = pd.read_csv('../../assets/data/Changi_daily_rainfall.csv', index_col=0, parse_dates=True)
df.isnull().sum().values[0] # no NaN value
0
# 2
print('the first 5 rows:\n', df.head(5),'\n==========')
print('the tail 3 rows:\n', df.tail(3),'\n==========')
print('total length of data:', df.shape[0])
the first 5 rows:
             Daily Rainfall Total (mm)
Date                                 
1981-01-01                        0.0
1981-01-02                        0.0
1981-01-03                        0.0
1981-01-04                        0.0
1981-01-05                        0.0 
==========
the tail 3 rows:
             Daily Rainfall Total (mm)
Date                                 
2020-12-29                        3.2
2020-12-30                        0.0
2020-12-31                        0.0 
==========
total length of data: 14610
# 3
df_days = df[df>0.1].dropna().resample('Y').apply(lambda x:x.shape[0])
df_days.columns = ['Rainy days'] 
df_days.head(5)
Rainy days
Date
1981-12-31 142
1982-12-31 130
1983-12-31 138
1984-12-31 191
1985-12-31 181
# 4
df_monthly = df.resample('M').sum()
df_monthly.index = [ts.month for ts in df_monthly.index]
df_month_mean = df_monthly.groupby(level=0).mean()
df_month_mean.columns = ['Monthly rainfall (mm)']
df_month_mean.sort_values(by=['Monthly rainfall (mm)'],ascending=False) 
# Nov, Dec and Jan have the highest average monthly rainfall.
# Feb, Jun, Aug have lowest average monthly rainfall.
Monthly rainfall (mm)
12 317.7400
11 250.0900
1 221.7825
5 160.2500
3 158.8750
10 156.9800
4 151.6450
7 146.4150
9 144.8475
8 143.3525
6 131.2050
2 106.6425

3.3.2. Task 2#

  1. Resample the DataFrame from daily time scale to annual time scale. Calculate the annual total rainfall for each year.

  2. Compare the long-term averaged annual rainfall in two periods: 1981-2000 and 2001-2020.

  3. Find the year with the largest/smallest annual rainfall and print their values.

  4. Calculate the trend of annual rainfall during 1981-2020 using the following formula:

\[a = \frac{n\sum{(xy)}-\sum{x}\sum{y}}{n\sum{x^2}-(\sum{x})^2}\]

where \(a\) is the trend of annual rainfall, \(x\) is year, \(y\) is annual rainfall, and \(n\) is the number of years.

# 1
df_yearly = df.resample('Y').sum()
df_yearly.columns = ['Yearly Rainfall Total (mm)']
df_yearly.index = [ts.year for ts in df_yearly.index]
df_yearly.head(2)
Yearly Rainfall Total (mm)
1981 1336.3
1982 1581.7
# 2
rainfall_1981_2000 = df_yearly.loc[1981:2000].mean()
rainfall_2001_2020 = df_yearly.loc[2001:].mean()

print('Averaged annual rainfall during 1981-2000: %.2f mm\n'%rainfall_1981_2000.values[0], 
      'Averaged annual rainfall during 1981-2000: %.2f mm\n'%rainfall_2001_2020.values[0], 
      'Averaged annual rainfall increase during\n 2001-2020 relative to 1981-2000: %.2f mm'%
      (rainfall_2001_2020-rainfall_1981_2000).values[0])
Averaged annual rainfall during 1981-2000: 2071.55 mm
 Averaged annual rainfall during 1981-2000: 2108.10 mm
 Averaged annual rainfall increase during
 2001-2020 relative to 1981-2000: 36.54 mm
# 3
max_rainfall_year = df_yearly[df_yearly['Yearly Rainfall Total (mm)']==df_yearly.max().values[0]]
min_rainfall_year = df_yearly[df_yearly['Yearly Rainfall Total (mm)']==df_yearly.min().values[0]]
print('The year with smallest annual rainfall:', min_rainfall_year.index[0])
print('The year with largest annual rainfall:', max_rainfall_year.index[0])
The year with smallest annual rainfall: 1997
The year with largest annual rainfall: 2007
# 4
x = df_yearly.index.values
y = df_yearly['Yearly Rainfall Total (mm)'].values
n = len(x)
a = (n*sum(x*y) - sum(x)*sum(y))/(n*sum(x**2)-sum(x)**2)
print('Trend: %.2f mm/year'%a)
Trend: -1.37 mm/year