3.3. Pandas Exercise#
Create a DataFrame by reading the file ../../assets/data/Changi_daily_rainfall.csv. Perform the following two tasks with this DataFrame.
3.3.1. Task 1#
Determine whether the dataframe contains
NaN
values. If so, fill theNaN
values with the previous day’s value.Show the first 5 rows and the last 3 rows. Check the data length.
For each year, calculate the frequency (i.e., number of days) of rainfall events whose magnitude exceeds 0.1 mm/day.
Find the three months with the highest/lowest long-term averaged monthly rainfall. Are these months within your expection (e.g., Singapore’s monsoon seasons)?
Note: If you want to do this task on your local computer (not recommended), please modify the file path accordingly.
# 1
import pandas as pd
df = pd.read_csv('../../assets/data/Changi_daily_rainfall.csv', index_col=0, parse_dates=True)
df.isnull().sum().values[0] # no NaN value
0
# 2
print('the first 5 rows:\n', df.head(5),'\n==========')
print('the tail 3 rows:\n', df.tail(3),'\n==========')
print('total length of data:', df.shape[0])
the first 5 rows:
Daily Rainfall Total (mm)
Date
1981-01-01 0.0
1981-01-02 0.0
1981-01-03 0.0
1981-01-04 0.0
1981-01-05 0.0
==========
the tail 3 rows:
Daily Rainfall Total (mm)
Date
2020-12-29 3.2
2020-12-30 0.0
2020-12-31 0.0
==========
total length of data: 14610
# 3
df_days = df[df>0.1].dropna().resample('Y').apply(lambda x:x.shape[0])
df_days.columns = ['Rainy days']
df_days.head(5)
Rainy days | |
---|---|
Date | |
1981-12-31 | 142 |
1982-12-31 | 130 |
1983-12-31 | 138 |
1984-12-31 | 191 |
1985-12-31 | 181 |
# 4
df_monthly = df.resample('M').sum()
df_monthly.index = [ts.month for ts in df_monthly.index]
df_month_mean = df_monthly.groupby(level=0).mean()
df_month_mean.columns = ['Monthly rainfall (mm)']
df_month_mean.sort_values(by=['Monthly rainfall (mm)'],ascending=False)
# Nov, Dec and Jan have the highest average monthly rainfall.
# Feb, Jun, Aug have lowest average monthly rainfall.
Monthly rainfall (mm) | |
---|---|
12 | 317.7400 |
11 | 250.0900 |
1 | 221.7825 |
5 | 160.2500 |
3 | 158.8750 |
10 | 156.9800 |
4 | 151.6450 |
7 | 146.4150 |
9 | 144.8475 |
8 | 143.3525 |
6 | 131.2050 |
2 | 106.6425 |
3.3.2. Task 2#
Resample the DataFrame from daily time scale to annual time scale. Calculate the annual total rainfall for each year.
Compare the long-term averaged annual rainfall in two periods: 1981-2000 and 2001-2020.
Find the year with the largest/smallest annual rainfall and print their values.
Calculate the trend of annual rainfall during 1981-2020 using the following formula:
where \(a\) is the trend of annual rainfall, \(x\) is year, \(y\) is annual rainfall, and \(n\) is the number of years.
# 1
df_yearly = df.resample('Y').sum()
df_yearly.columns = ['Yearly Rainfall Total (mm)']
df_yearly.index = [ts.year for ts in df_yearly.index]
df_yearly.head(2)
Yearly Rainfall Total (mm) | |
---|---|
1981 | 1336.3 |
1982 | 1581.7 |
# 2
rainfall_1981_2000 = df_yearly.loc[1981:2000].mean()
rainfall_2001_2020 = df_yearly.loc[2001:].mean()
print('Averaged annual rainfall during 1981-2000: %.2f mm\n'%rainfall_1981_2000.values[0],
'Averaged annual rainfall during 1981-2000: %.2f mm\n'%rainfall_2001_2020.values[0],
'Averaged annual rainfall increase during\n 2001-2020 relative to 1981-2000: %.2f mm'%
(rainfall_2001_2020-rainfall_1981_2000).values[0])
Averaged annual rainfall during 1981-2000: 2071.55 mm
Averaged annual rainfall during 1981-2000: 2108.10 mm
Averaged annual rainfall increase during
2001-2020 relative to 1981-2000: 36.54 mm
# 3
max_rainfall_year = df_yearly[df_yearly['Yearly Rainfall Total (mm)']==df_yearly.max().values[0]]
min_rainfall_year = df_yearly[df_yearly['Yearly Rainfall Total (mm)']==df_yearly.min().values[0]]
print('The year with smallest annual rainfall:', min_rainfall_year.index[0])
print('The year with largest annual rainfall:', max_rainfall_year.index[0])
The year with smallest annual rainfall: 1997
The year with largest annual rainfall: 2007
# 4
x = df_yearly.index.values
y = df_yearly['Yearly Rainfall Total (mm)'].values
n = len(x)
a = (n*sum(x*y) - sum(x)*sum(y))/(n*sum(x**2)-sum(x)**2)
print('Trend: %.2f mm/year'%a)
Trend: -1.37 mm/year