{ "cells": [ { "cell_type": "markdown", "id": "5202ee2c", "metadata": {}, "source": [ "# Pandas Exercise" ] }, { "cell_type": "markdown", "id": "4cf0b058", "metadata": {}, "source": [ "Create a DataFrame by reading the file *../../assets/data/Changi_daily_rainfall.csv*. Perform the following two tasks with this DataFrame." ] }, { "cell_type": "markdown", "id": "fe90719f", "metadata": {}, "source": [ "## Task 1\n", "1. Determine whether the dataframe contains `NaN` values. If so, fill the `NaN` values with the previous day's value.\n", "2. Show the first 5 rows and the last 3 rows. Check the data length.\n", "3. For each year, calculate the frequency (i.e., number of days) of rainfall events whose magnitude exceeds 0.1 mm/day.\n", "4. Find the three months with the highest/lowest long-term averaged monthly rainfall. Are these months within your expection (e.g., Singapore's monsoon seasons)? \n", "\n", "\n", "Note: If you want to do this task on your local computer (not recommended), please modify the file path accordingly." ] }, { "cell_type": "code", "execution_count": 1, "id": "3ae47cdd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1\n", "import pandas as pd\n", "df = pd.read_csv('../../assets/data/Changi_daily_rainfall.csv', index_col=0, parse_dates=True)\n", "df.isnull().sum().values[0] # no NaN value" ] }, { "cell_type": "code", "execution_count": 2, "id": "ae3078f1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the first 5 rows:\n", " Daily Rainfall Total (mm)\n", "Date \n", "1981-01-01 0.0\n", "1981-01-02 0.0\n", "1981-01-03 0.0\n", "1981-01-04 0.0\n", "1981-01-05 0.0 \n", "==========\n", "the tail 3 rows:\n", " Daily Rainfall Total (mm)\n", "Date \n", "2020-12-29 3.2\n", "2020-12-30 0.0\n", "2020-12-31 0.0 \n", "==========\n", "total length of data: 14610\n" ] } ], "source": [ "# 2\n", "print('the first 5 rows:\\n', df.head(5),'\\n==========')\n", "print('the tail 3 rows:\\n', df.tail(3),'\\n==========')\n", "print('total length of data:', df.shape[0])" ] }, { "cell_type": "code", "execution_count": 3, "id": "b8fc7b7b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Rainy days
Date
1981-12-31142
1982-12-31130
1983-12-31138
1984-12-31191
1985-12-31181
\n", "
" ], "text/plain": [ " Rainy days\n", "Date \n", "1981-12-31 142\n", "1982-12-31 130\n", "1983-12-31 138\n", "1984-12-31 191\n", "1985-12-31 181" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 3\n", "df_days = df[df>0.1].dropna().resample('Y').apply(lambda x:x.shape[0])\n", "df_days.columns = ['Rainy days'] \n", "df_days.head(5)" ] }, { "cell_type": "code", "execution_count": 4, "id": "a2566f86", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Monthly rainfall (mm)
12317.7400
11250.0900
1221.7825
5160.2500
3158.8750
10156.9800
4151.6450
7146.4150
9144.8475
8143.3525
6131.2050
2106.6425
\n", "
" ], "text/plain": [ " Monthly rainfall (mm)\n", "12 317.7400\n", "11 250.0900\n", "1 221.7825\n", "5 160.2500\n", "3 158.8750\n", "10 156.9800\n", "4 151.6450\n", "7 146.4150\n", "9 144.8475\n", "8 143.3525\n", "6 131.2050\n", "2 106.6425" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 4\n", "df_monthly = df.resample('M').sum()\n", "df_monthly.index = [ts.month for ts in df_monthly.index]\n", "df_month_mean = df_monthly.groupby(level=0).mean()\n", "df_month_mean.columns = ['Monthly rainfall (mm)']\n", "df_month_mean.sort_values(by=['Monthly rainfall (mm)'],ascending=False) \n", "# Nov, Dec and Jan have the highest average monthly rainfall.\n", "# Feb, Jun, Aug have lowest average monthly rainfall." ] }, { "cell_type": "markdown", "id": "69ba12e4", "metadata": {}, "source": [ "## Task 2\n", "1. Resample the DataFrame from daily time scale to annual time scale. Calculate the annual total rainfall for each year.\n", "2. Compare the long-term averaged annual rainfall in two periods: 1981-2000 and 2001-2020. \n", "3. Find the year with the largest/smallest annual rainfall and print their values.\n", "4. Calculate the trend of annual rainfall during 1981-2020 using the following formula:" ] }, { "cell_type": "markdown", "id": "71235b80", "metadata": {}, "source": [ "$$a = \\frac{n\\sum{(xy)}-\\sum{x}\\sum{y}}{n\\sum{x^2}-(\\sum{x})^2}$$ " ] }, { "cell_type": "markdown", "id": "c531d004", "metadata": {}, "source": [ "where $a$ is the trend of annual rainfall, $x$ is year, $y$ is annual rainfall, and $n$ is the number of years." ] }, { "cell_type": "code", "execution_count": 5, "id": "2b40dc9f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Yearly Rainfall Total (mm)
19811336.3
19821581.7
\n", "
" ], "text/plain": [ " Yearly Rainfall Total (mm)\n", "1981 1336.3\n", "1982 1581.7" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1\n", "df_yearly = df.resample('Y').sum()\n", "df_yearly.columns = ['Yearly Rainfall Total (mm)']\n", "df_yearly.index = [ts.year for ts in df_yearly.index]\n", "df_yearly.head(2)" ] }, { "cell_type": "code", "execution_count": 6, "id": "af867f28", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Averaged annual rainfall during 1981-2000: 2071.55 mm\n", " Averaged annual rainfall during 1981-2000: 2108.10 mm\n", " Averaged annual rainfall increase during\n", " 2001-2020 relative to 1981-2000: 36.54 mm\n" ] } ], "source": [ "# 2\n", "rainfall_1981_2000 = df_yearly.loc[1981:2000].mean()\n", "rainfall_2001_2020 = df_yearly.loc[2001:].mean()\n", "\n", "print('Averaged annual rainfall during 1981-2000: %.2f mm\\n'%rainfall_1981_2000.values[0], \n", " 'Averaged annual rainfall during 1981-2000: %.2f mm\\n'%rainfall_2001_2020.values[0], \n", " 'Averaged annual rainfall increase during\\n 2001-2020 relative to 1981-2000: %.2f mm'%\n", " (rainfall_2001_2020-rainfall_1981_2000).values[0])" ] }, { "cell_type": "code", "execution_count": 7, "id": "43ac54d1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The year with smallest annual rainfall: 1997\n", "The year with largest annual rainfall: 2007\n" ] } ], "source": [ "# 3\n", "max_rainfall_year = df_yearly[df_yearly['Yearly Rainfall Total (mm)']==df_yearly.max().values[0]]\n", "min_rainfall_year = df_yearly[df_yearly['Yearly Rainfall Total (mm)']==df_yearly.min().values[0]]\n", "print('The year with smallest annual rainfall:', min_rainfall_year.index[0])\n", "print('The year with largest annual rainfall:', max_rainfall_year.index[0])" ] }, { "cell_type": "code", "execution_count": 8, "id": "d0ed2ade", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Trend: -1.37 mm/year\n" ] } ], "source": [ "# 4\n", "x = df_yearly.index.values\n", "y = df_yearly['Yearly Rainfall Total (mm)'].values\n", "n = len(x)\n", "a = (n*sum(x*y) - sum(x)*sum(y))/(n*sum(x**2)-sum(x)**2)\n", "print('Trend: %.2f mm/year'%a)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }