5.1. Xarray tutorial#
Xarray
is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!
5.1.1. Install xarray#
If you run code on your own computer, you need to install xarray
. Open the console and enter conda install xarray
or execute !pip install xarray
in this jupyter notebook.
try:
import xarray as xr # check whether xarray installed
except ModuleNotFoundError:
!pip install xarray==0.19.0 # install xarray
import xarray as xr
print(xr.__version__)
0.19.0
import pandas as pd
import numpy as np
5.1.2. Introduction to xarray Data Structures#
Like pandas
, xarray
has two core data structures:
DataArray
Dataset
The DataArray
is designed to hold a single multi-dimensional variable and its coordinates, and the Dataset
is designed to hold multiple variables that potentially share the same coordinates.
5.1.3. DataArray#
DataArray has four important attributes:
values
is aNumpy
array and stores the data.dims
is atuple
that includes the names of all dimensions ofvalues
and its length is the number of dimensions.coords
is a dict-like built-in data structure of xarray and includes coordinate list or 1D-array of each dimension.attrs
is adict
and you can useattrs
to define some custom attributes.
5.1.3.1. Defining a DataArray#
You can use the DataArray()
function to create a DataArray. There are 4 arguments corresponding to the four attributes above. The data
argument is corresponding to values
attribute.
da = xr.DataArray(data=[[1, 2, 3], [2, 3, 4]],
dims=['x', 'y'],
coords={'x': [10, 20],
'y': [10, 20, 30]},
attrs={'summary': 'This is a custom DataArray for tutorial',
'license': 'CC BY-NC-ND 4.0'})
da
<xarray.DataArray (x: 2, y: 3)> array([[1, 2, 3], [2, 3, 4]]) Coordinates: * x (x) int32 10 20 * y (y) int32 10 20 30 Attributes: summary: This is a custom DataArray for tutorial license: CC BY-NC-ND 4.0
- x: 2
- y: 3
- 1 2 3 2 3 4
array([[1, 2, 3], [2, 3, 4]])
- x(x)int3210 20
array([10, 20])
- y(y)int3210 20 30
array([10, 20, 30])
- summary :
- This is a custom DataArray for tutorial
- license :
- CC BY-NC-ND 4.0
da.values
array([[1, 2, 3],
[2, 3, 4]])
da.dims
('x', 'y')
da.coords
Coordinates:
* x (x) int32 10 20
* y (y) int32 10 20 30
da.attrs
{'summary': 'This is a custom DataArray for tutorial',
'license': 'CC BY-NC-ND 4.0'}
You can specify the data
argument to create a DataArray
. By default, dims
of DataArray
is ('dim_0', 'dim_1', ...)
and attrs
of DataArray
is a empty dict
.
da1 = xr.DataArray(data=[[1, 2, 3],
[2, 3, 4]])
da1
<xarray.DataArray (dim_0: 2, dim_1: 3)> array([[1, 2, 3], [2, 3, 4]]) Dimensions without coordinates: dim_0, dim_1
- dim_0: 2
- dim_1: 3
- 1 2 3 2 3 4
array([[1, 2, 3], [2, 3, 4]])
da1.dims
('dim_0', 'dim_1')
da1.attrs
{}
da1.coords
Coordinates:
*empty*
5.1.3.2. Change attribute#
da1 = da.copy(deep=True)
# change values
da1.values = np.random.rand(2, 3)
# change dims (input dict)
da1 = da1.rename({'x': 'm', 'y': 'n'})
# change name (input str)
da1 = da1.rename('Spatial Rainfall')
# add attrs
da1.attrs['gen'] = 'generated by numpy randomly'
# delete attrs
del da1.attrs['summary']
da1
<xarray.DataArray 'Spatial Rainfall' (m: 2, n: 3)> array([[0.47856475, 0.79661643, 0.38884127], [0.44622596, 0.17973351, 0.15285284]]) Coordinates: * m (m) int32 10 20 * n (n) int32 10 20 30 Attributes: license: CC BY-NC-ND 4.0 gen: generated by numpy randomly
- m: 2
- n: 3
- 0.4786 0.7966 0.3888 0.4462 0.1797 0.1529
array([[0.47856475, 0.79661643, 0.38884127], [0.44622596, 0.17973351, 0.15285284]])
- m(m)int3210 20
array([10, 20])
- n(n)int3210 20 30
array([10, 20, 30])
- license :
- CC BY-NC-ND 4.0
- gen :
- generated by numpy randomly
5.1.4. Dataset#
Dataset has three important attributes: data_vars
, coords
and attrs
.
data_vars
is adict
with each key being the name of the variable or the dimension and each value being one of:
1D-array
orlist
Series
ofpandas
DataFrame
ofpandas
tuple
with two elementsDataArray
1D-array
or list
: When value of the dict
is a 1D-array or list, the key of the dict
will become the name of the dimension and corresponding 1D-array or list will become coordinates of the dimension.
import pandas as pd
ds = xr.Dataset(data_vars={'v1': [1, 2, 3], # list
'v2': np.array([1, 2, 3])}) #1D-array
ds
<xarray.Dataset> Dimensions: (v1: 3, v2: 3) Coordinates: * v1 (v1) int32 1 2 3 * v2 (v2) int32 1 2 3 Data variables: *empty*
- v1: 3
- v2: 3
- v1(v1)int321 2 3
array([1, 2, 3])
- v2(v2)int321 2 3
array([1, 2, 3])
Series
of pandas
: When value of the dict
is a Series
of pandas
, the key of the dict
will become the name of the variable and values of the Series
will become values of the variable. The index
of the Series
will become values of coordinates of the variable.
ds = xr.Dataset(data_vars={'v1': pd.Series([1, 2, 3], index=['a', 'b', 'c'])})
ds
<xarray.Dataset> Dimensions: (dim_0: 3) Coordinates: * dim_0 (dim_0) object 'a' 'b' 'c' Data variables: v1 (dim_0) int64 1 2 3
- dim_0: 3
- dim_0(dim_0)object'a' 'b' 'c'
array(['a', 'b', 'c'], dtype=object)
- v1(dim_0)int641 2 3
array([1, 2, 3], dtype=int64)
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s.index.name = 'str'
ds = xr.Dataset(data_vars={'v1': s})
ds
<xarray.Dataset> Dimensions: (str: 3) Coordinates: * str (str) object 'a' 'b' 'c' Data variables: v1 (str) int64 1 2 3
- str: 3
- str(str)object'a' 'b' 'c'
array(['a', 'b', 'c'], dtype=object)
- v1(str)int641 2 3
array([1, 2, 3], dtype=int64)
DataFrame
of pandas
: When value of the dict
is a DataFrame
, the key of the dict
will become the name of the variable and values of the DataFrame
will become values of the variable. The index
and columns
of the DataFrame
will become values of coordinates of the variable. Therefore, the variable constructed by DataFrame
has two dimensions.
df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'])
df.index.name = 'str'
ds = xr.Dataset(data_vars={'v1': df})
ds
<xarray.Dataset> Dimensions: (str: 3, dim_1: 1) Coordinates: * str (str) object 'a' 'b' 'c' * dim_1 (dim_1) int64 0 Data variables: v1 (str, dim_1) int64 1 2 3
- str: 3
- dim_1: 1
- str(str)object'a' 'b' 'c'
array(['a', 'b', 'c'], dtype=object)
- dim_1(dim_1)int640
array([0], dtype=int64)
- v1(str, dim_1)int641 2 3
array([[1], [2], [3]], dtype=int64)
df = pd.DataFrame([[1, 2],
[2, 5],
[3, 6]], index=['a', 'b', 'c'], columns=['c1', 'c2'])
df.index.name = 'str'
ds = xr.Dataset(data_vars={'v1': df})
ds
<xarray.Dataset> Dimensions: (str: 3, dim_1: 2) Coordinates: * str (str) object 'a' 'b' 'c' * dim_1 (dim_1) object 'c1' 'c2' Data variables: v1 (str, dim_1) int64 1 2 2 5 3 6
- str: 3
- dim_1: 2
- str(str)object'a' 'b' 'c'
array(['a', 'b', 'c'], dtype=object)
- dim_1(dim_1)object'c1' 'c2'
array(['c1', 'c2'], dtype=object)
- v1(str, dim_1)int641 2 2 5 3 6
array([[1, 2], [2, 5], [3, 6]], dtype=int64)
tuple
with two elements: When value of the dict
is a tuple
, the key of the dict
will become the name of the variable. The first elements of the tuple
will become dimension names of the variable and the second elements of the tuple
will become values of the variable.
tp = (('d1', 'd2'), [[1, 2],
[2, 5],
[3, 6]])
ds = xr.Dataset(data_vars={'v1': tp})
ds
<xarray.Dataset> Dimensions: (d1: 3, d2: 2) Dimensions without coordinates: d1, d2 Data variables: v1 (d1, d2) int32 1 2 2 5 3 6
- d1: 3
- d2: 2
- v1(d1, d2)int321 2 2 5 3 6
array([[1, 2], [2, 5], [3, 6]])
DataArray
: When value of the dict
is a DataArray
, the key of the dict
will become the name of the variable. The dimensions and coordinates of the DataArray
will be added to the Dataset
created.
ds = xr.Dataset(data_vars={'v1': da})
ds
<xarray.Dataset> Dimensions: (x: 2, y: 3) Coordinates: * x (x) int32 10 20 * y (y) int32 10 20 30 Data variables: v1 (x, y) int32 1 2 3 2 3 4
- x: 2
- y: 3
- x(x)int3210 20
array([10, 20])
- y(y)int3210 20 30
array([10, 20, 30])
- v1(x, y)int321 2 3 2 3 4
- summary :
- This is a custom DataArray for tutorial
- license :
- CC BY-NC-ND 4.0
array([[1, 2, 3], [2, 3, 4]])
coords
is also adict
. The key of thedict
will be the name of the dimension and the values of thedict
will be the coordinate of the dimension.
ds = xr.Dataset(data_vars={'v1': tp},
coords={'d1': ['a', 'b', 'c'], 'd2': ['c1', 'c2']})
ds
<xarray.Dataset> Dimensions: (d1: 3, d2: 2) Coordinates: * d1 (d1) <U1 'a' 'b' 'c' * d2 (d2) <U2 'c1' 'c2' Data variables: v1 (d1, d2) int32 1 2 2 5 3 6
- d1: 3
- d2: 2
- d1(d1)<U1'a' 'b' 'c'
array(['a', 'b', 'c'], dtype='<U1')
- d2(d2)<U2'c1' 'c2'
array(['c1', 'c2'], dtype='<U2')
- v1(d1, d2)int321 2 2 5 3 6
array([[1, 2], [2, 5], [3, 6]])
attrs
is also adict
that is the same as theattrs
ofDataArray
.
5.1.5. Operations and Mathematical Functions#
Data variables can be modified through Mathematical Functions which is the same as Numpy
arrays.
da = xr.DataArray(data=[[1, 2, 3],
[2, 3, 4]],
dims=['x', 'y'],
coords={'x': [10, 20],
'y': [10, 20, 30]})
print(da * 10)
print(np.log(da))
<xarray.DataArray (x: 2, y: 3)>
array([[10, 20, 30],
[20, 30, 40]])
Coordinates:
* x (x) int32 10 20
* y (y) int32 10 20 30
<xarray.DataArray (x: 2, y: 3)>
array([[0. , 0.69314718, 1.09861229],
[0.69314718, 1.09861229, 1.38629436]])
Coordinates:
* x (x) int32 10 20
* y (y) int32 10 20 30
ds = xr.Dataset(data_vars={'v1': da})
print(ds * 10)
print(np.log(ds))
<xarray.Dataset>
Dimensions: (x: 2, y: 3)
Coordinates:
* x (x) int32 10 20
* y (y) int32 10 20 30
Data variables:
v1 (x, y) int32 10 20 30 20 30 40
<xarray.Dataset>
Dimensions: (x: 2, y: 3)
Coordinates:
* x (x) int32 10 20
* y (y) int32 10 20 30
Data variables:
v1 (x, y) float64 0.0 0.6931 1.099 0.6931 1.099 1.386
5.1.6. Loading Data from netCDF Files#
NetCDF (Network Common Data Format) is a widely used format for distributing geoscience data. For more details about netCDF please access netCDF website.
Call open_dataset
function to open a netCDF file. For more details about reading and writing netCDF files please access xarray netCDF docs. Below we load some ERA5 dataset which we have downloaded from online websites.
# data source: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form
ds = xr.open_dataset('../../assets/data/era5_singapore_2021.9.1.nc')
ds
<xarray.Dataset> Dimensions: (longitude: 3, latitude: 3, time: 24) Coordinates: * longitude (longitude) float32 103.5 103.8 104.0 * latitude (latitude) float32 1.5 1.25 1.0 * time (time) datetime64[ns] 2021-09-01 ... 2021-09-01T23:00:00 Data variables: t2m (time, latitude, longitude) float32 298.8 298.6 ... 299.7 300.0 tp (time, latitude, longitude) float32 0.0004115 ... 0.0002184 Attributes: Conventions: CF-1.6 history: 2021-10-01 12:00:45 GMT by grib_to_netcdf-2.20.0: /opt/ecmw...
- longitude: 3
- latitude: 3
- time: 24
- longitude(longitude)float32103.5 103.8 104.0
- units :
- degrees_east
- long_name :
- longitude
array([103.5 , 103.75, 104. ], dtype=float32)
- latitude(latitude)float321.5 1.25 1.0
- units :
- degrees_north
- long_name :
- latitude
array([1.5 , 1.25, 1. ], dtype=float32)
- time(time)datetime64[ns]2021-09-01 ... 2021-09-01T23:00:00
- long_name :
- time
array(['2021-09-01T00:00:00.000000000', '2021-09-01T01:00:00.000000000', '2021-09-01T02:00:00.000000000', '2021-09-01T03:00:00.000000000', '2021-09-01T04:00:00.000000000', '2021-09-01T05:00:00.000000000', '2021-09-01T06:00:00.000000000', '2021-09-01T07:00:00.000000000', '2021-09-01T08:00:00.000000000', '2021-09-01T09:00:00.000000000', '2021-09-01T10:00:00.000000000', '2021-09-01T11:00:00.000000000', '2021-09-01T12:00:00.000000000', '2021-09-01T13:00:00.000000000', '2021-09-01T14:00:00.000000000', '2021-09-01T15:00:00.000000000', '2021-09-01T16:00:00.000000000', '2021-09-01T17:00:00.000000000', '2021-09-01T18:00:00.000000000', '2021-09-01T19:00:00.000000000', '2021-09-01T20:00:00.000000000', '2021-09-01T21:00:00.000000000', '2021-09-01T22:00:00.000000000', '2021-09-01T23:00:00.000000000'], dtype='datetime64[ns]')
- t2m(time, latitude, longitude)float32...
- units :
- K
- long_name :
- 2 metre temperature
array([[[298.80975, 298.64566, 298.9777 ], [299.67108, 300.09094, 300.39368], [299.69064, 300.13788, 300.4328 ]], [[299.03745, 299.0257 , 299.40656], [299.81476, 299.89877, 300.39294], [299.74252, 300.01593, 300.48468]], ..., [[297.7106 , 297.41568, 297.7399 ], [299.78677, 299.86887, 300.029 ], [299.7594 , 299.7887 , 300.0837 ]], [[297.81107, 297.36383, 297.7017 ], [299.93607, 299.76227, 299.9537 ], [299.7818 , 299.71927, 300.03568]]], dtype=float32)
- tp(time, latitude, longitude)float32...
- units :
- m
- long_name :
- Total precipitation
array([[[4.115138e-04, 1.570693e-03, 4.553772e-04], [2.718046e-04, 2.169586e-04, 1.864362e-04], [1.769021e-04, 1.373173e-04, 1.716466e-04]], [[8.273063e-04, 5.416890e-04, 2.589354e-04], [3.442701e-04, 7.409945e-04, 2.875375e-04], [1.201695e-04, 7.819268e-05, 5.770964e-05]], ..., [[3.547812e-04, 7.839146e-04, 8.001191e-04], [1.459080e-04, 2.803954e-04, 3.843267e-04], [6.960193e-05, 9.059033e-05, 1.954986e-04]], [[2.574867e-04, 4.444282e-04, 5.893593e-04], [6.677199e-05, 4.711101e-04, 3.547812e-04], [6.390840e-05, 2.546231e-04, 2.184073e-04]]], dtype=float32)
- Conventions :
- CF-1.6
- history :
- 2021-10-01 12:00:45 GMT by grib_to_netcdf-2.20.0: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data1/adaptor.mars.internal-1633089645.2267637-14809-16-62ab5eb4-4a04-4049-b27b-8078f590d31c.nc /cache/tmp/62ab5eb4-4a04-4049-b27b-8078f590d31c-adaptor.mars.internal-1633089641.2883248-14809-16-tmp.grib
5.1.7. Selecting the Internal Data#
You can directly access Coordinates
and Data variables
by their name.
ds.t2m
<xarray.DataArray 't2m' (time: 24, latitude: 3, longitude: 3)> array([[[298.80975, 298.64566, 298.9777 ], [299.67108, 300.09094, 300.39368], [299.69064, 300.13788, 300.4328 ]], [[299.03745, 299.0257 , 299.40656], [299.81476, 299.89877, 300.39294], [299.74252, 300.01593, 300.48468]], ..., [[297.7106 , 297.41568, 297.7399 ], [299.78677, 299.86887, 300.029 ], [299.7594 , 299.7887 , 300.0837 ]], [[297.81107, 297.36383, 297.7017 ], [299.93607, 299.76227, 299.9537 ], [299.7818 , 299.71927, 300.03568]]], dtype=float32) Coordinates: * longitude (longitude) float32 103.5 103.8 104.0 * latitude (latitude) float32 1.5 1.25 1.0 * time (time) datetime64[ns] 2021-09-01 ... 2021-09-01T23:00:00 Attributes: units: K long_name: 2 metre temperature
- time: 24
- latitude: 3
- longitude: 3
- 298.8 298.6 299.0 299.7 300.1 300.4 ... 299.8 300.0 299.8 299.7 300.0
array([[[298.80975, 298.64566, 298.9777 ], [299.67108, 300.09094, 300.39368], [299.69064, 300.13788, 300.4328 ]], [[299.03745, 299.0257 , 299.40656], [299.81476, 299.89877, 300.39294], [299.74252, 300.01593, 300.48468]], ..., [[297.7106 , 297.41568, 297.7399 ], [299.78677, 299.86887, 300.029 ], [299.7594 , 299.7887 , 300.0837 ]], [[297.81107, 297.36383, 297.7017 ], [299.93607, 299.76227, 299.9537 ], [299.7818 , 299.71927, 300.03568]]], dtype=float32)
- longitude(longitude)float32103.5 103.8 104.0
- units :
- degrees_east
- long_name :
- longitude
array([103.5 , 103.75, 104. ], dtype=float32)
- latitude(latitude)float321.5 1.25 1.0
- units :
- degrees_north
- long_name :
- latitude
array([1.5 , 1.25, 1. ], dtype=float32)
- time(time)datetime64[ns]2021-09-01 ... 2021-09-01T23:00:00
- long_name :
- time
array(['2021-09-01T00:00:00.000000000', '2021-09-01T01:00:00.000000000', '2021-09-01T02:00:00.000000000', '2021-09-01T03:00:00.000000000', '2021-09-01T04:00:00.000000000', '2021-09-01T05:00:00.000000000', '2021-09-01T06:00:00.000000000', '2021-09-01T07:00:00.000000000', '2021-09-01T08:00:00.000000000', '2021-09-01T09:00:00.000000000', '2021-09-01T10:00:00.000000000', '2021-09-01T11:00:00.000000000', '2021-09-01T12:00:00.000000000', '2021-09-01T13:00:00.000000000', '2021-09-01T14:00:00.000000000', '2021-09-01T15:00:00.000000000', '2021-09-01T16:00:00.000000000', '2021-09-01T17:00:00.000000000', '2021-09-01T18:00:00.000000000', '2021-09-01T19:00:00.000000000', '2021-09-01T20:00:00.000000000', '2021-09-01T21:00:00.000000000', '2021-09-01T22:00:00.000000000', '2021-09-01T23:00:00.000000000'], dtype='datetime64[ns]')
- units :
- K
- long_name :
- 2 metre temperature
ds.time
<xarray.DataArray 'time' (time: 24)> array(['2021-09-01T00:00:00.000000000', '2021-09-01T01:00:00.000000000', '2021-09-01T02:00:00.000000000', '2021-09-01T03:00:00.000000000', '2021-09-01T04:00:00.000000000', '2021-09-01T05:00:00.000000000', '2021-09-01T06:00:00.000000000', '2021-09-01T07:00:00.000000000', '2021-09-01T08:00:00.000000000', '2021-09-01T09:00:00.000000000', '2021-09-01T10:00:00.000000000', '2021-09-01T11:00:00.000000000', '2021-09-01T12:00:00.000000000', '2021-09-01T13:00:00.000000000', '2021-09-01T14:00:00.000000000', '2021-09-01T15:00:00.000000000', '2021-09-01T16:00:00.000000000', '2021-09-01T17:00:00.000000000', '2021-09-01T18:00:00.000000000', '2021-09-01T19:00:00.000000000', '2021-09-01T20:00:00.000000000', '2021-09-01T21:00:00.000000000', '2021-09-01T22:00:00.000000000', '2021-09-01T23:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] 2021-09-01 ... 2021-09-01T23:00:00 Attributes: long_name: time
- time: 24
- 2021-09-01 2021-09-01T01:00:00 ... 2021-09-01T23:00:00
array(['2021-09-01T00:00:00.000000000', '2021-09-01T01:00:00.000000000', '2021-09-01T02:00:00.000000000', '2021-09-01T03:00:00.000000000', '2021-09-01T04:00:00.000000000', '2021-09-01T05:00:00.000000000', '2021-09-01T06:00:00.000000000', '2021-09-01T07:00:00.000000000', '2021-09-01T08:00:00.000000000', '2021-09-01T09:00:00.000000000', '2021-09-01T10:00:00.000000000', '2021-09-01T11:00:00.000000000', '2021-09-01T12:00:00.000000000', '2021-09-01T13:00:00.000000000', '2021-09-01T14:00:00.000000000', '2021-09-01T15:00:00.000000000', '2021-09-01T16:00:00.000000000', '2021-09-01T17:00:00.000000000', '2021-09-01T18:00:00.000000000', '2021-09-01T19:00:00.000000000', '2021-09-01T20:00:00.000000000', '2021-09-01T21:00:00.000000000', '2021-09-01T22:00:00.000000000', '2021-09-01T23:00:00.000000000'], dtype='datetime64[ns]')
- time(time)datetime64[ns]2021-09-01 ... 2021-09-01T23:00:00
- long_name :
- time
array(['2021-09-01T00:00:00.000000000', '2021-09-01T01:00:00.000000000', '2021-09-01T02:00:00.000000000', '2021-09-01T03:00:00.000000000', '2021-09-01T04:00:00.000000000', '2021-09-01T05:00:00.000000000', '2021-09-01T06:00:00.000000000', '2021-09-01T07:00:00.000000000', '2021-09-01T08:00:00.000000000', '2021-09-01T09:00:00.000000000', '2021-09-01T10:00:00.000000000', '2021-09-01T11:00:00.000000000', '2021-09-01T12:00:00.000000000', '2021-09-01T13:00:00.000000000', '2021-09-01T14:00:00.000000000', '2021-09-01T15:00:00.000000000', '2021-09-01T16:00:00.000000000', '2021-09-01T17:00:00.000000000', '2021-09-01T18:00:00.000000000', '2021-09-01T19:00:00.000000000', '2021-09-01T20:00:00.000000000', '2021-09-01T21:00:00.000000000', '2021-09-01T22:00:00.000000000', '2021-09-01T23:00:00.000000000'], dtype='datetime64[ns]')
- long_name :
- time
ds.longitude
<xarray.DataArray 'longitude' (longitude: 3)> array([103.5 , 103.75, 104. ], dtype=float32) Coordinates: * longitude (longitude) float32 103.5 103.8 104.0 Attributes: units: degrees_east long_name: longitude
- longitude: 3
- 103.5 103.8 104.0
array([103.5 , 103.75, 104. ], dtype=float32)
- longitude(longitude)float32103.5 103.8 104.0
- units :
- degrees_east
- long_name :
- longitude
array([103.5 , 103.75, 104. ], dtype=float32)
- units :
- degrees_east
- long_name :
- longitude
Below you can access the specific value by the index or slices.
# first value of time dimension
ds.time[0]
<xarray.DataArray 'time' ()> array('2021-09-01T00:00:00.000000000', dtype='datetime64[ns]') Coordinates: time datetime64[ns] 2021-09-01 Attributes: long_name: time
- 2021-09-01
array('2021-09-01T00:00:00.000000000', dtype='datetime64[ns]')
- time()datetime64[ns]2021-09-01
- long_name :
- time
array('2021-09-01T00:00:00.000000000', dtype='datetime64[ns]')
- long_name :
- time
# get a 2-demension data with first time index
ds.t2m[0]
<xarray.DataArray 't2m' (latitude: 3, longitude: 3)> array([[298.80975, 298.64566, 298.9777 ], [299.67108, 300.09094, 300.39368], [299.69064, 300.13788, 300.4328 ]], dtype=float32) Coordinates: * longitude (longitude) float32 103.5 103.8 104.0 * latitude (latitude) float32 1.5 1.25 1.0 time datetime64[ns] 2021-09-01 Attributes: units: K long_name: 2 metre temperature
- latitude: 3
- longitude: 3
- 298.8 298.6 299.0 299.7 300.1 300.4 299.7 300.1 300.4
array([[298.80975, 298.64566, 298.9777 ], [299.67108, 300.09094, 300.39368], [299.69064, 300.13788, 300.4328 ]], dtype=float32)
- longitude(longitude)float32103.5 103.8 104.0
- units :
- degrees_east
- long_name :
- longitude
array([103.5 , 103.75, 104. ], dtype=float32)
- latitude(latitude)float321.5 1.25 1.0
- units :
- degrees_north
- long_name :
- latitude
array([1.5 , 1.25, 1. ], dtype=float32)
- time()datetime64[ns]2021-09-01
- long_name :
- time
array('2021-09-01T00:00:00.000000000', dtype='datetime64[ns]')
- units :
- K
- long_name :
- 2 metre temperature
You can also use sel()
function to conduct label-based indexing.
ds.t2m.sel(time='2021-09-01 00:00:00') # select all data of 2021-09-01 00:00:00
<xarray.DataArray 't2m' (latitude: 3, longitude: 3)> array([[298.80975, 298.64566, 298.9777 ], [299.67108, 300.09094, 300.39368], [299.69064, 300.13788, 300.4328 ]], dtype=float32) Coordinates: * longitude (longitude) float32 103.5 103.8 104.0 * latitude (latitude) float32 1.5 1.25 1.0 time datetime64[ns] 2021-09-01 Attributes: units: K long_name: 2 metre temperature
- latitude: 3
- longitude: 3
- 298.8 298.6 299.0 299.7 300.1 300.4 299.7 300.1 300.4
array([[298.80975, 298.64566, 298.9777 ], [299.67108, 300.09094, 300.39368], [299.69064, 300.13788, 300.4328 ]], dtype=float32)
- longitude(longitude)float32103.5 103.8 104.0
- units :
- degrees_east
- long_name :
- longitude
array([103.5 , 103.75, 104. ], dtype=float32)
- latitude(latitude)float321.5 1.25 1.0
- units :
- degrees_north
- long_name :
- latitude
array([1.5 , 1.25, 1. ], dtype=float32)
- time()datetime64[ns]2021-09-01
- long_name :
- time
array('2021-09-01T00:00:00.000000000', dtype='datetime64[ns]')
- units :
- K
- long_name :
- 2 metre temperature
# select data locate in (103.5, 1.5) of 2021-09-01 00:00:00
ds.t2m.sel(longitude=103.5, latitude=1.5, time='2021-09-01 00:00:00')
<xarray.DataArray 't2m' ()> array(298.80975, dtype=float32) Coordinates: longitude float32 103.5 latitude float32 1.5 time datetime64[ns] 2021-09-01 Attributes: units: K long_name: 2 metre temperature
- 298.8
array(298.80975, dtype=float32)
- longitude()float32103.5
- units :
- degrees_east
- long_name :
- longitude
array(103.5, dtype=float32)
- latitude()float321.5
- units :
- degrees_north
- long_name :
- latitude
array(1.5, dtype=float32)
- time()datetime64[ns]2021-09-01
- long_name :
- time
array('2021-09-01T00:00:00.000000000', dtype='datetime64[ns]')
- units :
- K
- long_name :
- 2 metre temperature
# compute the average value in 2021-09-01 00:00:00
ds.t2m[0].values.mean()
299.65002
5.1.8. Read From URL#
Data used for the CE3201 final project are simulated temperature (unit: K) from the latest climate model outputs, which have been divided into two parts: historical (1850.01.01-2014.12.31) simulations and future (2015.01.01-2100.12.31) projections. You can find these data here.
from urllib.request import urlretrieve
import ssl
# Ignore certificate validation
ssl._create_default_https_context = ssl._create_unverified_context
# Everyone can access unique data by specifying your student ID and label (future or historical).
# For example, if your student ID is A0188677A and you will get future data
# you can using following codes to access your data.
student_ID = 'A0188677A'
label = 'future'
filename = '%s_%s.nc'%(student_ID, label)
url = 'https://hydrology.princeton.edu/data/hexg/CE3201_final_project_data/%s'%(filename)
# download file from remote server
urlretrieve(url, filename)
ds = xr.open_dataset(filename)
ds
<xarray.Dataset> Dimensions: (model: 10, ssp: 2, time: 31411) Coordinates: * model (model) object 'MIROC6' 'MPI-ESM1-2-LR' ... 'AWI-CM-1-1-MR' * ssp (ssp) object 'ssp126' 'ssp585' * time (time) datetime64[ns] 2015-01-01T12:00:00 ... 2100-12-31T12:00:00 Data variables: tas (model, ssp, time) float64 ... Attributes: latitude: 3.5896654 longitude: 98.6738261 city: Medan StudentID: A0188677A
- model: 10
- ssp: 2
- time: 31411
- model(model)object'MIROC6' ... 'AWI-CM-1-1-MR'
array(['MIROC6', 'MPI-ESM1-2-LR', 'ACCESS-ESM1-5', 'IPSL-CM6A-LR', 'ACCESS-CM2', 'MPI-ESM1-2-HR', 'EC-Earth3-Veg', 'MRI-ESM2-0', 'EC-Earth3-Veg-LR', 'AWI-CM-1-1-MR'], dtype=object)
- ssp(ssp)object'ssp126' 'ssp585'
array(['ssp126', 'ssp585'], dtype=object)
- time(time)datetime64[ns]2015-01-01T12:00:00 ... 2100-12-...
array(['2015-01-01T12:00:00.000000000', '2015-01-02T12:00:00.000000000', '2015-01-03T12:00:00.000000000', ..., '2100-12-29T12:00:00.000000000', '2100-12-30T12:00:00.000000000', '2100-12-31T12:00:00.000000000'], dtype='datetime64[ns]')
- tas(model, ssp, time)float64...
- units :
- K
- long_name :
- Near-Surface Air Temperature
- standard_name :
- air_temperature
- comment :
- near-surface (usually, 2 meter) air temperature
[628220 values with dtype=float64]
- latitude :
- 3.5896654
- longitude :
- 98.6738261
- city :
- Medan
- StudentID :
- A0188677A
# For convenience, the process above is rewritten using function
def read_by_student_ID(student_ID, label):
from urllib.request import urlretrieve
import xarray as xr
import os
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
filename = '%s_%s.nc'%(student_ID, label)
if os.path.exists(filename):
return xr.open_dataset(filename)
url = 'https://hydrology.princeton.edu/data/hexg/CE3201_final_project_data/%s'%(filename)
urlretrieve(url, filename)
return xr.open_dataset(filename)
student_ID = 'A0188677A'
# label = 'future'
label = 'historical'
ds = read_by_student_ID(student_ID, label)
ds
<xarray.Dataset> Dimensions: (model: 10, time: 60265) Coordinates: * model (model) object 'MIROC6' 'MPI-ESM1-2-LR' ... 'AWI-CM-1-1-MR' * time (time) datetime64[ns] 1850-01-01T12:00:00 ... 2014-12-31T12:00:00 Data variables: tas (model, time) float64 ... Attributes: latitude: 3.5896654 longitude: 98.6738261 city: Medan StudentID: A0188677A
- model: 10
- time: 60265
- model(model)object'MIROC6' ... 'AWI-CM-1-1-MR'
array(['MIROC6', 'MPI-ESM1-2-LR', 'ACCESS-ESM1-5', 'IPSL-CM6A-LR', 'ACCESS-CM2', 'MPI-ESM1-2-HR', 'EC-Earth3-Veg', 'MRI-ESM2-0', 'EC-Earth3-Veg-LR', 'AWI-CM-1-1-MR'], dtype=object)
- time(time)datetime64[ns]1850-01-01T12:00:00 ... 2014-12-...
array(['1850-01-01T12:00:00.000000000', '1850-01-02T12:00:00.000000000', '1850-01-03T12:00:00.000000000', ..., '2014-12-29T12:00:00.000000000', '2014-12-30T12:00:00.000000000', '2014-12-31T12:00:00.000000000'], dtype='datetime64[ns]')
- tas(model, time)float64...
- units :
- K
- long_name :
- Near-Surface Air Temperature
- standard_name :
- air_temperature
- comment :
- near-surface (usually, 2 meter) air temperature
[602650 values with dtype=float64]
- latitude :
- 3.5896654
- longitude :
- 98.6738261
- city :
- Medan
- StudentID :
- A0188677A