{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"# Xarray in 45 minutes\n",
"\n",
"In this lesson, we discuss cover the basics of Xarray data structures. By the\n",
"end of the lesson, we will be able to:\n",
"\n",
"- Understand the basic data structures in Xarray\n",
"- Inspect `DataArray` and `Dataset` objects.\n",
"- Read and write netCDF files using Xarray.\n",
"- Understand that there are many packages that build on top of xarray\n",
"\n",
"\n",
"We'll start by reviewing the various components of the Xarray data model, represented here visually:\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format='retina'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Xarray has a few small real-world tutorial datasets hosted in the [xarray-data](https://github.com/pydata/xarray-data) GitHub repository.\n",
"\n",
"[xarray.tutorial.load_dataset](https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html#xarray.tutorial.open_dataset) is a convenience function to download and open DataSets by name (listed at that link).\n",
"\n",
"Here we'll use `air temperature` from the [National Center for Environmental Prediction](https://www.weather.gov/ncep/). Xarray objects have convenient HTML representations to give an overview of what we're working with:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that behind the scenes the `tutorial.open_dataset` downloads a file. It then uses [`xarray.open_dataset`](https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html#xarray-open-dataset) function to open that file (which for this datasets is a [netCDF](https://www.unidata.ucar.edu/software/netcdf/) file). \n",
"\n",
"A few things are done automatically upon opening, but controlled by keyword arguments. For example, try passing the keyword argument `mask_and_scale=False`... what happens?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What's in a Dataset? \n",
"\n",
"*Many DataArrays!* \n",
"\n",
"What's a DataArray?\n",
"\n",
"Datasets are dictionary-like containers of DataArrays. They are a mapping of\n",
"variable name to DataArray:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# pull out \"air\" dataarray with dictionary syntax\n",
"ds[\"air\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can save some typing by using the \"attribute\" or \"dot\" notation. This won't\n",
"work for variable names that clash with a built-in method name (like `mean` for\n",
"example).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-output"
]
},
"outputs": [],
"source": [
"# pull out dataarray using dot notation\n",
"ds.air"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What's in a DataArray? \n",
"\n",
"*data + (a lot of) metadata*\n",
"\n",
"\n",
"### Named dimensions \n",
"\n",
"`.dims` correspond to the axes of your data. \n",
"\n",
"In this case we have 2 spatial dimensions (`latitude` and `longitude` are store with shorthand names `lat` and `lon`) and one temporal dimension (`time`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.air.dims"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Coordinate variables \n",
"\n",
"`.coords` is a simple [data container](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#coordinates)\n",
"for coordinate variables.\n",
"\n",
"Here we see the actual timestamps and spatial positions of our air temperature data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.air.coords"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Coordinates objects support similar indexing notation\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# extracting coordinate variables\n",
"ds.air.lon"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# extracting coordinate variables from .coords\n",
"ds.coords[\"lon\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is useful to think of the values in these coordinate variables as axis\n",
"\"labels\" such as \"tick labels\" in a figure. These are coordinate locations on a\n",
"grid at which you have data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Arbitrary attributes \n",
"\n",
"`.attrs` is a dictionary that can contain arbitrary Python objects (strings, lists, integers, dictionaries, etc.) Your only\n",
"limitation is that some attributes may not be writeable to certain file formats."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.air.attrs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# assign your own attributes!\n",
"ds.air.attrs[\"who_is_awesome\"] = \"xarray\"\n",
"ds.air.attrs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Underlying data \n",
"\n",
"`.data` contains the [numpy array](https://numpy.org) storing air temperature values.\n",
"\n",
"\n",
"\n",
"Xarray structures wrapunderlying simpler array-like data structures. This part of Xarray is quite extensible allowing for distributed array, GPU arrays, sparse arrays, arrays with units etc. We'll briefly look at this later in this tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"output_scroll"
]
},
"outputs": [],
"source": [
"ds.air.data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# what is the type of the underlying data\n",
"type(ds.air.data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review\n",
"\n",
"Xarray provides two main data structures:\n",
"\n",
"1. [`DataArrays`](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray) that wrap underlying data containers (e.g. numpy arrays) and contain associated metadata\n",
"\n",
"1. [`DataSets`](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset) that are dictionary-like containers of DataArrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Why Xarray? \n",
"\n",
"Metadata provides context and provides code that is more legible. This reduces the likelihood of errors from typos and makes analysis more intuitive and fun!\n",
"\n",
"### Analysis without xarray: `X(`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the first timestep\n",
"lat = ds.air.lat.data # numpy array\n",
"lon = ds.air.lon.data # numpy array\n",
"temp = ds.air.data # numpy array\n",
"plt.figure()\n",
"plt.pcolormesh(lon, lat, temp[0, :, :]);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"temp.mean(axis=1) ## what did I just do? I can't tell by looking at this line."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analysis with xarray `=)`\n",
"\n",
"How readable is this code?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.air.isel(time=1).plot(x=\"lon\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use dimension names instead of axis numbers\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.air.mean(\"time\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Extracting data or \"indexing\" \n",
"\n",
"Xarray supports\n",
"\n",
"- label-based indexing using `.sel`\n",
"- position-based indexing using `.isel`\n",
"\n",
"See the [user guide](https://docs.xarray.dev/en/stable/indexing.html) for more."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Label-based indexing\n",
"\n",
"Xarray inherits its label-based indexing rules from pandas; this means great\n",
"support for dates and times!\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# pull out data for all of 2013-May\n",
"ds.sel(time=\"2013-05\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# demonstrate slicing\n",
"ds.sel(time=slice(\"2013-05\", \"2013-07\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# demonstrate \"nearest\" indexing\n",
"ds.sel(lon=240.2, method=\"nearest\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# \"nearest indexing at multiple points\"\n",
"ds.sel(lon=[240.125, 234], lat=[40.3, 50.3], method=\"nearest\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Position-based indexing\n",
"\n",
"This is similar to your usual numpy `array[0, 2, 3]` but with the power of named\n",
"dimensions!\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# pull out time index 0 and lat index 0\n",
"ds.air.isel(time=0, lat=0) # much better than ds.air[0, 0, :]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"output_scroll",
"hide-output"
]
},
"outputs": [],
"source": [
"# demonstrate slicing\n",
"ds.air.isel(lat=slice(10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Concepts for computation\n",
"\n",
"Consider calculating the *mean air temperature per unit surface area* for this dataset. Because latitude and longitude correspond to spherical coordinates for Earth's surface, each 2.5x2.5 degree grid cell actually has a different surface area as you move away from the equator! This is because *latitudinal length* is fixed ($ \\delta Lat = R \\delta \\phi $), but *longitudinal length varies with latitude* ($ \\delta Lon = R \\delta \\lambda \\cos(\\phi) $)\n",
"\n",
"So the [area element for lat-lon coordinates](https://en.wikipedia.org/wiki/Spherical_coordinate_system#Integration_and_differentiation_in_spherical_coordinates) is\n",
"\n",
"\n",
"$$ \\delta A = R^2 \\delta\\phi \\, \\delta\\lambda \\cos(\\phi) $$\n",
"\n",
"where $\\phi$ is latitude, $\\delta \\phi$ is the spacing of the points in latitude, $\\delta \\lambda$ is the spacing of the points in longitude, and $R$ is Earth's radius. (In this formula, $\\phi$ and $\\lambda$ are measured in radians)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Earth's average radius in meters\n",
"R = 6.371e6\n",
"\n",
"# Coordinate spacing for this dataset is 2.5 x 2.5 degrees\n",
"dϕ = np.deg2rad(2.5)\n",
"dλ = np.deg2rad(2.5)\n",
"\n",
"dlat = R * dϕ * xr.ones_like(ds.air.lon)\n",
"dlon = R * dλ * np.cos(np.deg2rad(ds.air.lat))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two concepts here:\n",
"1. you can call functions like `np.cos` and `np.deg2rad` ([\"numpy ufuncs\"](https://numpy.org/doc/stable/reference/ufuncs.html)) on Xarray objects and receive an Xarray object back.\n",
"2. We used [ones_like](https://docs.xarray.dev/en/stable/generated/xarray.ones_like.html) to create a DataArray that looks like `ds.air.lon` in all respects, except that the data are all ones"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# cell latitude length is constant with longitude\n",
"dlat"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# cell longitude length changes with latitude\n",
"dlon"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Broadcasting: expanding data\n",
"\n",
"Our longitude and latitude length DataArrays are both 1D with different dimension names. If we multiple these DataArrays together the dimensionality is expanded to 2D by _broadcasting_:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cell_area = dlon * dlat\n",
"cell_area"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result has two dimensions because xarray realizes that dimensions `lon` and\n",
"`lat` are different so it automatically \"broadcasts\" to get a 2D result. See the\n",
"last row in this image from _Jake VanderPlas Python Data Science Handbook_\n",
"\n",
"\n",
"\n",
"Because xarray knows about dimension names we avoid having to create unnecessary\n",
"size-1 dimensions using `np.newaxis` or `.reshape`. For more, see the [user guide](https://docs.xarray.dev/en/stable/user-guide/computation.html#broadcasting-by-dimension-name)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### Alignment: putting data on the same grid\n",
"\n",
"When doing arithmetic operations xarray automatically \"aligns\" i.e. puts the\n",
"data on the same grid. In this case `cell_area` and `ds.air` are at the same\n",
"lat, lon points we end up with a result with the same shape (25x53):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.air.isel(time=1) / cell_area"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now lets make `cell_area` unaligned i.e. change the coordinate labels\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make a copy of cell_area\n",
"# then add 1e-5 degrees to latitude\n",
"cell_area_bad = cell_area.copy(deep=True)\n",
"cell_area_bad[\"lat\"] = cell_area.lat + 1e-5 # latitudes are off by 1e-5 degrees!\n",
"cell_area_bad"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cell_area_bad * ds.air.isel(time=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is an empty array with no latitude coordinates because none of them were aligned!\n",
"\n",
"**Tip:** If you notice extra NaNs or missing points after xarray computation, it\n",
"means that your xarray coordinates were not aligned _exactly_.\n",
"\n",
"For more, see\n",
"[the Xarray documentation](https://docs.xarray.dev/en/stable/user-guide/computation.html#automatic-alignment). [This tutorial notebook](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html) also covers alignment and broadcasting.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## High level computation \n",
"\n",
"(`groupby`, `resample`, `rolling`, `coarsen`, `weighted`)\n",
"\n",
"Xarray has some very useful high level objects that let you do common\n",
"computations:\n",
"\n",
"1. `groupby` :\n",
" [Bin data in to groups and reduce](https://docs.xarray.dev/en/stable/groupby.html)\n",
"1. `resample` :\n",
" [Groupby specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n",
"1. `rolling` :\n",
" [Operate on rolling windows of your data e.g. running mean](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n",
"1. `coarsen` :\n",
" [Downsample your data](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n",
"1. `weighted` :\n",
" [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n",
"\n",
"\n",
"Below we quickly demonstrate these patterns. See the user guide links above and [the tutorial](https://tutorial.xarray.dev/intermediate/01-high-level-computation-patterns.html) for more."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### groupby\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# seasonal groups\n",
"ds.groupby(\"time.season\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make a seasonal mean\n",
"seasonal_mean = ds.groupby(\"time.season\").mean()\n",
"seasonal_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The seasons are out of order (they are alphabetically sorted). This is a common\n",
"annoyance. The solution is to use `.sel` to change the order of labels\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"seasonal_mean = seasonal_mean.sel(season=[\"DJF\", \"MAM\", \"JJA\", \"SON\"])\n",
"seasonal_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### resample\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# resample to monthly frequency\n",
"ds.resample(time=\"M\").mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### weighted\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# weight by cell_area and take mean over (time, lon)\n",
"ds.weighted(cell_area).mean([\"lon\", \"time\"]).air.plot();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Visualization\n",
"\n",
"(`.plot`)\n",
"\n",
"\n",
"We have seen very simple plots earlier. Xarray also lets you easily visualize\n",
"3D and 4D datasets by presenting multiple facets (or panels or subplots) showing\n",
"variations across rows and/or columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# facet the seasonal_mean\n",
"seasonal_mean.air.plot(col=\"season\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# contours\n",
"seasonal_mean.air.plot.contour(col=\"season\", levels=20, add_colorbar=True);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# line plots too? wut\n",
"seasonal_mean.air.mean(\"lon\").plot.line(hue=\"season\", y=\"lat\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more see the [user guide](https://docs.xarray.dev/en/stable/plotting.html), the [gallery](https://docs.xarray.dev/en/stable/examples/visualization_gallery.html), and [the tutorial material](https://tutorial.xarray.dev/fundamentals/04.0_plotting.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Reading and writing files\n",
"\n",
"Xarray supports many disk formats. Below is a small example using netCDF. For\n",
"more see the [documentation](https://docs.xarray.dev/en/stable/user-guide/io.html)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# write to netCDF\n",
"ds.to_netcdf(\"my-example-dataset.nc\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"To avoid the `SerializationWarning` you can assign a _FillValue for any NaNs in 'air' array by adding the keyword argument encoding=dict(air={_FillValue=-9999})\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# read from disk\n",
"fromdisk = xr.open_dataset(\"my-example-dataset.nc\")\n",
"fromdisk"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# check that the two are identical\n",
"ds.identical(fromdisk)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Tip:** A common use case to read datasets that are a collection of many netCDF\n",
"files. See the [documentation](https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets) for how\n",
"to handle that.\n",
"\n",
"Finally to read other file formats, you might find yourself reading in the data using a different library and then creating a DataArray([docs](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#creating-a-dataarray), [tutorial](https://tutorial.xarray.dev/fundamentals/01.1_creating_data_structures.html)) from scratch. For example, you might use `h5py` to open an HDF5 file and then create a Dataset from that.\n",
"For MATLAB files you might use `scipy.io.loadmat` or `h5py` depending on the version of MATLAB file you're opening and then construct a Dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## The scientific python ecosystem\n",
"\n",
"Xarray ties in to the larger scientific python ecosystem and in turn many\n",
"packages build on top of xarray. A long list of such packages is here:\n",
".\n",
"\n",
"Now we will demonstrate some cool features.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pandas: tabular data structures\n",
"\n",
"You can easily [convert](https://docs.xarray.dev/en/stable/pandas.html) between xarray and [pandas](https://pandas.pydata.org/) structures. This allows you to conveniently use the extensive pandas \n",
"ecosystem of packages (like [seaborn](https://seaborn.pydata.org/)) for your work.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# convert to pandas dataframe\n",
"df = ds.isel(time=slice(10)).to_dataframe()\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# convert dataframe to xarray\n",
"df.to_xarray()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Alternative array types\n",
"\n",
"This notebook has focused on Numpy arrays. Xarray can wrap [other array](https://docs.xarray.dev/en/stable/user-guide/duckarrays.html) types! For example:\n",
"\n",
" [distributed parallel arrays](https://docs.dask.org/en/latest/array.html) & [Xarray user guide on Dask](https://docs.xarray.dev/en/stable/user-guide/dask.html)\n",
"\n",
"\n",
" **pydata/sparse** : [sparse arrays](https://sparse.pydata.org)\n",
"\n",
" [GPU arrays](https://cupy.dev) & [cupy-xarray](https://cupy-xarray.readthedocs.io/)\n",
"\n",
" **pint** : [unit-aware arrays](https://pint.readthedocs.io) & [pint-xarray](https://github.com/xarray-contrib/pint-xarray)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dask\n",
"\n",
"Dask cuts up NumPy arrays into blocks and parallelizes your analysis code across\n",
"these blocks\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# demonstrate dask dataset\n",
"dasky = xr.tutorial.open_dataset(\n",
" \"air_temperature\",\n",
" chunks={\"time\": 10}, # 10 time steps in each block\n",
")\n",
"\n",
"dasky.air"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All computations with dask-backed xarray objects are lazy, allowing you to build\n",
"up a complicated chain of analysis steps quickly\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# demonstrate lazy mean\n",
"dasky.air.mean(\"lat\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get concrete values, call `.compute` or `.load`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# \"compute\" the mean\n",
"dasky.air.mean(\"lat\").compute()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### HoloViz\n",
"\n",
"Quickly generate interactive plots from your data!\n",
"\n",
"The [`hvplot` package](https://hvplot.holoviz.org/user_guide/Gridded_Data.html) attaches itself to all\n",
"xarray objects under the `.hvplot` namespace. So instead of using `.plot` use `.hvplot`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import hvplot.xarray\n",
"\n",
"ds.air.hvplot(groupby=\"time\", clim=(270, 300), widget_location='bottom')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"The time slider will only work if you're executing the notebook, rather than viewing the website\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### cf_xarray \n",
"\n",
"[cf_xarray](https://cf-xarray.readthedocs.io/) is a project that tries to\n",
"let you make use of other CF attributes that xarray ignores. It attaches itself\n",
"to all xarray objects under the `.cf` namespace.\n",
"\n",
"Where xarray allows you to specify dimension names for analysis, `cf_xarray`\n",
"lets you specify logical names like `\"latitude\"` or `\"longitude\"` instead as\n",
"long as the appropriate CF attributes are set.\n",
"\n",
"For example, the `\"longitude\"` dimension in different files might be labelled as: (lon, LON, long, x…), but cf_xarray let's you always refer to the logical name `\"longitude\"` in your code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cf_xarray"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# describe cf attributes in dataset\n",
"ds.air.cf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following `mean` operation will work with any dataset that has appropriate\n",
"attributes set that allow detection of the \"latitude\" variable (e.g.\n",
"`units: \"degress_north\"` or `standard_name: \"latitude\"`)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# demonstrate equivalent of .mean(\"lat\")\n",
"ds.air.cf.mean(\"latitude\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# demonstrate indexing\n",
"ds.air.cf.sel(longitude=242.5, method=\"nearest\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Other cool packages\n",
"\n",
"- [xgcm](https://xgcm.readthedocs.io/) : grid-aware operations with xarray\n",
" objects\n",
"- [xrft](https://xrft.readthedocs.io/) : fourier transforms with xarray\n",
"- [xclim](https://xclim.readthedocs.io/) : calculating climate indices with\n",
" xarray objects\n",
"- [intake-xarray](https://intake-xarray.readthedocs.io/) : forget about file\n",
" paths\n",
"- [rioxarray](https://corteva.github.io/rioxarray/stable/index.html) : raster\n",
" files and xarray\n",
"- [xesmf](https://xesmf.readthedocs.io/) : regrid using ESMF\n",
"- [MetPy](https://unidata.github.io/MetPy/latest/index.html) : tools for working\n",
" with weather data\n",
"\n",
"Check the Xarray [Ecosystem](https://docs.xarray.dev/en/stable/ecosystem.html) page and [this tutorial](https://tutorial.xarray.dev/intermediate/xarray_ecosystem.html) for even more packages and demonstrations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"\n",
"1. Read the [tutorial](https://tutorial.xarray.dev) material and [user guide](https://docs.xarray.dev/en/stable/user-guide/index.html)\n",
"1. See the description of [common terms](https://docs.xarray.dev/en/stable/terminology.html) used in the xarray documentation: \n",
"1. Answers to common questions on \"how to do X\" with Xarray are [here](https://docs.xarray.dev/en/stable/howdoi.html)\n",
"1. Ryan Abernathey has a book on data analysis with a [chapter on Xarray](https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html)\n",
"1. [Project Pythia](https://projectpythia.org/) has [foundational](https://foundations.projectpythia.org/landing-page.html) and more [advanced](https://cookbooks.projectpythia.org/) material on Xarray. Pythia also aggregates other [Python learning resources](https://projectpythia.org/resource-gallery.html).\n",
"1. The [Xarray Github Discussions](https://github.com/pydata/xarray/discussions) and [Pangeo Discourse](https://discourse.pangeo.io/) are good places to ask questions.\n",
"1. Tell your friends! Tweet!\n",
"\n",
"\n",
"## Welcome!\n",
"\n",
"Xarray is an open-source project and gladly welcomes all kinds of contributions. This could include reporting bugs, discussing new enhancements, contributing code, helping answer user questions, contributing documentation (even small edits like fixing spelling mistakes or rewording to make the text clearer). Welcome!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:xarray-tutorial]",
"language": "python",
"name": "conda-env-xarray-tutorial-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}