Local example: download Livneh climate observations and perform a simple analysis with xarray and dask.
Keeling (UIUC ATMS cluster) examples: calculate Fire Weather Index using xclim, and growing degree days.
ROAR (PSU cluster) examples: set up a dask distributed cluster.
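As a rough illustration of the growing degree days calculation mentioned above, here is a minimal pure-Python sketch. It assumes the common definition (the sum of daily mean temperature above a base temperature); the function name `growing_degree_days`, the 10 degC base, and the sample values are hypothetical, and the notebooks in this repository may use xclim with different settings.

```python
from typing import Iterable


def growing_degree_days(daily_mean_temps_c: Iterable[float], base_c: float = 10.0) -> float:
    """Sum the daily excess of mean temperature over a base temperature.

    Days at or below the base contribute zero. The 10 degC default is an
    assumption for illustration, not a value taken from the notebooks.
    """
    return sum((max(0.0, t - base_c) for t in daily_mean_temps_c), 0.0)


# Made-up daily mean temperatures in degC, for illustration only.
sample = [12.0, 8.0, 15.0, 10.0]
print(growing_degree_days(sample))
```

On real gridded data the same idea would be applied along the time dimension of an xarray object rather than to a Python list.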
To get started with Python, Jupyter notebooks, and dask on Keeling, see these walkthroughs:
- Keeling Crash Course courtesy of Max Grover
- Using dask-distributed on keeling courtesy of Steve Nesbitt
To create and activate a new conda environment, you can use conda:
conda create --name climate_stack
conda activate climate_stack
conda install -c conda-forge xarray bottleneck cartopy dask distributed netCDF4 rioxarray nodejs jupyterlab cftime nc-time-axis dask-jobqueue xclim dask-labextension scipy zarr rasterio matplotlib pint
but mamba will be much faster:
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
mamba create -n climate-stack-mamba xarray bottleneck cartopy dask distributed netCDF4 rioxarray nodejs jupyterlab cftime nc-time-axis dask-jobqueue xclim dask-labextension scipy zarr rasterio matplotlib pint -c conda-forge
mamba activate climate-stack-mamba
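Once the environment is active, a quick way to confirm that the install worked is to check that each package can be found by Python. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the import names are my best understanding of how the conda package names map to Python modules.

```python
from importlib.util import find_spec

# Import names for the Python packages installed above. Several differ from
# their conda package names (for example nc-time-axis -> nc_time_axis).
# nodejs is omitted because it is not a Python package.
STACK_MODULES = [
    "xarray", "bottleneck", "cartopy", "dask", "distributed", "netCDF4",
    "rioxarray", "jupyterlab", "cftime", "nc_time_axis", "dask_jobqueue",
    "xclim", "dask_labextension", "scipy", "zarr", "rasterio", "matplotlib",
    "pint",
]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(STACK_MODULES)
if missing:
    print("Not found in this environment:", ", ".join(missing))
else:
    print("All packages in the stack were found.")
```

Because `find_spec` only locates modules without importing them, this check is fast but will not catch a package that is installed and broken.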
- Copy the jobqueue_KEELING.yaml file in this repository into $HOME/.config/dask/ on keeling and change YOUR-USER-ID where appropriate. Rename the file to jobqueue.yaml.
- Follow the steps in either the FWI or DegreeDays notebook to create the cluster. Make sure to specify the scheduler options as follows:
cluster = SLURMCluster(scheduler_options={'host': '172.22.179.3:XXXX'})
where XXXX is a port number between 7000 and 8000. Also, the notebook from which you initialize the cluster must be run from a head node! A compute node will not work.
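The scheduler options above can be wrapped in a small helper that rejects ports outside the allowed range before the cluster is created. This is a sketch under stated assumptions: the function names `keeling_scheduler_options` and `start_cluster` are hypothetical, the range is treated as inclusive, and the remaining cluster settings (cores, memory, queue) are assumed to come from the jobqueue.yaml file described above.

```python
def keeling_scheduler_options(port: int, host_ip: str = "172.22.179.3") -> dict:
    """Build the scheduler_options dict, checking the port range first."""
    if not 7000 <= port <= 8000:
        raise ValueError(f"port must be between 7000 and 8000, got {port}")
    return {"host": f"{host_ip}:{port}"}


def start_cluster(port: int, jobs: int = 1):
    """Create a SLURM-backed dask cluster and a client connected to it.

    Must be called from a head node. Resource settings are assumed to be
    read from $HOME/.config/dask/jobqueue.yaml.
    """
    # Imported lazily so the helper above works without dask-jobqueue installed.
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(scheduler_options=keeling_scheduler_options(port))
    cluster.scale(jobs=jobs)  # request this many SLURM jobs as workers
    return cluster, Client(cluster)


# 7123 is an arbitrary example port inside the allowed range.
print(keeling_scheduler_options(7123))
```

In a notebook on a head node, `cluster, client = start_cluster(7123, jobs=2)` would then request two worker jobs.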
To create and activate the climate-stack conda environment from above:
1. module load anaconda3/2021.05
2. cd /storage/home/YOUR-PSU-ID/work
3. mkdir ENVS
4. cd ENVS
5. mkdir climate-stack
6. cd climate-stack
7. conda create -p $PWD
8. source activate /storage/home/YOUR-PSU-ID/work/ENVS/climate-stack
9. conda install -c conda-forge xarray bottleneck cartopy dask distributed geopandas xagg netCDF4 seaborn nodejs jupyterlab cartopy cftime nc-time-axis dask-jobqueue xclim dask-labextension
Remember to edit YOUR-PSU-ID in steps 2 and 8. Each time you need to activate the environment, repeat steps 1 and 8.
Note: ROAR will eventually update the anaconda module to a later version (step 1). If module avail shows that a newer anaconda distribution is available to you, use that one instead.
- Similar to keeling, copy the jobqueue_ROAR.yaml file in this repository into $HOME/.config/dask/ on ROAR and change YOUR-PSU-ID where appropriate. Rename the file to jobqueue.yaml.
- Follow the steps in the ROAR_example notebook to create the cluster. A cluster can be created from head nodes or (recommended) compute nodes.
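A common slip when copying the jobqueue file is to leave a placeholder unedited. The sketch below scans the installed config for the placeholder strings used in this repository; the helper name `leftover_placeholders` is hypothetical, and it treats the file as plain text rather than parsing the YAML.

```python
from pathlib import Path

# Placeholder strings that should be replaced in the jobqueue config files.
PLACEHOLDERS = ("YOUR-USER-ID", "YOUR-PSU-ID")


def leftover_placeholders(config_path, placeholders=PLACEHOLDERS):
    """Return the placeholders still present in the file at config_path."""
    text = Path(config_path).read_text()
    return [p for p in placeholders if p in text]


# Location where dask looks for the renamed jobqueue.yaml file.
config = Path.home() / ".config" / "dask" / "jobqueue.yaml"
if config.exists():
    found = leftover_placeholders(config)
    print("Unedited placeholders:", found if found else "none")
else:
    print(f"{config} not found")
```

The same check works on both keeling and ROAR, since both setups use the same destination path.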
(Most of this material comes from a blog post by Ben Lindsay.)
Accessing Jupyter Lab remotely can be made less of a hassle by defining some shortcut functions in your remote and local .bashrc files:
- On the remote machine (ROAR), add the following lines to your $HOME/.bashrc file:
function jlremote {
    echo $(hostname) > ~/.jupyternode.txt
    module load anaconda3/2021.05
    source activate /storage/home/YOUR-PSU-ID/work/ENVS/climate-stack
    cd work
    jupyter lab --ip=$(hostname) --port=XXXX --no-browser
}
- On your local machine, add the following lines to your $HOME/.bashrc file:
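function jllocal {
    port=XXXX
    remote_username=YOUR-PSU-ID
    remote_hostname=submit.aci.ics.psu.edu
    # Read the node name that jlremote wrote on the remote machine
    node=$(ssh "$remote_username@$remote_hostname" 'tail -1 ~/.jupyternode.txt')
    url="http://localhost:$port"
    echo "Opening $url"
    # open is the macOS launcher; on Linux, xdg-open is the usual equivalent
    open "$url"
    # Forward the local port to the same port on the node running Jupyter Lab
    cmd="ssh -CNL $port:$node:$port $remote_username@$remote_hostname"
    echo "Running '$cmd'"
    eval "$cmd"
}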
In both steps above, replace XXXX with the same four-digit port number of your choice (above 1024, since lower ports are reserved). Remember to also update YOUR-PSU-ID where appropriate.
The next time you log in, you can now start a Jupyter Lab by executing jlremote on the remote machine and then jllocal on your local machine. A browser window should appear asking you to log in to Jupyter Lab.
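For readers who prefer to script the local side in Python, the port forwarding performed by jllocal can be sketched as below. The function name `tunnel_command`, the port 8899, and the node name `comp-node-01` are hypothetical values for illustration; the ssh flags are the ones used in the shell function above.

```python
import shlex


def tunnel_command(port, node, username, hostname):
    """Build the ssh argument list that forwards localhost:port to node:port."""
    if not 1024 <= port <= 65535:
        raise ValueError(f"port must be between 1024 and 65535, got {port}")
    return ["ssh", "-CNL", f"{port}:{node}:{port}", f"{username}@{hostname}"]


# Hypothetical port and node name; substitute your own values.
cmd = tunnel_command(8899, "comp-node-01", "YOUR-PSU-ID", "submit.aci.ics.psu.edu")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` would open the tunnel and block until it is interrupted. Building an argument list avoids the `eval` step that the shell version relies on.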