PyData NYC 2024

Using NASA EarthData Cloud & Python to Model Climate Risks
11-06, 09:00–10:30 (US/Eastern), Music Box

The goal of this tutorial is to give you hands-on experience accessing & using NASA Earthdata Cloud — i.e., freely available satellite data — through Pythonic APIs. Ideally, you are a data-curious Pythonista who wants to use NASA data products for geospatial analysis. Modest experience with the PyData stack is expected, but you'll be walked through particular corners of the relevant libraries (e.g., Xarray, Rasterio, Hvplot, etc.) as required. You'll need only a web browser & a network connection to connect to a pre-configured cloud computing environment. The case studies you'll explore — floods & wildfires — highlight strategies for "data-proximate computing," i.e., using cloud-compute resources with distributed data. At the end, you'll be set up to carry out your own explorations of NASA's publicly available earth data in Python.


This tutorial walks you through using data products from NASA Earthdata Cloud for analysis of environmental risk scenarios (e.g., floods, wildfires). To do so, you'll construct quantitative estimates of changes in hydrological water mass balance over various defined geographical regions of interest using cloud-based infrastructure and data. The goal is to build enough familiarity with generic cloud-based Jupyter/Python workflows and with remote-sensing data that you can adapt and remix the examples for other region-specific contexts. Throughout, you'll reinforce best practices of data-proximate computing and of reproducibility (as supported by NASA's Open Science and Transform to Open Science (TOPS) initiatives).
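A back-of-the-envelope version of such a water mass balance estimate can be sketched with NumPy alone. The arrays below are toy stand-ins for two co-registered water-extent rasters of the same region before and after a flood event; the values are illustrative, not actual satellite data.

```python
import numpy as np

# Toy stand-ins for two co-registered water-extent rasters (1 = water,
# 0 = not water), e.g., derived from scenes before and after a flood event.
before = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [0, 0, 0]])
after = np.array([[0, 1, 1],
                  [1, 1, 1],
                  [0, 1, 0]])

# Fraction of the region covered by water in each scene.
frac_before = before.mean()
frac_after = after.mean()

# Net change in inundated area, as a fraction of the region.
delta = frac_after - frac_before
print(f"water fraction: {frac_before:.2f} -> {frac_after:.2f} (change {delta:+.2f})")
```

The real workflows in the tutorial operate on georeferenced rasters rather than bare arrays, but the arithmetic of the mass-balance estimate is the same.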

Approximate schedule:

  • minute 0-14: Introduction & Setup (logging in, configuring NASA EarthData credentials)
  • minute 15-19: Reminders about geographic data formats
  • minute 20-39: Overview of PyData tools for geographic data
  • minute 40-59: Using NASA EarthData Products (DIST, DSWx)
  • minute 60-74: Case study: wildfires
  • minute 75-84: Case study: flooding
  • minute 85-90: Wrap-up

Once you have verified your NASA EarthData Cloud credentials in Jupyter, you'll get a quick, non-comprehensive overview of PyData approaches — e.g., using Xarray, Rasterio, Hvplot, Geoviews, etc. — for manipulating and visualizing geospatial data. You don't need to have used these tools before, but reasonable familiarity with Python, NumPy, & Pandas will be useful. Finally, you'll launch into case studies that exemplify typical cloud-based workflows with EarthData Cloud data products.
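As a small taste of the Xarray idioms involved — not a substitute for the tutorial material — here is a minimal, self-contained sketch using synthetic data (no downloads required): labeling a 2-D array with latitude/longitude coordinates and cropping it to a bounding box by coordinate label rather than by integer index.

```python
import numpy as np
import xarray as xr

# Synthetic 2-D "raster" labeled with geographic coordinates.
lat = np.linspace(40.0, 41.0, 11)    # ascending latitudes
lon = np.linspace(-74.5, -73.5, 11)  # ascending longitudes
data = np.arange(121, dtype=float).reshape(11, 11)
da = xr.DataArray(data, coords={"lat": lat, "lon": lon}, dims=("lat", "lon"))

# Crop to a bounding box by coordinate label (not integer position).
subset = da.sel(lat=slice(40.15, 40.65), lon=slice(-74.25, -73.75))
print(subset.shape)           # (5, 5)
print(float(subset.mean()))   # 49.0
```

The same `.sel(...)` pattern carries over directly to real remote-sensing rasters once they are loaded into Xarray data structures.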

The hands-on case studies make extensive use of OPERA (Observational Products for End-Users from Remote Sensing Analysis) data products; specifically, they rely on two categories of data products: DSWx (Dynamic Surface Water Extent) and DIST (Land Surface Disturbance). The workflows presented are based on notebooks drawn from the extensive OPERA Applications repository so that you can adapt the processes outlined to your own geospatial contexts.
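As a concrete (if toy) illustration of working with a DSWx-style classified raster, the sketch below builds a water mask from a small array and computes a water fraction over valid pixels only. The class codes used are assumptions stated in the comments, not taken from the product specification; consult the OPERA DSWx documentation for the authoritative values.

```python
import numpy as np

# Toy DSWx-style water classification layer. The class codes used here
# (0 = not water, 1 = open water, 2 = partial surface water, 253 = cloud,
# 255 = fill) are assumptions for illustration; check the OPERA DSWx
# product specification for the authoritative values.
wtr = np.array([[0,   1,   1, 255],
                [0,   2,   1, 253],
                [0,   0,   2, 253]], dtype=np.uint8)

water_classes = [1, 2]    # open water + partial surface water
valid = wtr < 250         # mask out cloud/fill/etc. codes
water = np.isin(wtr, water_classes) & valid

# Water fraction over *valid* pixels only, so clouds don't bias the estimate.
water_fraction = water.sum() / valid.sum()
print(f"{water.sum()} water pixels / {valid.sum()} valid -> {water_fraction:.2f}")
```

Restricting the denominator to valid pixels is the key design choice: otherwise cloudy or fill-valued scenes would systematically understate water extent.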


Prior Knowledge Expected



Dhavide Aruliah has been teaching & mentoring both in academia and in industry for three decades. His career has grown around bringing learners from where they are to where they need to be mathematically & computationally. He was a university professor (Applied Mathematics & Computer Science) at Ontario Tech University before moving to industry, where he oversaw training programs supporting the PyData stack at Anaconda Inc. and later at Quansight LLC. He has taught over 40 undergraduate- & graduate-level courses at five Canadian universities as well as numerous Software Carpentry, SciPy, & PyData tutorial workshops.