PyData NYC 2024

Turning DataFrames into Pretty Pictures with Plotnine
11-06, 15:10–16:40 (US/Eastern), Music Box

Learn how Plotnine, a Python package inspired by R's ggplot2, enables the creation of sophisticated and effective data visualizations with minimal effort. This tutorial will explain how Plotnine's grammar of graphics approach provides a flexible, intuitive way to visualize data, either as ad-hoc plots or fine-tuned graphs suited for communication.


Slideshow about what to expect

We've prepared a brief slideshow that showcases six images and one animation created with Plotnine, along with a bonus table designed using the Great Tables package. If you’re attending PyData NYC 2024 and want to develop the skills to create visualizations like these, consider joining us!

Instructions

To follow along, get everything set up, preferably before the workshop starts. You have two options: (a) use Posit Cloud, where everything is already installed, or (b) use your own laptop and install everything locally. Instructions and materials are on the GitHub repository.

Outline

This tutorial will provide a deep dive into Plotnine, a Python data visualization package inspired by R's ggplot2, covering both its core principles and practical applications. We'll begin by explaining the grammar of graphics and how Plotnine leverages this concept. You’ll see how to construct plots step by step, adding layers like points, lines, and text to build rich and informative visualizations. We will also cover how to extensively customize these plots, including themes, color schemes, and faceting, to produce publication-quality visuals.

  1. Introduction to Plotnine and the Grammar of Graphics:
    • Overview of the grammar of graphics and its benefits.
    • Why Plotnine? The advantages of using it in Python over other visualization tools.
  2. Building Your First Plot:
    • Creating a basic scatter plot.
    • Introduction to layering: adding points, lines, and customizing aesthetics.
  3. Advanced Visualization Techniques:
    • Faceting: splitting data into subsets for comparative visualization.
    • Statistical transformations: applying smoothing, binning, and more.
    • Customization: adjusting themes, labels, and legends.
  4. Q&A Session

Prior Knowledge Expected

No previous knowledge expected

I’m a data science tool builder at Posit, where I work on open source tools for data analysis. Previously, I worked as a consultant building out a data team for Caltrans (and love all things GTFS).

I received a Ph.D. in Cognitive Psychology from Princeton University, and am interested in what drives expert data science performance.

Thijs Nieuwdorp is a data scientist at Xomnia and co-author of Python Polars: The Definitive Guide. With a background in Artificial Intelligence from Radboud University, he specializes in innovation, Responsible AI, MLOps, and clean code. At Alliander, Thijs leveraged Polars to optimize simulations of the Dutch power grid, reducing execution time and memory usage by a factor of four and cutting costs substantially, which contributes to a more reliable power supply.

Jeroen Janssens, PhD, is a polyglot data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. Jeroen is passionate about open source and sharing knowledge. He is the author of Data Science at the Command Line (O’Reilly, 2021) and is currently writing Python Polars: The Definitive Guide (O’Reilly, 2025). Every now and then he blogs at https://jeroenjanssens.com.
