What we learned by converting a large codebase from Pandas to Polars
PyData NYC 2024


11-08, 10:55–11:35 (US/Eastern), Music Box

In this talk, we'll share our experience of converting a substantial data processing codebase from Pandas to Polars. We’ll discuss the motivations behind the switch, the challenges faced during the transition, and the significant performance gains we observed. Attendees will gain insights into when and why Polars can be a superior choice for data processing tasks, especially in performance-critical applications.

As data volumes grow and processing demands increase, the limitations of traditional tools like Pandas become more apparent. While Pandas is a powerful and flexible library for data manipulation, it can struggle with performance and memory efficiency when dealing with large datasets. In response to these challenges, we decided to convert a large-scale data processing codebase from Pandas to Polars, a high-performance DataFrame library built in Rust.

This talk will provide a detailed account of our journey, covering:

Why We Made the Switch:
- The specific pain points we encountered with Pandas, including performance bottlenecks and memory constraints.
- An introduction to Polars and how its architecture addresses these issues.
The Conversion Process:
- Key differences between Pandas and Polars that required code adjustments.
- How we managed the transition, including refactoring strategies and testing to ensure correctness.
Challenges and Solutions:
- The hurdles we faced, such as handling Polars’ lazy execution model and adapting to its API.
- Solutions and workarounds we developed to mitigate these challenges.
Performance Gains and Insights:
- Quantitative comparisons of execution times and memory usage before and after the conversion.
- Qualitative insights on how Polars improved our data workflows, including ease of use and scalability.
Takeaways for the Audience:
- Practical advice on when to consider Polars over Pandas.
- Tips for a smooth transition, including best practices and common pitfalls to avoid.

This talk is aimed at data engineers, data scientists, and Python developers who are familiar with Pandas and are looking for ways to improve the performance of their data processing pipelines. Attendees will leave with a clear understanding of the benefits and challenges of using Polars, and actionable insights on how to approach a similar migration in their own projects.

You can find the slides at: https://docs.google.com/presentation/d/1TGkjwPOQAS17YiQTQuYeUkJVKqJrIUSuVC34yTDr-Rk/edit?usp=sharing

Prior Knowledge Expected –

No previous knowledge expected

Thijs Nieuwdorp

Thijs Nieuwdorp is a data scientist at Xomnia and co-author of Python Polars: The Definitive Guide. With a background in Artificial Intelligence from Radboud University, he specializes in innovation, Responsible AI, MLOps, and clean code. At Alliander, Thijs used Polars to optimize simulations of the Dutch power grid, reducing execution time and memory usage by a factor of four. This cut costs substantially and contributed to a more reliable power supply.

This speaker also appears in:

Turning DataFrames into Pretty Pictures with Plotnine

Jeroen Janssens

Jeroen Janssens, PhD, is a polyglot data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. Jeroen is passionate about open source and sharing knowledge. He is the author of Data Science at the Command Line (O’Reilly, 2021) and is currently writing Python Polars: The Definitive Guide (O’Reilly, 2025). Every now and then he blogs at https://jeroenjanssens.com.

This speaker also appears in:

Turning DataFrames into Pretty Pictures with Plotnine
