11-07, 10:55–11:35 (US/Eastern), Central Park West
“Have you tried <insert a DataFrame Library>? It will change everything!” ~ My enthusiastic colleagues every other week.
The DataFrame landscape has never been more exciting, from mature libraries like pandas, Dask, and PyArrow to newer ones like Polars, DuckDB, and Ibis. Let’s get past the hype and evaluate the contexts and domains where each one excels, whether that’s efficient eager execution, single-machine performance, or large-scale data processing.
Whether you’re a data scientist optimizing trading algorithms, a researcher crunching geospatial data, or a developer building scalable data pipelines, you have plenty of DataFrame libraries to choose from: pandas, Polars, Dask, PyArrow, DuckDB, Modin, Vaex, and more. Of these, pandas is the most popular and familiar, while a library like Polars can significantly speed up single-machine workflows. So how do you decide?
In this talk, we’ll explore the unique capabilities of each library and the use cases it is tailored for. Alongside some objective third-party benchmarks, we’ll look at other important dimensions: history and design philosophy, maturity and community support, domain-specific capabilities and compatibility, and how each library handles large datasets. We’ll also briefly discuss libraries and initiatives focused on interoperability between DataFrame libraries, such as Ibis for end users, and Narwhals and the Data APIs Consortium for library developers.
This talk is intended for data professionals who just want to use the best tool for their work. We hope to provide an overview of the current DataFrame landscape and help you decide whether to try out a new tool or stick with the one you already use.
No previous knowledge expected
Dharhas Pothina is the CTO at Quansight, where he helps clients wrangle their data using the PyData stack. He also leads the development teams for the Nebari, Conda-Store, and Ragna open-source projects.
His background includes expertise in computational modeling, big data/high-performance computing, visualization, and geospatial analysis. Prior to his current position, he worked for 15 years in state and federal research labs, where he led large multidisciplinary, multi-agency research projects.
He holds a PhD in Civil Engineering and an MS in Aerospace Engineering from the University of Texas at Austin, and a BTech in Aerospace Engineering from the Indian Institute of Technology Madras.
Dharhas is passionate about equipping scientists and engineers with tools that let them scale and share their analyses. He loves woodworking, photography, and teaching his daughters to love science.