Ibis: Don't Let the Engine Dictate the Interface
Tabular data is ubiquitous, and pandas has been the de facto tool in Python for
analyzing it. However, as data size scales, analysis using pandas may become
untenable. Luckily, modern analytical databases (like DuckDB) are able to
analyze this same tabular data, but perform orders-of-magnitude faster than
pandas, all while using less memory. Many of these systems only provide a SQL
interface though; something far different from pandas’ dataframe interface,
requiring a rewrite of your analysis code.
This talk will lay out the current database / data landscape as it relates to
the PyData stack, and explore how Ibis (an open-source, pure Python, dataframe
interface library) can help decouple interfaces from engines, to improve both performance
and portability. We'll examine other solutions for interacting with SQL from Python and
discuss some of their strengths and weaknesses.