JINXUAN WU
I'm currently a software engineer on the data engineering team at Two Sigma.

Sessions
11-08
10:10
40min
Mastering DataFrame Diffing Techniques
JINXUAN WU
This talk explores techniques for efficiently comparing and diffing DataFrames, an essential task in data analysis and data engineering workflows. We'll cover everything from simple pandas assertions and SQL joins to more advanced tools like DuckDB and datacompy. We'll also dive into diffing time series DataFrames using as-of joins. Finally, we'll discuss how to perform these operations at scale on platforms like Apache Spark, Snowflake, and BigQuery, along with insights from commercial tools like Datafold’s Data-Diff and how these ideas can be implemented in both open-source and enterprise environments.
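As a rough illustration of the simpler techniques named in the abstract (it is not taken from the talk itself), the sketch below uses pandas to show an equality assertion, a keyed diff via an outer join, and a time series diff via an as-of join. The sample frames, column names, helper name `frames_equal`, and the two-second tolerance are all assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample frames; column names and values are illustrative only.
left = pd.DataFrame({"id": [1, 2, 3], "price": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"id": [2, 3, 4], "price": [20.0, 31.0, 40.0]})


def frames_equal(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Return True if pandas considers the two frames equal (hypothetical helper)."""
    try:
        pd.testing.assert_frame_equal(a, b)
        return True
    except AssertionError:
        return False


# Keyed diff: an outer join with indicator=True labels each row as
# "left_only", "right_only", or "both".
merged = left.merge(
    right, on="id", how="outer", suffixes=("_left", "_right"), indicator=True
)
only_left = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
only_right = merged.loc[merged["_merge"] == "right_only", "id"].tolist()
both = merged[merged["_merge"] == "both"]
changed = both.loc[both["price_left"] != both["price_right"], "id"].tolist()

# Time series diff: timestamps rarely line up exactly, so an as-of join
# matches each reference row to the most recent tick within a tolerance.
ticks = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 10:00:00", "2024-01-01 10:00:05"]),
    "price": [100.0, 101.0],
})
ref = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 10:00:01", "2024-01-01 10:00:06"]),
    "price": [100.0, 101.5],
})
aligned = pd.merge_asof(
    ref, ticks, on="time", suffixes=("_ref", "_ticks"),
    direction="backward", tolerance=pd.Timedelta("2s"),
)
ts_changed = aligned[aligned["price_ref"] != aligned["price_ticks"]]
```

With these made-up inputs, `only_left`, `only_right`, and `changed` hold the ids that are missing on one side or differ in value, and `ts_changed` holds the reference rows whose nearest earlier tick disagrees. A plain `!=` comparison treats missing values as different, so real data would likely need explicit null handling and a numeric tolerance.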
Central Park West