A Duck in the hand is worth two in the Cloud: Data preparation and analytics on your laptop with DuckDB
What if I told you that you could complete a JSON parse-and-extract task on your laptop before a distributed compute cluster even finishes booting up? DuckDB is a lightweight, in-process analytical database that runs inside Python on your laptop and can wrangle large datasets efficiently, from both local and remote data sources. In this talk, we will show you how to query a dataset with DuckDB to extract, load and transform data right on your laptop. We'll then show you how to move your workloads to the Cloud, so you can run them at scale. Developing locally and pushing to the Cloud not only makes it easy to develop, debug and iterate, but also lets you switch quickly between workloads that do and don't require Cloud compute resources, cutting both cost and time.