PyData NYC 2024

Python and the AI Lakehouse
11-08, 10:55–11:35 (US/Eastern), Central Park West

The Lakehouse is a new set of open standards for separating data storage from query engines, and it is becoming the dominant data platform for analytics. However, the Lakehouse alone is not enough to augment applications with AI and make them more intelligent: enterprises that have adopted it still struggle to get AI systems into production. The current Lakehouse architecture also lacks native support for Python and for real-time AI systems.


In this talk, we describe the capabilities that must be added to the Lakehouse to turn it into an AI Lakehouse, one that supports building and operating AI-enabled batch and real-time applications as well as LLM-powered applications. We will also discuss Python support in the Lakehouse and our work on the Hopsworks Query Service, which provides high-speed access to Lakehouse tables using Apache Arrow and supports temporal joins for creating point-in-time correct training data from Lakehouse tables.
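Point-in-time correct training data means each training example may only see feature values that were already known at that example's event time; otherwise future information leaks into the model. The sketch below illustrates the semantics of such a temporal (as-of) join in plain Python. It is not the Hopsworks Query Service API: the function name `point_in_time_join` and the tuple layouts `(entity_id, timestamp, value)` for features and `(entity_id, event_time, label)` for labels are hypothetical, chosen only for illustration. It also assumes a feature row whose timestamp equals the event time counts as "known".

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, features):
    """Hypothetical as-of join: for each label row, attach the most recent
    feature value whose timestamp is <= the label's event time."""
    # Group feature rows by entity and sort each group by timestamp only.
    by_entity = defaultdict(list)
    for entity_id, ts, value in features:
        by_entity[entity_id].append((ts, value))
    for rows in by_entity.values():
        rows.sort(key=lambda row: row[0])
    times = {e: [ts for ts, _ in rows] for e, rows in by_entity.items()}

    joined = []
    for entity_id, event_ts, label in labels:
        rows = by_entity.get(entity_id, [])
        # Position of the last feature row with ts <= event_ts (-1 if none).
        i = bisect_right(times.get(entity_id, []), event_ts) - 1
        value = rows[i][1] if i >= 0 else None  # None: no feature known yet
        joined.append((entity_id, event_ts, value, label))
    return joined


features = [("a", 1, 10.0), ("a", 5, 20.0), ("b", 2, 7.0)]
labels = [("a", 4, 0), ("a", 6, 1), ("b", 1, 0)]
print(point_in_time_join(labels, features))
```

Here the label for entity `a` at time 4 gets the value from time 1, not the later value from time 5, and entity `b` at time 1 gets no value because its only feature row arrives at time 2. In pandas, `pandas.merge_asof` with `by=` and `direction="backward"` expresses similar semantics for sorted DataFrames.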


Prior Knowledge Expected

Previous knowledge expected