PyData NYC 2024

Straightforward stream processing with CSP
11-07, 10:10–10:50 (US/Eastern), Central Park East

Writing real-time data pipelines can be a headache for developers. These pipelines are difficult to configure, hard to test in CI/CD frameworks, and often impossible to debug. Fortunately, it doesn’t have to be this way! This talk covers how CSP, a new open-source stream processing library, restores the Python developer experience when working with live data. CSP enables users to create powerful streaming applications that are easy to debug, test, profile, and rapidly prototype.


The real-time nature of streaming data presents distinct challenges to developers. Unlike static data, which is available at the beginning of a data pipeline, streaming data will only update at unpredictable times during live execution. This opens up a Pandora’s box of technical challenges such as coordinating asynchronous data feeds and ensuring sequential consistency when processing events.

There are several open-source Python libraries that help developers abstract away the complexities of streaming data and focus on their own data pipeline. These include pyflink, bytewax and quixstreams. While all these libraries are powerful tools for building robust streaming applications, they leave something to be desired in the developer experience. A differentiating feature of CSP is a seamless transition between real-time and simulated data, which enables efficient debugging and easy application-level testing.
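To make the idea of switching between real-time and simulated data concrete, here is a minimal plain-Python sketch. It does not use CSP or reproduce its API. The names `replay` and `running_mean` are hypothetical and chosen only for illustration. The point is that the pipeline logic depends only on a stream of timestamped events, so a recorded history can stand in for a live feed during testing.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, Tuple

# An event is a (timestamp, value) pair. This is an assumption for the sketch,
# not a CSP type.
Event = Tuple[datetime, float]


def replay(events: List[Event]) -> Iterator[Event]:
    """Simulated source: yield recorded events in timestamp order, without waiting."""
    yield from sorted(events, key=lambda e: e[0])


def running_mean(source: Iterable[Event]) -> Iterator[Event]:
    """Pipeline logic: emit the mean of all values seen so far, stamped with event time."""
    total = 0.0
    for count, (ts, value) in enumerate(source, start=1):
        total += value
        yield ts, total / count


# Recorded data replayed through the same logic a live feed would use.
start = datetime(2024, 11, 7, 10, 10)
recorded = [(start + timedelta(seconds=i), v) for i, v in enumerate([1.0, 2.0, 6.0])]
result = list(running_mean(replay(recorded)))
```

Because `running_mean` accepts any iterable of events, a generator that reads from a live feed could be passed in place of `replay(recorded)` without changing the pipeline itself, and the replayed run finishes immediately and gives the same output every time.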

In this talk, we look at a few areas where CSP can help improve the development process, such as:

  1. Processing historical data
  2. Debugging
  3. Unit testing
  4. Profiling
  5. Running in Jupyter

The talk will be a mix of real-life examples demonstrating CSP’s ease of use and technical discussion on how we achieve it. While the focus is on developer experience, we also briefly discuss CSP’s runtime performance and how the usability benefits come without any performance tradeoff.

Data scientists and engineers who work with streaming data will benefit the most from this talk. However, the talk is designed to be accessible to anyone with intermediate-level knowledge of Python and data processing tools.


Prior Knowledge Expected: Previous knowledge expected

Adam Glustein is a Quantitative Developer at Point72 Asset Management in New York, USA. He is a contributor to CSP, a reactive stream processing library for both real-time and historical data.