PyData NYC 2024

Thomas J. Fan

Thomas J. Fan is a senior machine learning engineer at Union.ai and a maintainer of scikit-learn, an open-source machine learning library for Python. He led the development of scikit-learn's set_output API, which allows transformers to return pandas DataFrames. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He also maintains skorch, a neural network library that wraps PyTorch.


Sessions

11-07, 15:20, 40 min
Pushing Cython to its Limits in Scikit-learn
Thomas J. Fan

scikit-learn is a machine learning library for Python that uses NumPy and SciPy for numerical operations. For performance-critical computation, scikit-learn also ships its own compiled code, written in C, C++, and Cython. The library relies primarily on Cython because it is approachable and easy to use. In this talk, we dive into the techniques scikit-learn employs to get the most out of Cython. We cover features such as using the C++ standard library from Cython, fused types, code generation with the Tempita templating engine, and OpenMP for parallelization.
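To give a rough idea of what template-based code generation means here, the sketch below renders one Cython function per dtype from a single template. It is a minimal illustration under stated assumptions, not scikit-learn's code. It uses Python's standard `string.Template` as a stand-in for Tempita, and the template, the `_sum_<dtype>` function names, the `DTYPES` list, and `generate_source` are all hypothetical.

```python
from string import Template

# Hypothetical Cython template (stand-in for a Tempita template).
# ${name} and ${c_type} are filled in once per dtype to produce a
# specialized function for each one.
CYTHON_TEMPLATE = Template('''
def _sum_${name}(const ${c_type}[::1] values):
    cdef ${c_type} total = 0
    cdef Py_ssize_t i
    for i in range(values.shape[0]):
        total += values[i]
    return total
''')

# Illustrative (dtype name, C type) pairs to specialize for.
DTYPES = [("float64", "double"), ("float32", "float")]


def generate_source(dtypes=DTYPES):
    """Return Cython source text with one specialized function per dtype."""
    return "".join(
        CYTHON_TEMPLATE.substitute(name=name, c_type=c_type)
        for name, c_type in dtypes
    )


if __name__ == "__main__":
    # Prints the generated Cython source; a build step would write this
    # to a .pyx file and compile it.
    print(generate_source())
```

For a simple case like this one, Cython's fused types can produce the same per-dtype specializations without a separate generation step. Templating becomes more useful when the generated code must vary in ways fused types cannot express.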

Central Park West