11-07, 10:10–10:50 (US/Eastern), Central Park West
GPU programming can be scary but doesn’t need to be. With the CUDA Core Libraries and CUDA Python object model, you have a friendly interface to get you started with GPU acceleration.
In this example-driven talk, we'll teach you how to launch work and manage memory. You'll learn how to use parallel algorithms, write your own kernels that leverage cooperative algorithms, and integrate seamlessly with accelerated libraries.
This talk will present the CUDA Core Compute Library (CCCL) and show how to use its parallel algorithms in your everyday applications. CCCL has recently merged Python bindings that perform element-wise transformations as well as blockwise scans and reductions. These can be used in conjunction with traditional CUDA libraries such as cuDNN and cuBLAS, all from high-level Python code.
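As a rough illustration of the transform-then-reduce pattern these bindings expose, here is a CPU sketch using NumPy as a stand-in (the actual CCCL Python API is not shown in this abstract, so the names and calls here are illustrative only):

```python
import numpy as np

# Element-wise transform followed by a reduction: the same
# transform/reduce pattern CCCL's Python bindings run on the GPU.
data = np.arange(10, dtype=np.float64)

squared = data * data   # element-wise transform
total = squared.sum()   # reduction
```

On the GPU, the transform and the reduction can be fused into a single pass over the data, which is one of the things the CCCL algorithms handle for you.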
We start the talk with a general discussion of the CUDA model and how to manage accelerator devices as a core part of a Python application. We then give three motivating examples. First up is transforming images for a machine learning pipeline, launching, executing, and streaming the transformations entirely from Python code. Next, we implement various neural network layers, such as softmax, using blockwise operations. Finally, we integrate with an accelerated neural network library that provides tuned low-level kernels for convolutional nets.
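To make the softmax-via-blockwise-operations idea concrete, here is a minimal NumPy sketch (a CPU stand-in; on the GPU, the per-row max and sum below would each map to a blockwise reduction):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the per-row max for numerical stability; on the GPU
    # this max is computed with a blockwise reduction.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    # Normalize by the per-row sum (another blockwise reduction).
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
```

Each row is an independent "block" of work, which is why the layer maps so naturally onto blockwise GPU primitives.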
To show the effectiveness of these Python interfaces, we end with a demonstration of implementing GPT-2 (à la llm.c) from scratch as a pure Python library. This implementation is nearly identical in speed to the current llm.c implementation.
Come learn the joy of programming parallel algorithms from your friendly Python interface!
No previous knowledge expected
I lead CUDA Python Product Management, working to make CUDA a Python native.
I received my Ph.D. from the University of Chicago in 2010, where I built domain-specific languages to generate high-performance code for physics simulations with the PETSc and FEniCS projects. After spending a brief time as a research professor at the University of Texas and the Texas Advanced Computing Center, I have been a serial startup executive, including a founding team member of Anaconda.
I am a leader in the Python open data science community (PyData). A contributor to Python's scientific computing stack since 2006, I am most notably a co-creator of the popular Dask distributed computing framework, the Conda package manager, and the SymPy symbolic computing library. I was a founder of the NumFOCUS foundation, where I served as president and director, leading the development of programs supporting open-source projects such as Pandas, NumPy, and Jupyter.
Bryce Adelstein Lelbach has spent over a decade developing programming languages, compilers, and software libraries. He is a Principal Architect at NVIDIA, where he leads HPC programming language efforts and drives the technical roadmap for NVIDIA’s HPC compilers and libraries. Bryce is passionate about C++ and is one of the leaders of the C++ community. He has served as chair of INCITS/PL22, the US standards committee for programming languages and the Standard C++ Library Evolution group. Bryce served as the program chair for the C++Now and CppCon conferences for many years. On the C++ Committee, he has personally worked on concurrency primitives, parallel algorithms, executors, and multidimensional arrays. He is one of the founding developers of the HPX parallel runtime system. Outside of work, Bryce is passionate about airplanes and watches.