PyData NYC 2024

Introducing BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking, and Hierarchical Stacking in Python
11-08, 11:40–12:20 (US/Eastern), Winter Garden

This talk introduces BayesBlend, a new, open-source Python package designed to simplify model blending using pseudo-Bayesian model averaging, stacking, and hierarchical stacking. BayesBlend enables users to improve out-of-sample predictive performance by blending predictions from multiple competing models, which is particularly useful in M-open settings, where the true data-generating model is not among the candidate models. The talk will include practical examples from insurance loss modeling.


In this talk, I will present BayesBlend, a new open-source Python package that simplifies the process of blending predictions from multiple models to obtain better out-of-sample predictive performance compared to any single model in isolation. Despite its long history of success in both statistics and machine learning, model blending remains underused in practice due to the additional complexity it imposes on researchers and data scientists.

BayesBlend addresses this gap by providing a user-friendly interface for both estimating model weights and blending predictions using a variety of methods, including pseudo-Bayesian model averaging, stacking, and hierarchical stacking, all with just a few lines of code. Of these methods, hierarchical stacking is especially powerful, allowing weights to vary with covariates that are believed to differentially impact model performance.
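
To make the weighting idea concrete, here is a minimal NumPy sketch of the pseudo-Bayesian model averaging logic that packages like BayesBlend automate. The two models, their pointwise log predictive densities, and the predictive draws are all hypothetical; the code illustrates the underlying idea rather than BayesBlend's own API.

```python
# Illustrative sketch of pseudo-Bayesian model averaging (not BayesBlend's API):
# weight each model by its estimated out-of-sample predictive density, then
# mix posterior predictive draws according to those weights.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical pointwise log predictive densities (e.g., from LOO-CV)
# for two competing models over N = 100 held-out observations.
log_dens_a = rng.normal(-1.0, 0.3, size=100)
log_dens_b = rng.normal(-1.2, 0.3, size=100)

# Pseudo-BMA weights: proportional to exp(estimated ELPD) per model.
elpds = np.array([log_dens_a.sum(), log_dens_b.sum()])
weights = np.exp(elpds - elpds.max())   # subtract max for numerical stability
weights /= weights.sum()

# Blend posterior predictive draws (S draws x N observations per model)
# by sampling each draw from model A or B in proportion to its weight.
draws_a = rng.normal(0.0, 1.0, size=(1000, 100))
draws_b = rng.normal(0.5, 1.0, size=(1000, 100))
pick = rng.choice(2, size=1000, p=weights)
blended = np.where(pick[:, None] == 0, draws_a, draws_b)

print("model weights:", np.round(weights, 3))
```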

The talk will be structured as follows:

  • Introduction and Motivation (0-5 minutes): Overview of model blending, its importance in predictive modeling, and challenges in M-open settings.
  • Overview of BayesBlend (5-15 minutes): Introduction to the BayesBlend package, including its core features and how it simplifies model blending.
  • Demonstration (15-25 minutes): Practical examples of using BayesBlend in Python, focusing on insurance loss modeling. This will include code snippets and live demonstrations of how to estimate model weights, blend predictions, and validate model performance; a conceptual sketch of the stacking step appears after this list.
  • Q&A and Discussion (25-30 minutes): Open floor for questions and discussion on the application and extension of BayesBlend in various domains.
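
As a preview of the demonstration, the following hypothetical sketch shows the core of the stacking step: choosing model weights that maximize the summed log pointwise predictive density of the blended mixture. The data, model count, and function names below are illustrative assumptions, not BayesBlend's API.

```python
# Illustrative sketch of stacking: optimize a convex combination of models
# so the blended mixture scores well on held-out log predictive density.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax

rng = np.random.default_rng(0)

# Hypothetical pointwise log predictive densities for K = 3 models
# over N = 200 held-out observations (e.g., from LOO-CV).
N, K = 200, 3
log_dens = rng.normal(loc=[-1.0, -1.1, -1.4], scale=0.3, size=(N, K))

def neg_stacking_objective(unconstrained):
    """Negative log score of the weighted mixture; weights via softmax."""
    w = softmax(unconstrained)
    # Per-observation log mixture density: log(sum_k w_k * p_k(y_i)).
    mix = logsumexp(log_dens + np.log(w), axis=1)
    return -mix.sum()

result = minimize(neg_stacking_objective, x0=np.zeros(K), method="BFGS")
weights = softmax(result.x)
print("stacking weights:", np.round(weights, 3))
```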

Audience: This talk is aimed at data scientists, statisticians, and machine learning practitioners who are interested in improving their predictive models through model blending techniques. A basic familiarity with Bayesian modeling and Python programming is recommended.

Takeaway: Attendees will leave with an understanding of how to implement model blending using BayesBlend, enhancing their ability to build robust predictive models in uncertain modeling scenarios. They will also gain insight into the advantages of pseudo-Bayesian model averaging, stacking, and hierarchical stacking in practical applications.


Prior Knowledge Expected: Previous knowledge expected


Nathaniel Haines is a Bayesian, a data scientist, and a psychologist (mostly in that order). His work touches multiple fields, including computational statistics, cognitive science, and actuarial science, and he is a core contributor to open-source Python and R libraries that make advanced Bayesian modeling techniques more accessible (BayesBlend and hBayesDM). He currently works as the Manager of Data Science Research at Ledger Investing, a Y Combinator-backed fintech building a marketplace for casualty insurance-linked securities.