PyData NYC 2024

Use ARTKIT to Automate and Scale Up Your LLM Evaluation Process
11-07, 14:35–15:15 (US/Eastern), Central Park East

Since late 2022, Large Language Models (LLMs) have become part of daily life, propelled by the rise of ChatGPT. Amid continuous media coverage, companies like Nvidia have seen their stock surge, and individuals have changed how they work and study. However, many organizations remain hesitant to adopt LLMs and Generative AI (GenAI) at scale, primarily because they have not yet experimented with and tested these systems thoroughly. The concern is that deploying LLMs/GenAI without proper alignment to business needs could create reputational or legal risk.

At Boston Consulting Group (BCG), we've been helping numerous clients navigate the LLM/GenAI landscape since late 2022. Unlike with other AI applications, we've struggled to find suitable tools for testing LLM/GenAI systems. In response, we've developed an open-source Python package, ARTKIT, designed to facilitate automated and scalable testing of LLM/GenAI systems. In this talk, we'll share BCG's broader Responsible AI (https://www.bcg.com/capabilities/artificial-intelligence/responsible-ai) efforts and introduce ARTKIT (https://github.com/BCG-X-Official/artkit) to the PyData community.


  1. Responsible AI at BCG (10 minutes)
  2. Introduction and Tutorial to ARTKIT (20 minutes)
  3. Case Study Sharing (10 minutes)
  4. Q&A Session (10 minutes)

Prior Knowledge Expected

Previous knowledge expected

I'm a Lead Data Scientist at Boston Consulting Group. I specialize in building advanced analytics, Machine Learning, and AI/GenAI solutions. I've served clients in the financial services, retail, fashion, and automotive industries in North America and China. I'm passionate about building products and creating value from data and technology. I have led cross-functional teams of 3 to 20 people and delivered solutions in areas such as GenAI, computer vision, NLP, and personalization. I'm experienced in developing GenAI solutions, and in evaluating and improving them with a Responsible AI mindset.

Before BCG, I worked as an engineer and data scientist at a fintech company in Silicon Valley.

Kevin Slater is a Senior Data Scientist at Boston Consulting Group. He has over five years of experience delivering ML/AI solutions for clients across multiple industries, including aviation, manufacturing, and supply chain management. He is passionate about Responsible AI and has enjoyed helping teams design and implement their GenAI Testing & Evaluation strategies.

Prior to BCG, Kevin was a Quantitative Strategist at Goldman Sachs with a focus on inventory management. He has a degree in Physics and Mathematics from the University of Chicago.