11-07, 14:35–15:15 (US/Eastern), Winter Garden
Retrieval-Augmented Generation (RAG) apps offer a lot of value for answering questions based on validated data. However, RAGs are often prone to hallucination, using irrelevant retrieved information to construct their answers. Context filter guardrails eliminate this failure mode!
Typical guardrails act only on the final output and have no impact on the intermediate steps of an LLM application. While guardrails are commonly discussed as a way to block unsafe or inappropriate output from reaching the end user, they can also be leveraged to improve the internal processing of LLM apps.
Guardrails can use a variety of implementations to judge relevance, including LLM-as-judge and smaller classification models.
Context filter guardrails leverage feedback functions to evaluate the relevance of each retrieved context chunk and block irrelevant chunks from reaching the LLM for generation. Doing so eliminates hallucinations caused by irrelevant context and reduces token usage in the generation step.
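As a rough sketch (not the exact workshop code), a TruLens context filter guardrail can be attached to a retrieval step with a feedback function and a threshold. The FilteredRAG class, the 0.75 threshold, and the hard-coded documents below are illustrative assumptions, and import paths may differ between TruLens releases:

```python
from trulens.core import Feedback
from trulens.core.guardrails.base import context_filter
from trulens.providers.openai import OpenAI as OpenAIProvider

provider = OpenAIProvider(model_engine="gpt-4o-mini")  # assumed judge model

# Feedback function that scores how relevant a context chunk is to the query.
f_context_relevance = Feedback(provider.context_relevance, name="Context Relevance")

DOCS = [
    "TruEra was acquired by Snowflake in 2024.",
    "Bananas are a good source of potassium.",
]

class FilteredRAG:
    # Chunks scoring below the threshold are dropped before they reach generation.
    @context_filter(f_context_relevance, 0.75, keyword_for_prompt="query")
    def retrieve(self, query: str) -> list:
        # Stand-in for a real vector search: return every document.
        return DOCS
```

With the filter in place, a question about bananas never sees the unrelated acquisition document, so the generation step cannot be misled by it.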
This workshop includes a conceptual walkthrough of the problem (hallucination) and the solution (guardrails), along with a hands-on experiment comparing how a RAG with guardrails outperforms a RAG without them. To run the experiment, we'll build a RAG in plain Python and leverage OSS TruLens for guardrails and evaluation. We'll highlight specific instances where the RAG hallucinates, and instances where guardrails eliminate these hallucinations.
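For context, the unguarded "plain Python" baseline might look roughly like the sketch below; the model names, document set, and two-method structure are assumptions for illustration, not the workshop's exact code:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DOCS = [
    "Context filter guardrails drop irrelevant chunks before generation.",
    "TruLens provides feedback functions for evaluating LLM apps.",
    "Bananas are a good source of potassium.",
]

def embed(texts: list) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

DOC_VECS = embed(DOCS)

class SimpleRAG:
    def retrieve(self, query: str, k: int = 2) -> list:
        # Rank documents by cosine similarity to the query embedding.
        q = embed([query])[0]
        sims = DOC_VECS @ q / (np.linalg.norm(DOC_VECS, axis=1) * np.linalg.norm(q))
        return [DOCS[i] for i in np.argsort(sims)[::-1][:k]]

    def generate(self, query: str, contexts: list) -> str:
        prompt = (
            "Answer the question using only the context below.\n\n"
            "Context:\n" + "\n".join(contexts) + f"\n\nQuestion: {query}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def query(self, query: str) -> str:
        return self.generate(query, self.retrieve(query))
```

Because retrieve() always returns the top-k chunks regardless of relevance, an off-topic question still pulls in context, which is exactly the failure mode the guardrail above addresses.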
We will also experiment with different ways to measure context relevance (LLM judges and small classification models) and the trade-offs of each (latency vs. quality).
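To make the trade-off concrete, here is a hedged sketch of relevance scoring with a small local cross-encoder instead of an LLM judge; the model name and threshold are assumptions rather than the workshop's chosen configuration:

```python
from sentence_transformers import CrossEncoder

# A small reranker scores each (query, chunk) pair locally in milliseconds,
# while an LLM judge adds a network call per chunk but can reason more flexibly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def filter_relevant(query: str, chunks: list, threshold: float = 0.0) -> list:
    # The model outputs raw logits; higher means more relevant to the query.
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    return [chunk for chunk, score in zip(chunks, scores) if score >= threshold]
```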
Attendees will learn:
- Common causes of hallucination in RAG
- How context filter guardrails can solve a particular failure mode: hallucination from irrelevant context
- Different ways to measure context relevance
- How to add OSS TruLens context filter guardrails to a RAG built in plain Python
- How to evaluate the effectiveness of the guardrails using the RAG triad to measure hallucination (see the sketch after this list)
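For the last point, a minimal sketch of the RAG triad as TruLens feedback functions is shown below. The selector path assumes a retrieve() method like the SimpleRAG sketch above (instrumented with TruLens), and exact import paths may vary across TruLens releases:

```python
import numpy as np
from trulens.core import Feedback, Select
from trulens.providers.openai import OpenAI as OpenAIProvider

provider = OpenAIProvider(model_engine="gpt-4o-mini")  # assumed judge model
context = Select.RecordCalls.retrieve.rets[:]  # each chunk returned by retrieve()

# 1. Context relevance: is each retrieved chunk relevant to the question?
f_context_relevance = (
    Feedback(provider.context_relevance, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

# 2. Groundedness: is the answer supported by the retrieved context?
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
)

# 3. Answer relevance: does the answer actually address the question?
f_answer_relevance = Feedback(provider.relevance, name="Answer Relevance").on_input_output()

# Attaching all three to the instrumented RAG app scores every record, making
# hallucinations (low groundedness) visible with and without guardrails.
```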
Previous knowledge expected
Josh is a developer advocate at Snowflake, previously at TruEra (recently acquired by Snowflake). He is also a maintainer of open-source TruLens, a library to systematically track and evaluate LLM-based applications.
Josh has delivered tech talks and workshops to thousands of developers at events including the Global AI Conference, NYC Dev Day 2023, LLMs and the Generative AI Revolution 2023, AI developer meetups, and the AI Quality Workshop (delivered both live and on-demand through Udemy).