Aziza Mirsaidova
Aziza is an Applied Scientist at Oracle working in Generative and Responsible AI, with more than three years of experience in ML/NLP technologies. Previously she worked on LLM evaluation and content moderation for AI safety on Microsoft's Responsible & Open AI Research team. She holds a master's degree in Artificial Intelligence from Northwestern University. During her time at Northwestern, she worked as an ML Research Associate at the Technological Innovations for Inclusive Learning and Teaching lab (tiilt), building a multimodal conversation analysis application called Blinc. She was a Data Science for Social Good Fellow at the University of Washington's eScience Institute during the summer of 2022. Aziza is interested in developing machine learning and Generative AI tools and systems to solve complex, social-impact-driven problems. When she is done coding, she is either training for her next marathon or hiking somewhere around the PNW.

Sessions
Large Language Models (LLMs) generate contextual, informative responses, but they also pose risks related to harmful outputs such as violent speech, threats, explicit content, and adversarial attacks. In this tutorial, we will focus on building a robust content moderation pipeline for LLM-generated text, designed to detect and mitigate harmful outputs in real time. We will work through a hands-on project in which participants implement a content moderation system from scratch in two different ways: first, by using open-source LLMs via Ollama and applying various prompt engineering techniques; second, by fine-tuning small open-source LLMs on content-moderation-specific datasets. The tutorial will also cover identifying adversarial attacks, including jailbreaks, and applying both rule-based and machine learning approaches to filter inappropriate content.
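As a taste of the rule-based side of such a pipeline, the sketch below shows a minimal keyword/regex filter. The category names and patterns are hypothetical placeholders, not the tutorial's actual ruleset; a real system would use curated blocklists alongside ML classifiers.

```python
import re

# Hypothetical blocklist mapping moderation categories to regex patterns.
# Real deployments use curated, regularly updated pattern sets.
BLOCKLIST_PATTERNS = {
    "violent_speech": re.compile(r"\b(kill|attack|hurt)\b", re.IGNORECASE),
    "threats": re.compile(r"\byou (will|are going to) regret\b", re.IGNORECASE),
}

def moderate(text: str) -> dict:
    """Return the categories triggered by `text` and an allow/block decision."""
    flagged = [name for name, pattern in BLOCKLIST_PATTERNS.items()
               if pattern.search(text)]
    return {"allowed": not flagged, "categories": flagged}

print(moderate("I will attack the problem tomorrow"))
# Note: this benign sentence is flagged because of the word "attack".
```

Rule-based filters are fast and transparent but blunt, as the false positive above shows; that is why the pipeline pairs them with ML-based approaches that account for context.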
This tutorial is aimed at AI engineers, researchers, and practitioners who deploy LLMs and are looking to implement moderation systems that prevent harmful content. A basic understanding of LLMs and NLP techniques, and comfort with Python and PyTorch, will be helpful. A GitHub repository containing code and datasets will be shared prior to the tutorial.