PyData NYC 2024

Aarti Jha is currently a Senior Data Scientist at Red Hat, Bengaluru, India, where she develops AI-driven solutions to streamline processes and reduce operational costs for internal initiatives. With over six years of experience, she has previously led the development of search and recommendation systems for e-pharma at her prior organisation.

In her free time, Aarti enjoys bringing her creative visions to life through sketching.

The Art of Compression: Crafting Insightful Summaries with LLMs

Abhishek Murthy

Abhishek Murthy is currently a Senior Principal Data Scientist at Schneider Electric (SE) in Boston, Massachusetts, USA. He is passionate about sustainability, with a focus on climate change. To that end, he develops Machine Learning (ML) algorithms on sensor data that are critical for the sustainability commitments of the Industrial Automation and Energy Management businesses of SE. He is also a lecturer at Northeastern University and teaches machine learning algorithms for the Internet of Things.

Abhishek received his PhD in Computer Science from Stony Brook University, State University of New York and MS in Computer Science from University at Buffalo. His doctoral research, which was part of a National Science Foundation Expedition in Computing, entailed developing algorithms for automatically establishing the input-to-output stability of dynamical systems.

He led the Data Science Algorithms team at WHOOP before joining SE. He also worked at Signify, formerly called Philips Lighting, as a Senior Data Scientist and led research on IoT applications for smart buildings. Abhishek has served on several conference review committees and NSF panels. His research includes several publications and research articles with more than 195 citations. He has been awarded 15 patents and has more than 45 applications pending.

Adopting Open-Source Tools for Time Series Forecasting: Opportunities and Pitfalls

Adam Glustein

Adam Glustein is a Quantitative Developer at Point72 Asset Management in New York, USA. He is a contributor to CSP, a reactive stream processing library for both real-time and historical data.

Straightforward stream processing with CSP

Aditya Kelvianto Sidharta

Aditya is a Principal Data Scientist at Capital One, building an automated observability platform that identifies application failure within the system. Throughout his career in the industry, he has gained extensive experience in applying causal inference techniques to a range of use cases, including building a personalized promo allocation system, a surge pricing platform, and root cause identification. Additionally, Aditya obtained his M.S. in Computer Science from Columbia University with a specialization in Machine Learning. He actively contributes back to the Data Science community by providing mentorship to students interested in entering the field. He is also involved in pro-bono work, providing technical consultancy to non-profit organizations.

Holistic root cause analysis of software breakages through structural causal modeling

Alexandre Andorra

⚾ Senior Applied Scientist @ Miami Marlins
🎙️ Creator @ LearnBayesStats Podcast
📊 Cofounder @ PyMC Labs
👨‍🏫 Teacher @ Intuitive Bayes

Mastering Gaussian Processes with PyMC
PyMC-Marketing & Customer Analytics, with Will Dean
Saving Sharks... with Python, Causal Inference and Bayesian Stats!

Alexy Khrabrov

Alexy Khrabrov is the AI Community Architect at Neo4j. He is a cofounder of the AI Alliance and the founder and chair of Open-Source Science at NumFOCUS. Alexy was the founding chair of the Generative AI Commons at the Linux Foundation for AI and Data, where he is now a framework co-chair. Dr. Khrabrov is a founder and organizer of Bay Area AI, the most established AI meetup in the Bay Area, continuously run since 2015, and Scale By the Bay, an independent developer conference run since 2013. He was Chief Scientist at Nitro, an Australian public company developing Smart Documents, a software engineer at Amazon, and a cofounder and engineer at several Silicon Valley startups. He blogs at chiefscientist.org.

Making OSS AI Real with Knowledge Graphs

Allan Butler

My name is Allan Butler. I am a Data Scientist at H-E-B. Previously I was a data scientist in the energy sector as well as the professional sports world, where I worked on a wide variety of problems related to demand forecasting, pricing optimization, and visual storytelling.

Why Your Machine Learning Model's Probabilities Are Lying to You: A Guide to Probability Calibration

Allison Wang

Allison Wang is a Software Engineer at Databricks and an Apache Spark Committer, specializing in Spark SQL and PySpark. She’s passionate about bridging Python with the big data ecosystem. Allison holds a bachelor’s degree in Computer Science from Carnegie Mellon University.

Faster PySpark with Apache Arrow

Amanda Silver

Amanda Silver is the Corporate VP and Head of Product for Microsoft's Developer Division, whose mission is to empower developers and their teams to achieve more. It includes our developer tools, runtimes, and services; the developer experience for Azure; our Azure Application Development PaaS and serverless offerings; Azure DevOps; the Open Source Programs Office, and Microsoft’s first-party Engineering Systems team.

Her focus on customer-driven engineering with a tight digital feedback loop has fueled culture change at Microsoft. She championed customer-focused innovations like Visual Studio Live Share and IntelliCode, which have transformed how developers and teams build and collaborate worldwide. She’s also played a major part in the introduction of Microsoft’s open source products, having led the TypeScript and Visual Studio Code launches, and in the acquisitions of Xamarin and GitHub. Unleashing the creativity of all developers is her personal mission.

Keynote: #HappyCoding! Building and scaling AI applications for Data Scientists and Developers

Andrea Gao

I'm a Lead Data Scientist at Boston Consulting Group. I specialize in building advanced analytics, Machine Learning and AI/GenAI solutions. I've served clients in the financial services, retail, fashion, and auto industries in North America and China. I'm passionate about building products and creating value from data and technologies. I have led cross-functional teams of 3 to 20 people and delivered solutions in areas such as GenAI, computer vision, NLP, and personalization. I'm experienced in developing GenAI solutions and in evaluating and improving them with a Responsible AI mindset.

Before BCG, I worked in Silicon Valley in a Fintech company as an engineer/data scientist.

Use ARTKIT to Automate and Scale Up Your LLM Evaluation Process

Andy Terrel

I lead CUDA Python Product Management, working to make CUDA a Python native.

I received my Ph.D. from the University of Chicago in 2010, where I built domain-specific languages to generate high-performance code for physics simulations with the PETSc and FEniCS projects. After spending a brief time as a research professor at the University of Texas and Texas Advanced Computing Center, I have been a serial startup executive, including a founding team member of Anaconda.

I am a leader in the Python open data science community (PyData). A contributor to Python's scientific computing stack since 2006, I am most notably a co-creator of the popular Dask distributed computing framework, the Conda package manager, and the SymPy symbolic computing library. I was a founder of the NumFOCUS foundation. At NumFOCUS, I served as the president and director, leading the development of programs supporting open-source codes such as Pandas, NumPy, and Jupyter.

Accelerating GPU Algorithms in pure Python

Angad Sandhu

MSE student at Johns Hopkins University. Working on advanced multimodal models at Johns Hopkins Medicine, building tools for the best doctors in the world.

Temporal Analysis on Topics Utilizing Word2Vec

Anjali Datta

Anjali is a postdoc at Stanford Medicine working on MRI processing to identify neurosurgical targets. She also has a PhD in Electrical Engineering from Stanford, during which she developed MRI acquisition and reconstruction methods. Medical imaging is of course a field where ML is taking over, and Anjali is now interested in the applications of deep learning to MRI and other signal processing.

Building machine learning pipelines that scale: a case study using Ibis, IbisML, and dlt

Art Anderson

Director of Developer Advocacy
www.linkedin.com/in/artdanderson

Art is a passionate tech enthusiast, builder, and lifelong learner with a knack for simplifying complex concepts through real-world applications. With a diverse background spanning tax and accounting software, convolutional neural networks in machine vision, and NoSQL databases, Art excels in teaching and demonstrating how systems connect. Whether tinkering with tech or creating innovative solutions, Art’s unique perspective bridges the gap between understanding and application.

Vector Databases Demystified: Discover how they work
Unlocking the Power of Hybrid Search: A Deep Dive into Python-Powered Precision and Scalability

Astha Puri

Astha is a Senior Data Scientist at a top Fortune 10 company, where she designs the recommendation engine for digital platforms to help customers find the right products and patients find the right health services and support. She also leads AI initiatives including generative AI and oversees the entire search and chat portfolio across the app and web.

With nine years of experience in data science at tech companies like Oracle and Twilio, Astha is now applying her expertise to healthcare. She has a background in healing and alternative therapies and is researching how to integrate AI, health, and healing. Astha is passionate about empowering healthcare professionals with the knowledge and tools they need to excel. She recently published book chapters in collaboration with the Indian Fertility Society, focusing on AI in counseling in Assisted Reproduction Technology. As one of the 19 members of the national core team in India, she is leading the technology and AI platform for the Counselor Empowerment Program (CEP), with a mission to provide data-driven AI systems to support counselors and patients in IVF centers across the country.

Her work is unique in that it merges traditional healing therapies and psychotherapy with artificial intelligence, using Python-based models built on anonymized data combined with technologies such as OpenAI and Gemini, and architectures such as RAG, to ensure contextual and reliable AI. This approach allows for hyper-personalized treatment plans for individuals while maintaining data privacy and accountability. In the context of IVF patients, this method aims to prevent additional burdens like depression, ensuring they receive comprehensive support during their challenging journey.

In addition, Astha is a speaker, author, and mentor. She helps women advance in data science as a founding board member and vice president of Women Who Do Data. Astha has been recognized among the top 250 women globally in AI and ML and has received the Global Recognition Award and the Excellence in Applied Research Award.

Astha has a master's in Analytics from University of Minnesota and a B.Tech in Electronics and Communication from VIT University.

Building Women in Data Communities: A Journey of Empowerment

Avi Levin

Avi is an AI tech lead at Citi Innovation Lab.
He holds a master's in computer science with a thesis on interpretable machine learning.

Explaining Machine Learning Models with LLMs

Avik Basu

Avik is a seasoned data scientist, having worked in multiple different domains of machine learning. He loves coding in Python, and writing elegant and scalable code.

Reproducible work environments for data scientists using Nix

Aziza Mirsaidova

Aziza is an Applied Scientist at Oracle in Generative and Responsible AI with more than 3 years of experience with ML/NLP technologies. Previously she worked in LLM evaluation and content moderation in AI safety at Microsoft’s Responsible & OpenAI research team. She is a graduate of a master’s degree program in Artificial Intelligence from Northwestern University. Throughout her time at Northwestern, she worked as an ML Research Associate at the Technological for Inclusive Learning and Teaching Lab (tiilt), building a multimodal conversation analysis application called Blinc. She was a Data Science for Social Good Fellow at University of Washington’s eScience Institute during the summer of 2022. Aziza is interested in developing machine learning and Generative AI tools and systems to solve complex and social impact driven problems. Once she is done coding, she is either training for her next marathon race or hiking somewhere around the PNW.

Responsible AI: Building Moderation Pipelines for Harmful and Adversarial Content

Benjamin Zaitlen

Keynote: Density! The not-so-secret trend driving the future of Data Analytics

Bryce Lelbach

Bryce Adelstein Lelbach has spent over a decade developing programming languages, compilers, and software libraries. He is a Principal Architect at NVIDIA, where he leads HPC programming language efforts and drives the technical roadmap for NVIDIA’s HPC compilers and libraries. Bryce is passionate about C++ and is one of the leaders of the C++ community. He has served as chair of INCITS/PL22, the US standards committee for programming languages and the Standard C++ Library Evolution group. Bryce served as the program chair for the C++Now and CppCon conferences for many years. On the C++ Committee, he has personally worked on concurrency primitives, parallel algorithms, executors, and multidimensional arrays. He is one of the founding developers of the HPX parallel runtime system. Outside of work, Bryce is passionate about airplanes and watches.

Accelerating GPU Algorithms in pure Python

Christian Luhmann

Dr. Christian Luhmann is Chief Operating Officer at PyMC Labs, a data science consultancy specializing in solving complex data science problems for businesses. In this role, Christian oversees the day-to-day global business of the firm and ensures that the organization has the necessary coordination, communication, and operating processes. Prior to assuming the role of COO, Christian acted as Project Manager, managing PyMC Labs' global client relationships, helping deliver innovative solutions to meet clients’ business objectives, and driving business development efforts.

Christian began his career in academia, earning a BS degree in Computer Science and a PhD in Psychology. He spent 16 years as a professor, focusing on research in behavioral economics and machine learning and teaching statistics and data science. This experience instilled a strong desire to help others learn from their data.

PyMC-Marketing: Customer and Marketing Analytics the Easy Way

Christopher J Fonnesbeck

Mastering Gaussian Processes with PyMC

Chuxin Liu

I am a senior associate at JPMorgan and I hold a PhD in Economics. I am also the NYC chapter lead of AICamp and an ambassador for Women in Data Science (WiDS) and Women Techmakers (WTM).

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/chuxin-liu/

Building Women in Data Communities: A Journey of Empowerment

Deepyaman Datta

Deepyaman is a maintainer of Kedro, an open-source Python framework for building production-ready data science pipelines. He is passionate about building and contributing to the broader open-source data ecosystem.

Previously, Deepyaman was a software engineer at Voltron Data. Before their acquisition by Voltron Data, he was a Founding Machine Learning Engineer at Claypot AI, working on their real-time feature engineering platform. Prior to that, he led data engineering teams and asset development across a range of industries at QuantumBlack, AI by McKinsey.

Building machine learning pipelines that scale: a case study using Ibis, IbisML, and dlt

Dharhas Pothina

Dharhas Pothina is the CTO at Quansight where he helps clients wrangle their data using the pydata stack. He also leads the development teams for the Nebari, Conda-Store and Ragna open source projects.

His background includes expertise in computational modeling, big data/high performance computing, visualization and geospatial analysis. Prior to his current position he worked for 15 years in state and federal research labs where he led large multi-disciplinary, multi-agency research projects.

He holds a PhD in Civil Engineering and an MS in Aerospace Engineering from the University of Texas at Austin and a BTech in Aerospace Engineering from the Indian Institute of Technology Madras.

Dharhas is passionate about enabling scientists and engineers with tools that let them scale as well as share their analyses. He loves woodworking, photography, and teaching his daughters to love science.

How many dataframe libraries do you need to change a lightbulb?

Dhavide Aruliah

Dhavide Aruliah has been teaching & mentoring both in academia and in industry for three decades. His career has grown around bringing learners from where they are to where they need to be mathematically & computationally. He was a university professor (Applied Mathematics & Computer Science) at Ontario Tech University before moving to industry where he oversaw training programs supporting the PyData stack at Anaconda Inc. and later at Quansight LLC. He has taught over 40 undergraduate- & graduate-level courses at five Canadian universities as well as numerous Software Carpentry, SciPy, & PyData tutorial workshops. Here are some examples of his past tutorials & talks:

Ten Years of Teaching with Jupyter: Reflections from Industry & Academia (JupyterCon 2023)
Deep Learning from scratch with PyTorch (SciPy 2020)
An Introduction to Sentiment Analysis of Textual Data (PyData Austin 2019)
Learn how to Make Life Easier with Anaconda (PyData DC 2016)

Using NASA EarthData Cloud & Python to Model Climate Risks

Dr. Rebecca Bilbro

Dr. Rebecca Bilbro is an applied AI/ML engineer and one of the pioneers of the data science revolution of the early 2010s. Co-author of Applied Text Analysis with Python (O'Reilly 2017) and Apache Hudi: The Definitive Guide (O'Reilly 2024), Rebecca has worked across academia, industry, and the public sector. She is co-creator of Yellowbrick, a Python library that integrates the scikit-learn and matplotlib APIs to support more convenient model diagnostics and steering. As co-founder and CTO of Rotational Labs, Rebecca is motivated by a desire to unite the data science and engineering communities. She and her team help other companies leverage in-house domain expertise and data to build and deploy LLMs, data products, and services. Rebecca earned her doctorate from the University of Illinois, Urbana-Champaign, where her research centered on domain-specific languages within engineering.

Data Secrets from a Platform Engineer

Dr. Sabrina Hsueh

Dr. Sabrina Hsueh is an accomplished Ethical AI and External Innovation Lead at Pfizer, where she spearheads the development and implementation of AI strategies. With a focus on AI observability, guardrails, and governance, Dr. Hsueh champions equitable AI in healthcare across enterprises such as IBM and Pfizer, as well as start-ups in the A16z and Khosla Ventures portfolios. Dr. Hsueh leads industrial AI standard-setting initiatives with public-private partnerships and professional societies. She also serves on the ACM Practitioners’ Board, co-chairs the Women in AMIA steering committee and co-founded an international work group to address global women’s health gaps and drive diversity and inclusion in informatics. Her dedication to leveraging informatics and AI evaluation has led to over 80 technical publications, patents, and co-edited special issues in reputable journals and textbooks (e.g., Personal Health Informatics by Springer Nature). Dr. Hsueh's commitment to ethical AI and her leadership in AI standardization make her a sought-after speaker and consultant, winning awards and nominations across the fields as a Responsible AI leader in HCLS, paving the way for a more trustworthy adoption of AI technologies in a dynamic regulatory landscape.

Keynote: Trustworthy AI in Healthcare and Life Sciences: Leading AI and GenAI Innovation in a Dynamic Regulatory Landscape

Ekin Tiras

Ekin Tiras is a Senior Software Developer at SAP, specialising in observability and designing automated telemetry analysis systems with machine learning. With a focus on improving observability solutions, Ekin designs and implements models to drive more insightful analytics. Previously, he worked at a major German public broadcaster, where he contributed to a platform that leverages machine learning to extract and categorize metadata from video files. His professional work and personal interest in anomaly detection have also led him to become a maintainer of the open-source library PyNomaly in his spare time.

Interpretable Anomaly Detection for Numerical Data in Python Using PyNomaly

Ethan Cole

Ethan is a Principal Machine Learning Engineer at Capital One in the observability space. With a Master's in Data Science from Northwestern University, he has over five years of experience building data pipelines and ML solutions for automated anomaly detection and customer impact analysis. His main areas of expertise are deep learning, time series forecasting, and natural language processing.

Holistic root cause analysis of software breakages through structural causal modeling

Faizan Wajid

Faizan Wajid is a PhD Candidate at the University of Maryland, Computer Science department where his research focus is on analyzing people's full-day respiration patterns in non-clinical settings. He is also the Senior Data Scientist at ExploreDigits, Inc. where he is developing NLP models to analyze data pertaining to nursing homes and health policy.

Temporal Analysis on Topics Utilizing Word2Vec

Gil Forsyth

Ibis: Don't Let the Engine Dictate the Interface

Guen Prawiroatmodjo

Guen is a software engineer at MotherDuck on the Ecosystems team. Previously, she was a Sr. Quantum Measurement Engineer at Microsoft. She's spent her career in software engineering, data engineering and data science with Python in the context of scientific data acquisition, analysis and computation for experimental physics and biotech. She has given introductory talks and workshops on quantum computing with Python at various conferences, hackathons and events.

A Duck in the hand is worth two in the Cloud: Data preparation and analytics on your laptop with DuckDB

Isabel Zimmerman

Isabel is a software engineer at Posit, PBC, where she builds Python-based open source data science tools. She is a current member of the triage team and an Emeritus Editor in Chief at pyOpenSci, an organization that supports scientific Python tools by offering peer reviews of packages. When not thinking about computers, she enjoys reading and training dogs.

End-to-end data science with the Positron IDE

Jinxuan Wu

I'm currently a software engineer on the data engineering team at Two Sigma.

Mastering DataFrame Diffing Techniques

Jacob Matson

Developer advocate at MotherDuck

A Duck in the hand is worth two in the Cloud: Data preparation and analytics on your laptop with DuckDB

Jacob Tomlinson

Jacob Tomlinson is a senior Python software engineer at NVIDIA with a focus on deployment tooling for distributed systems. His work involves maintaining open source projects including RAPIDS and Dask. RAPIDS is a suite of GPU accelerated open source Python tools which mimic APIs from the PyData stack including those of NumPy, pandas and scikit-learn. Dask provides advanced parallelism for analytics with out-of-core computation, lazy evaluation and distributed execution of the PyData stack. He also tinkers with the open source Kubernetes Python framework kr8s in his spare time. Jacob volunteers with the local tech community group Tech Exeter and lives in Exeter, UK.

GPU Accelerated Python

James Munro

James Munro is Head of ArcticDB at Man Group. ArcticDB is a high-performance data-frame database that is optimised for time-series data and data-science workflows and scales to petabytes of data and thousands of simultaneous users.

James was previously CTO at Man AHL between 2018 and 2023. He joined Man Group in 2011 as a quant developer and has worked with Man AHL’s portfolio management, FX, commodities, fixed income, equities and volatility teams.

James holds a PhD in Theoretical Physics from University College London.

ArcticDB, the OLAP antidote

Javier

Python and the AI Lakehouse

Jeroen Janssens

Jeroen Janssens, PhD, is a polyglot data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. Jeroen is passionate about open source and sharing knowledge. He is the author of Data Science at the Command Line (O’Reilly, 2021) and is currently writing Python Polars: The Definitive Guide (O’Reilly, 2025). Every now and then he blogs at https://jeroenjanssens.com.

Turning DataFrames into Pretty Pictures with Plotnine
What we learned by converting a large codebase from Pandas to Polars

Jim Dowling

Jim Dowling is CEO of Hopsworks. He is an established researcher in systems software for AI, having been an Associate Professor at KTH Stockholm and Trinity College Dublin. He is the author of an O'Reilly book on building AI systems. He organizes the annual feature store summit and co-organizes PyData Stockholm.

Building Python based AI Systems with LLMs

Jim Kitchen

Jim Kitchen is the lead engineer for the team that built the Anaconda Toolbox and Anaconda Code add-ins for Excel. His work at Anaconda has included consulting with large financial institutions to convert Excel-based stress models into production-ready Python code. His interest in connecting Excel and Python originally began at Xerox, where he rewrote several Excel-based tools into a comprehensive Python analysis suite for printer logs. Jim is based in Austin, Texas.

Advanced Excel Analytics: Python Integration Workshop

Jonathan Starr

Jonathan Starr is the program manager of the Open Source Science Initiative out of NumFOCUS. He contributes to several open source projects and start-ups developing technologies enabling open science practices through novel infrastructure and incentive design. He is also president of The Science Commons Initiative and The Science Coordination Infrastructure and Operating Systems Collaborative, both 501c3s at the intersections of open source, open science, open education, and public engagement.

The Graph Database of Everything Open: Navigating the Open Source and Open Science Ecosystem with MOSS and SOL

Josh Reini

Josh is a developer advocate for Snowflake, previously at TruEra (recently acquired by Snowflake). He is also a maintainer of open-source TruLens, a library to systematically track and evaluate LLM based applications.

Josh has delivered tech talks and workshops to thousands of developers at events including the Global AI Conference, NYC Dev Day 2023, LLMs and the Generative AI Revolution 2023, AI developer meetups and the AI Quality Workshop (both in live format and on-demand through Udemy).

Eliminate Hallucinations with Context Filter Guardrails

Katrina Riehl

Dr. Katrina Riehl is a Principal Technical Product Manager at NVIDIA supporting CUDA and Python. For over two decades, Katrina has worked extensively in the fields of scientific computing, machine learning, data science, and visualization. Most notably, she has helped lead initiatives at the University of Texas Austin Applied Research Laboratory, Anaconda, Apple, Expedia Group, Cloudflare, and Snowflake. She is an active volunteer in the Python open-source scientific software community and continues to serve on the Advisory Council for NumFOCUS.

GPU Accelerated Python

Kaushik Srinivasan

Kaushik Srinivasan is a geospatial software engineer at Bloomberg and a nascent open source contributor.

Daunting to Doable: How Institutions can encourage Open Source Contributions

Kevin Slater

Kevin Slater is a Senior Data Scientist at Boston Consulting Group. He has over five years of experience delivering ML / AI solutions for clients across multiple industries, including aviation, manufacturing, and supply chain management. He is passionate about Responsible AI and has enjoyed helping teams design and implement their GenAI Testing & Evaluation strategies.

Prior to BCG, Kevin was a Quantitative Strategist at Goldman Sachs with a focus on inventory management. He has a degree in Physics and Mathematics from the University of Chicago.

Use ARTKIT to Automate and Scale Up Your LLM Evaluation Process

Lawrence Gray, PhD

An AI leader with a mission, Dr. Lawrence Gray brings a wealth of experience from the consulting, accounting, and geospatial intelligence sectors to his role as Director of Engineering at KUNGFU.AI. With a Ph.D. from Johns Hopkins in Cellular and Molecular Physiology and a career spanning over 10 years, Dr. Gray has consistently pushed the boundaries of what's possible with machine learning and data-driven decision-making. As an Adjunct Professor at Georgetown University, he shapes the next generation of data practitioners, teaching courses in Data Analytics and Data-Driven Decision Making. Dr. Gray's commitment to open science is evident in his work with NumFOCUS and as a core maintainer of Yellowbrick. Currently authoring two books for Manning Publications, he continues to share his expertise with the wider tech community. Dr. Gray is particularly excited by the potential of AI to drive new discoveries in mental health, fueled by a passion to improve the lives of those affected by mental illness. This enthusiasm reflects his belief in AI as a powerful tool for groundbreaking research and positive societal impact.

Keynote: Open Source, Open Roads: Mapping the Human Element in Tech

Liqiang Lu

PyTorch Meetup @ PyData NYC

Lucy Herr

I am a data scientist graduating from U.C. Berkeley’s Masters in Information & Data Science (MIDS) program this fall. Previously, I have worked at Katch, an AI content analytics platform, where I developed techniques to process and engineer the proprietary movie content features deployed in our recommendation models. Before transitioning into data science, I worked in educational consulting and research for over a decade and earned an M.Ed. in Learning Sciences from the University of Washington. At the UW Center for Evaluation & Research for STEM Equity (CERSE), my research concentrated on promoting the advancement of STEM students from underrepresented groups in higher education.

Participating in data science communities like WiDS has been instrumental to my ongoing learning and professional development in the field of AI/ML, and I am honored to join my fellow WiDS NYC members at PyData in highlighting opportunities and impacts in these pathways for empowerment.

Building Women in Data Communities: A Journey of Empowerment

Matt Harrison

Matt Harrison trains the largest companies in the world to leverage Python and apply data science. When he is not providing corporate training, he can be found writing books or out in the mountains. He is the author of the best-selling books Effective Pandas, Effective XGBoost, and more.

An Introduction to Polars

Michael Chow

I’m a data science tool builder at Posit, where I work on open source tools for data analysis. Previously, I worked as a consultant building out a data team for Caltrans (and love all things GTFS).

I received a Ph.D. in Cognitive Psychology from Princeton University, and am interested in what drives expert data science performance.

Turning DataFrames into Pretty Pictures with Plotnine

Mike McCarty

Mike is a Senior Software Engineering Manager at NVIDIA, leading teams working on RAPIDS Cloud and HPC deployments, build infrastructure, and PyData projects. Mike is a former member of the advisory council at NumFOCUS and Prefect. He holds two BS degrees in Computer Science and Physics, and has over 20 years of experience in astronomy, computational sciences, data science, machine learning, and enterprise products.

GPU Accelerated Python

Nathaniel Haines

Nathaniel Haines is a Bayesian, a data scientist, and a psychologist (mostly in that order). His work touches multiple fields including computational statistics, cognitive science, and actuarial science, and he is a core contributor to open-source Python and R libraries that make advanced Bayesian modeling techniques more accessible (BayesBlend and hBayesDM). He currently works as the Manager of Data Science Research at Ledger Investing, a Y Combinator backed fintech building a marketplace for casualty insurance-linked securities.

Introducing BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking, and Hierarchical Stacking in Python

Nick Tchayka

Nick, Chief Meme Occultist at The Agile Monkeys, is a software engineer specializing in AI, large language models, and functional programming. Creator of NeoHaskell and one of the core contributors to Booster Framework, Nick focuses on developer-friendly tools and technologies with a people-first approach, contributing to AI and LLM advancements.

Building Production-Ready AI Systems in 90 Minutes

Niki Karanikola

As a Machine Learning Engineer, I've dedicated the last 1.5 years to developing AI solutions for the environmental services sector at Veolia North America. I'm also a strong advocate for increasing representation in the field, and I've served as a Women in Data Science ambassador for two years, organizing various events to highlight the contributions of women in data science.

Feel free to reach out on LinkedIn: https://www.linkedin.com/in/niki-karanikola/

Building Women in Data Communities: A Journey of Empowerment

Niranjan Ganesan

I consider myself fortunate to have been born in the 90s, the era which connects “everything but tech” with “everything tech”. To be sure my brother and I did not miss the dotcom wave, my parents invested in a personal computer at the start of the millennium. Little did I know that this would shape my career, two decades later.

My tech career officially began when, as a teen, I saw my grandparents recording their monthly expenses in a notebook and developed an expensing app called “Cash +” to address it. I had it on a floppy disk (the most accessible tech at the time) to give their user experience (UX) flexibility when they traveled. It was easier to carry the disk than the physical notebook.

Once I realized I had the acumen for tech, I decided to pursue engineering and then an MBA at highly ranked universities. My challenging MBA program took me to 3 culturally diverse cities each semester - Singapore, Sydney & Dubai - where I had to learn to be frugal, fluid, and adaptable, and all these qualities helped my career post-MBA.

At graduation, I set a goal for myself - to explore different management functions to select the path in tech that I would thrive in. Looking back at my career experimentation, I’m glad that I was able to build & launch products at Jio, take products & services to the market through strategic B2B partnerships at Redington, and increase brand awareness through online marketing at Zero&One.

Last year, I decided to segue into a more technical path in Data Science and Machine Learning. I’m currently building my network in the US and proactively connecting with tech leaders here in the Data Science and Machine Learning fields.

Predicting Movie Success: Analyzing the Impact of Music, Posters, and Trailer Tonality Using Machine Learning

Pietro Peterlongo

Data Scientist who loves programming (languages). People driven. Trying to focus.
Passionate about Tech Communities and Open Source.

After working for a few years putting Machine Learning in production for a Supply Chain Planning and Optimization tool, I took a break, went to a few conferences, joined the Python Milano organizers, and helped launch the PyData Milan chapter. I also spent a period working remotely for a batch at the Recurse Center, a special place in New York (a retreat for programmers to dramatically improve their skills), which I plan to visit in real life next time I pass by the city.

I am now happily employed by AgileLab, an Italian-born company with the purpose of "elevate the data engineering game empowering companies to shape their future around data." We have a public handbook and a self-management system called Holacracy. Stop me and say hi if you are curious about any of the above. Or even if you are not. :)

Domain-driven Data Science

Protonu Basu

PyTorch Meetup @ PyData NYC

Rami Krispin

Rami Krispin is a senior data science and engineering manager, Docker Captain, and LinkedIn Learning instructor. He mainly focuses on time series analysis, forecasting, and MLOps applications.

He is passionate about open source, working with data, machine learning, and putting stuff into production. He creates content about MLOps and recently released a course on LinkedIn Learning, Data Pipeline Automation with GitHub Actions Using R and Python, as well as multiple tutorials about Docker for data science.

He is the author of Hands-On Time Series Analysis with R and is currently working on his next book, Applied Time Series Analysis and Forecasting, which focuses on forecasting at scale with Python.

Deploy and Monitor ML Pipelines with Python, Docker and GitHub Actions

Ravi Kumar

Data Science @ Walmart, Ex-Bank of America

An Introduction to Retrieval Augmented Generation

Rhythm Patel

Rhythm Patel is a software engineer at Bloomberg. He is one of the leaders of Bloomberg's Python Guild, which is dedicated to aiding Python engineers, fostering innovation, creating and maintaining Python packages, as well as acting as a bridge to the wider Python community. Rhythm has spoken at PyCon DE & PyData Berlin 2024, PyCon US 2024, PyCon Italy 2024, PyData London 2024, and internal events. When he’s not working, you can find him playing football or tennis, traveling and hiking, or volunteering at London’s Royal Parks and London Zoo.

No More Raw SQL with SQLAlchemy and ORMs

Rick Ratzel

Rick Ratzel is an engineering manager for RAPIDS cuGraph - a library of GPU-accelerated graph algorithms. Rick joined NVIDIA in January 2019, bringing several years of experience as a technical lead for teams in industries that include test and measurement, electronic design automation, and scientific computing.

GPU Accelerated NetworkX: Run Large-Scale Graph Analytics Using The Most Popular Graph Analytics Library Available

Ritchie Vink

Ritchie Vink is the Author of the Polars DataFrame library and Founder/CEO of the Polars company. Before Polars, he worked for 5 years in machine learning and software engineering, after making the switch from Civil Engineering.

Polars GPU acceleration

Rohit Tripathy

Graph neural networks for biomedical insights - revealing markers of Alzheimer's disease from multi-modal molecular data

Roni Kobrosly

Roni is a former academic epidemiology researcher who has spent a decade employing causal modeling around the population-level effects of harmful environmental exposures. Since leaving the academic world, he's been loving his second life in the tech industry as a data scientist, and is currently Director of Data Science at Capital One. He loves contributing to the open-source community, mentoring junior data folks, and explaining the magic of data analysis and modeling to non-technical audiences.

Holistic root cause analysis of software breakages through structural causal modeling

Saba Nejad

Saba Nejad is a Data Engineer at Point72 working mostly with alternative data within the energy and industrials sector. She is broadly interested in using mathematics and programming to gain insight from real world data. Prior to joining Point72, she was studying at MIT where she was doing research at the Institute for Data, Systems, and Society. She was previously a Product Manager at Quantopian.

How to Deploy Machine Learning Inference Code to Production: A Case Study in Optimizing Code from a Latency and Memory Perspective

Santosh Borse

I work as a Senior Engineer, watsonx Data Engineering at IBM Research.

I enjoy learning new things, debugging, and solving technical challenges. I have 19 years of software development experience in technologies ranging from C/C++ and handheld device programming to big data analytics and AI.

I am passionate about playing cricket, and you will find me on the ground most weekend mornings.

Preparing Data for LLM pretraining using open source Data Prep Kit
Preparing data for LLM training with Data Prep Kit

Sheetal Borar

Sheetal has six years of experience in data science and machine learning, with a career that spans Asia and Europe and active engagement with the global data science community. She worked at Amazon as an Applied Scientist in London, focusing on personalization, and as a Machine Learning Engineer at JP Morgan Chase in Hong Kong. She holds a master’s degree in Data Science and AI from a dual degree program in the Netherlands and Finland, during which she published papers at top-tier conferences. Currently, she leads a paper reading group in the Northeast, facilitating discussions with fellow data professionals.

Sheetal is deeply passionate about fostering and growing women-focused communities in tech. As a WiDS (Women in Data Science) ambassador, she actively supports initiatives that empower women in the field. Her dedication to community impact was recognized with the Social Impact Award in Germany.

Building Women in Data Communities: A Journey of Empowerment

Shefali Shrivastava

Shefali is completing her MS in Applied Statistics at Columbia University and brings over 3 years of data science and analytics experience across education, consulting, and AdTech. Her notable work includes leading data-driven initiatives that impacted 90,000 schools at BCG, optimizing ad performance at Media.net (one of the top 5 largest AdTech companies worldwide by market cap), and developing cloud-based analytics solutions for U.S. non-profit institutions.

Building Women in Data Communities: A Journey of Empowerment

Shekhar Prasad Rajak

Shekhar is passionate about open source software and active in various open source projects. He has contributed to SymPy, NumPy, SciPy, and Ruby gems such as daru, daru-view (author), and Bundler. He completed Google Summer of Code in 2016 and 2017, and has also worked as an admin and mentor for SciRuby. Shekhar was a speaker at RubyConf 2018, PyCon 2017, and ApacheCon 2020 on “Running ML algorithms with ML tools available in Apache Ecosystem” & “Cluster Management in Apache Ecosystem & Kubernetes”.

Apache Flink's Edge in Stream Processing

Thijs Nieuwdorp

Thijs Nieuwdorp is a data scientist at Xomnia and co-author of Python Polars: The Definitive Guide. With a background in Artificial Intelligence from Radboud University, he specializes in innovation, Responsible AI, MLOps, and clean code. At Alliander, Thijs leveraged Polars to optimize simulations of the Dutch power grid, reducing execution time and memory usage by a factor of four and saving massively in costs—contributing to a more reliable power supply.

Turning DataFrames into Pretty Pictures with Plotnine
What we learned by converting a large codebase from Pandas to Polars

Thomas J. Fan

Thomas J. Fan is a senior machine learning engineer at Union.ai and a maintainer of scikit-learn, an open-source machine learning library for Python. He led the development of scikit-learn's set_output API, which allows transformers to return pandas DataFrames. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He also maintains skorch, a neural network library that wraps PyTorch.

Pushing Cython to its Limits in Scikit-learn

Tim Faulkes

Vector Databases Demystified: Discover how they work

Tim Swena

Tim Swena is the team lead for the BigQuery DataFrames project at Google. He is an active participant in the Python data science community and passionate about open source tools that enable data literacy and science.

Get insights from structured and unstructured data using the AI-capable BigQuery DataFrames package

Timothy Hewitt

Timothy Hewitt is a senior product manager at Anaconda focusing on supporting Python in Excel and bringing those new to Python into the fold. Timothy is a linguist at heart and gets super nerdy if you ask him about dictionaries and natural language syntactic structures.

Advanced Excel Analytics: Python Integration Workshop

Timothy Spann

https://github.com/tspannhw/SpeakerProfile

Tim Spann is a Principal Developer Advocate for Zilliz and Milvus. He works with Milvus, Towhee, Attu, GPTCache, Generative AI, HuggingFace, Python, Java, Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

Unstructured Data Processing with a Raspberry Pi AI Kit and Python

Tomek Roszczynialski

Data science connoisseur with an obsession for converting numbers into sounds.

Tuning the Transformer: Context-Aware Masking for Controlled Music Generation in MIDI

Udisha Dutta Chowdhury

Udisha Dutta Chowdhury is pursuing a Master’s in Computer Systems Engineering at Northeastern University, Boston, specializing in IoT systems and Machine Learning. She is deeply passionate about machine learning and its applications in IoT systems.

Udisha holds a Bachelor’s degree in Electronics and Communication Engineering with a minor in Computer Science from PES University, Bangalore, India.

During the summer of 2024, she worked as a Data Science Intern at Schneider Electric, Andover, MA, collaborating with the AI Hub's offer management team. She developed Python-based technical products utilizing time series machine learning algorithms for IoT data, and designed frameworks to prototype and benchmark these algorithms on standardized datasets.

Previously, she was a Solution Delivery Analyst at Deloitte USI, where she focused on security analysis, incident response, and threat hunting.

Adopting Open-Source Tools for Time Series Forecasting: Opportunities and Pitfalls

Valentino Constantinou

I love the process of building impactful products and services, from zero to one. Currently building at Infactory: https://www.infactory.ai/

At Terran Orbital, I established a strong internal artificial intelligence capability, building and leading a team of data scientists and machine learning engineers, developing cloud-native, event-driven scalable platforms for remote sensing geospatial analytics and supporting cross-functional internal process automation sprints. The team reduced hardware-related production commissioning times by over 85%, enabling scalable production of that externally sold component.

At the NASA Jet Propulsion Laboratory ("JPL", operated by the California Institute of Technology, "Caltech"), I served as the Principal Investigator for a multi-year alarm analytics effort, co-organized a monthly meetup of the Lab's open source developers (the Open Developer Meetup), and led innovative applied machine and deep learning research and development efforts. I released the open-source PyNomaly software during my time there and continue to maintain the software, a core library in anomaly detection.

Interpretable Anomaly Detection for Numerical Data in Python Using PyNomaly

Vishesh Narayan Gupta

Vishesh is a second-year undergraduate student at the University of Maryland and an AI/ML Intern at ExploreDigits, primarily researching and developing NLP techniques for healthcare data tasks.

Temporal Analysis on Topics Utilizing Word2Vec

Wojciech Matejuk

I have been in love with mathematics, physics, and music since childhood, and I started programming at the age of 15 - I have been fascinated by data science ever since. I'm also a guitar player and a performing chorister, now exploring the possible connections between music and data science.

I've had the opportunity to work on a wide range of tasks, from collaborating on software projects for industrial plants and performing time-series modeling through training transformer models with custom tokenizers to implementing machine-learning methods for security automation as a vendor at Google.

I am currently a computer science student at the Faculty of Mathematics and Information Science at Warsaw University of Technology. Since August 2023, I have been working as a Machine Learning Engineer at EPR Labs, where I combine my passions for music, mathematics, and data science. I develop software for training and evaluating large language models on musical data, among other fascinating projects.

Tuning the Transformer: Context-Aware Masking for Controlled Music Generation in MIDI

Zander Matheson

Zander has a varied data background from data science to data and machine learning infrastructure. He founded and currently heads a startup that develops and maintains the open source project Bytewax. He has worked in the data space since 2014 at Heroku, GitHub, and an NLP startup.

Do Pythons Rust? How we used PyO3 to build a Python Stream Processor with a Rust Heart.

Hugo Bowne-Anderson

Hugo Bowne-Anderson is an independent data and AI consultant with extensive experience in the tech industry. He is the host of the industry podcast Vanishing Gradients, where he explores cutting-edge developments in data science and artificial intelligence.
As a data scientist, educator, evangelist, content marketer, and strategist, Hugo has worked with leading companies in the field. His past roles include Head of Developer Relations at Outerbounds, a company committed to building infrastructure for machine learning applications, and positions at Coiled and DataCamp, where he focused on scaling data science and online education respectively.
Hugo's teaching experience spans from institutions like Yale University and Cold Spring Harbor Laboratory to conferences such as SciPy, PyCon, and ODSC. He has also worked with organizations like Data Carpentry to promote data literacy.
His impact on data science education is significant, having developed over 30 courses on the DataCamp platform that have reached more than 3 million learners worldwide. Hugo also created and hosted the popular weekly data industry podcast DataFramed for two years.
Committed to democratizing data skills and access to data science tools, Hugo advocates for open source software both for individuals and enterprises.

Building Your First Multimodal Gen AI App 🚀

Nidhin Pattaniyil

Machine Learning Engineer working on Search

An Introduction to Retrieval Augmented Generation