Domain-driven Data Science PyData NYC 2024

Domain-driven Data Science

11-08, 14:35–15:15 (US/Eastern), Winter Garden

While Data Scientists have ample resources for mastering the mathematical and coding aspects of their work, much less attention is given to developing domain expertise. This talk argues that enhancing domain knowledge is crucial for maximizing the business impact of data science projects. Drawing inspiration from software engineering practices such as Domain-Driven Design, I will share examples from my years of experience in the Supply Chain and Logistics sector to illustrate how deepening domain competence can lead to more effective and successful data-driven solutions.

Back in the early days of the Data Revolution, the Data Science Venn diagram emerged as a simple, straightforward framework for understanding what shape a wannabe data scientist should have: a strong Math/Stats background and Coding skills, augmented with some purpose-specific Domain expertise.

Years later, after working hard to learn about Machine Learning (i.e., learning to use something I cannot fully grasp) and Software Engineering (e.g., agonizing over what to name a function), and putting into production a fair mix of well-marketed "failures" (fancy code and models that are not used or do not have measurable impact) and silently effective "successes" (boring or simple ideas that actually work), I find myself reflecting on that diagram. I realize that I may not have paid enough attention to effectively acquiring the domain expertise needed to tilt my results toward positive outcomes.

In this talk, I plan to share concrete experiences from my work as a Data Scientist, primarily in the Supply Chain and Logistics domain, and offer insights into the key concepts and techniques for learning about a domain. I will also discuss how domain knowledge can impact data science practice, exploring questions like: Why might a less accurate random forest model be better for users than a state-of-the-art boosting algorithm? Why do users struggle to trust and use my carefully crafted model?

The talk will consist of three parts:

An introduction to domain knowledge for Data Scientists and why it is important.
A primer on Supply Chain and Logistics for Data Scientists (highlighting algorithms and libraries used in the industry).
Stories and ideas on how to learn and apply domain expertise, illustrated with real-life examples of failures and successes in the Supply Chain and Logistics area.

This talk is partly inspired by the broader idea that Software Engineering best practices can often be effectively translated into good practices for Data professionals, and specifically by the concept of Domain-Driven Design, which has successfully inspired the Data Mesh concept in Data Engineering.

Prior Knowledge Expected: No previous knowledge expected

Pietro Peterlongo

Data Scientist who loves programming (languages). People driven. Trying to focus.
Passionate about Tech Communities and Open Source.

After working for a few years putting Machine Learning in production for a Supply Chain Planning and Optimization tool, I took a break, went to a few conferences, joined the Python Milano organizers, and helped launch the PyData Milan chapter. I also spent a batch working remotely at the Recurse Center, a special place in New York (a retreat for programmers to dramatically improve their skills), which I plan to visit in real life next time I pass by the city.

I am now happily employed by AgileLab, an Italian-born company whose purpose is to "elevate the data engineering game empowering companies to shape their future around data." We have a public handbook and a self-management system called Holacracy. Stop me and say hi if you are curious about any of the above. Or even if you are not. :)
