PyData NYC 2024

Udisha Dutta Chowdhury

Udisha Dutta Chowdhury is pursuing a Master’s in Computer Systems Engineering at Northeastern University, Boston, specializing in IoT systems and Machine Learning. She is deeply passionate about machine learning and its applications in IoT systems.

Udisha holds a Bachelor’s degree in Electronics and Communication Engineering with a minor in Computer Science from PES University, Bangalore, India.

During the summer of 2024, she worked as a Data Science Intern at Schneider Electric, Andover, MA, collaborating with the AI Hub's offer management team. She developed Python-based technical products utilizing time series machine learning algorithms for IoT data, and designed frameworks to prototype and benchmark these algorithms on standardized datasets.

Previously, she was a Solution Delivery Analyst at Deloitte USI, where she focused on security analysis, incident response, and threat hunting.


Sessions

11-07
10:10
40min
Adopting Open-Source Tools for Time Series Forecasting: Opportunities and Pitfalls
Udisha Dutta Chowdhury, Abhishek Murthy

Forecasting predicts future values of a time series from its historical values and is critical for informed decision-making in fields like finance, planning, and energy. The open-source community has developed several Python libraries that streamline the recurring stages of a forecasting pipeline, minimizing redundancy and ensuring consistency across projects. SKTime, SKForecast, and Darts are among the most widely used of these libraries, yet data science teams often struggle to evaluate such options systematically.
In this talk, we will present a decision framework for data science leaders and teams to choose the appropriate tooling for their forecasting projects. Specifically, we will explore three critical dimensions that teams must consider:

Data Understanding: How well does the library support Exploratory Data Analysis (EDA)?

Data Preparation: How robust and intuitive are the tool's preprocessing capabilities for handling data quality issues, such as missing values, NaNs, and duplicate data, and for incorporating exogenous variables?

Modeling & Backtesting: How effective and scalable are the library’s modeling and evaluation capabilities for forecasting algorithms?

Each of these dimensions presents tradeoffs, and our decision framework is intended to help teams evaluate and mitigate them. We will present a case study from energy management and use SKTime and SKForecast to guide the discussion. The work presented in this talk was conducted as part of my internship at Schneider Electric.
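To make the Data Preparation and Modeling & Backtesting dimensions concrete, here is a minimal, library-agnostic sketch of the steps a forecasting library is expected to streamline. It uses only pandas and NumPy, not the SKTime or SKForecast APIs, and it is not the framework presented in the talk. The column names (timestamp, load_kw), the toy daily data, the helper function names, and the naive last-value forecaster are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def prepare_series(df: pd.DataFrame) -> pd.Series:
    """Drop duplicate timestamps, regularize to a daily index, and fill gaps."""
    s = (
        df.drop_duplicates(subset="timestamp", keep="last")
        .set_index("timestamp")["load_kw"]
        .sort_index()
    )
    # asfreq inserts NaN rows for timestamps missing from the daily grid
    s = s.asfreq("D")
    # Fill NaNs (original and newly inserted) by time-based interpolation
    return s.interpolate(method="time")


def rolling_origin_backtest(s: pd.Series, initial_train: int, horizon: int) -> float:
    """Expanding-window backtest of a naive last-value forecast.

    Returns the mean absolute error averaged over all folds.
    """
    fold_errors = []
    for cutoff in range(initial_train, len(s) - horizon + 1, horizon):
        train = s.iloc[:cutoff]
        test = s.iloc[cutoff:cutoff + horizon]
        # Naive forecaster (hypothetical baseline): repeat the last observed value
        forecast = np.full(horizon, train.iloc[-1])
        fold_errors.append(np.mean(np.abs(test.to_numpy() - forecast)))
    return float(np.mean(fold_errors))


# Toy data (hypothetical): one duplicated day, one missing day, one NaN reading
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-04",
                "2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08",
            ]
        ),
        "load_kw": [10.0, 11.0, 12.0, 14.0, np.nan, 16.0, 17.0, 18.0],
    }
)

clean = prepare_series(raw)
score = rolling_origin_backtest(clean, initial_train=4, horizon=2)
print(clean)
print("Mean MAE across folds:", score)
```

When comparing libraries, one practical question is how much of this hand-written logic (deduplication, index regularization, gap filling, fold generation, metric aggregation) each library provides out of the box and how transparent its defaults are.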

This talk is ideal for data scientists, machine learning engineers, and technical decision-makers who develop and maintain forecasting products and want to scale their efforts. Whether you are new to time series forecasting or an experienced practitioner looking to refine your toolset, this session will provide valuable insights into selecting the right open-source tools for your project.

Music Box