PyData NYC 2024

Abhishek Murthy

Abhishek Murthy is currently a Senior Principal Data Scientist at Schneider Electric (SE) in Boston, Massachusetts, USA. He is passionate about sustainability, with a focus on climate change. To that end, he develops Machine Learning (ML) algorithms on sensor data that are critical to the sustainability commitments of SE's Industrial Automation and Energy Management businesses. He is also a lecturer at Northeastern University, where he teaches machine learning algorithms for the Internet of Things.

Abhishek received his PhD in Computer Science from Stony Brook University, State University of New York, and an MS in Computer Science from the University at Buffalo. His doctoral research, part of a National Science Foundation Expedition in Computing, involved developing algorithms that automatically establish the input-to-output stability of dynamical systems.

He led the Data Science Algorithms team at WHOOP before joining SE. He also worked at Signify (formerly Philips Lighting) as a Senior Data Scientist, where he led research on IoT applications for smart buildings. Abhishek has served on several conference review committees and NSF panels. His publications and research articles have received more than 195 citations. He has been awarded 15 patents and has more than 45 patent applications pending.


Sessions

11-07, 10:10, 40 min
Adopting Open-Source Tools for Time Series Forecasting: Opportunities and Pitfalls
Udisha Dutta Chowdhury, Abhishek Murthy

Forecasting involves predicting future values of a time series from its historical values and is critical for informed decision-making in fields like finance, planning, and energy. The open-source community has developed several Python libraries that streamline the recurring stages of a forecasting pipeline, reducing redundant work and ensuring consistency across projects. SKTime, SKForecast, and Darts are among the most widely used of these libraries, yet data science teams often struggle to find a systematic way to evaluate such options.
In this talk, we will present a decision framework for data science leaders and teams to choose the appropriate tooling for their forecasting projects. Specifically, we will explore three critical dimensions that teams must consider:

Data Understanding: How well does the library support Exploratory Data Analysis (EDA)?

Data Preparation: How robust and intuitive are the tool's preprocessing capabilities for handling data quality issues, such as missing values, NaNs, and duplicate data, and for incorporating exogenous variables?

Modeling & Backtesting: How effective and scalable are the library’s modeling and evaluation capabilities for forecasting algorithms?
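To make the Modeling & Backtesting dimension concrete, here is a minimal, library-agnostic sketch of expanding-window (rolling-origin) backtesting in plain Python. It is not the speakers' framework and not the API of SKTime or SKForecast; the names `seasonal_naive` and `expanding_window_backtest`, the 24-step season (assuming hourly data), and the toy series are all hypothetical. A simple baseline like this, evaluated on the same folds, can serve as a reference point when comparing the backtesting results a candidate library reports.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical forecaster signature: (training history, horizon) -> predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 24) -> List[float]:
    """Repeat the last observed season (assumed baseline; season=24 assumes hourly data)."""
    if len(history) < season:
        raise ValueError("history must contain at least one full season")
    last_season = list(history[-season:])
    return [last_season[h % season] for h in range(horizon)]


def expanding_window_backtest(
    series: Sequence[float],
    forecaster: Forecaster,
    initial_train: int,
    horizon: int,
    step: int,
) -> List[float]:
    """Return the mean absolute error (MAE) of each fold.

    The training window always starts at index 0 and grows by `step` per fold.
    At each forecast origin the model sees only data before the origin,
    so no future values leak into training.
    """
    errors = []
    origin = initial_train
    while origin + horizon <= len(series):
        train = series[:origin]
        actual = series[origin:origin + horizon]
        preds = forecaster(train, horizon)
        errors.append(mean(abs(a - p) for a, p in zip(actual, preds)))
        origin += step
    return errors


# Synthetic hourly series: one week of a repeating 24-step daily profile.
profile = [float(h % 24) for h in range(24 * 7)]
fold_maes = expanding_window_backtest(
    profile, seasonal_naive, initial_train=24 * 3, horizon=24, step=24
)
print(fold_maes)  # prints [0.0, 0.0, 0.0, 0.0]
```

Because every candidate library is scored on the same origins, horizon, and metric, differences in reported error reflect the models rather than the evaluation setup. Library backtesting utilities generally automate this kind of loop, and checking that they reproduce a hand-rolled baseline is one way to probe their correctness and ease of use.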

Each of these dimensions presents trade-offs, and our decision framework is intended to help teams evaluate and mitigate them. We will present a case study from energy management and use SKTime and SKForecast to guide the discussion. The work presented in this talk was conducted as part of my internship at Schneider Electric.

This talk is ideal for data scientists, machine learning engineers, and technical decision-makers who develop and maintain forecasting products and want to scale their efforts. Whether you are new to time series forecasting or an experienced practitioner looking to refine your toolset, this session will offer practical guidance on selecting the right open-source tools for your project.
