Wojciech Matejuk
I have been in love with mathematics, physics, and music since childhood. I started programming at the age of 15 and have been fascinated by data science ever since. I'm also a guitar player and a performing chorister, now exploring the possible connections between music and data science.
I've had the opportunity to work on a wide range of tasks: collaborating on software projects for industrial plants, performing time-series modeling, training transformer models with custom tokenizers, and implementing machine-learning methods for security automation as a vendor at Google.
I am currently a computer science student at the Faculty of Mathematics and Information Science at Warsaw University of Technology. Since August 2023, I have been working as a Machine Learning Engineer at EPR Labs, where I combine my passions for music, mathematics, and data science. I develop software for training and evaluating large language models on musical data, among other fascinating projects.

Sessions
Audio, images, and text already have well-established data processing pipelines proven to yield amazing results with large deep-learning models. However, applying these methods to music, especially in MIDI format, presents unique challenges. In this talk, we explore the application of context-aware masking techniques to data obtained by recording piano performances in MIDI format.
We demonstrate how methods inspired by masked language modeling, image inpainting, and next-token prediction can be adapted to preprocess MIDI data, capturing the harmonic, dynamic, and temporal information essential for music. These preprocessing strategies can lead to the creation of context-aware infilling tasks, which allow for the training of large transformer models that generate more emotionally nuanced musical performances.
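As a rough illustration of what a context-aware infilling task could look like, the sketch below hides the velocities of all notes that start inside a time window and keeps the original values as prediction targets. This is only a minimal example under stated assumptions, not the method presented in the talk: the `Note` structure, the `mask_velocity_in_window` helper, and the choice to mask velocity alone are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Note:
    """Hypothetical note record; field names are assumptions for this sketch."""
    pitch: int                # MIDI pitch, 0-127
    velocity: Optional[int]   # MIDI velocity, 0-127; None marks a masked value
    start: float              # onset time in seconds
    end: float                # release time in seconds


def mask_velocity_in_window(
    notes: List[Note], window_start: float, window_end: float
) -> Tuple[List[Note], List[Tuple[int, int]]]:
    """Hide the velocity of every note whose onset falls in [window_start, window_end).

    Returns the masked sequence and a list of (note index, original velocity)
    pairs that a model could be trained to reconstruct from the context.
    """
    masked: List[Note] = []
    targets: List[Tuple[int, int]] = []
    for index, note in enumerate(notes):
        if window_start <= note.start < window_end and note.velocity is not None:
            # Pitch and timing stay visible; only the dynamics are hidden.
            masked.append(replace(note, velocity=None))
            targets.append((index, note.velocity))
        else:
            masked.append(note)
    return masked, targets


# Toy four-note performance (made-up values, not real recording data).
notes = [
    Note(pitch=60, velocity=80, start=0.0, end=0.5),
    Note(pitch=64, velocity=72, start=0.5, end=1.0),
    Note(pitch=67, velocity=90, start=1.0, end=1.5),
    Note(pitch=72, velocity=65, start=1.5, end=2.0),
]
masked, targets = mask_velocity_in_window(notes, window_start=0.5, window_end=1.5)
```

Here the model would see the pitches and timing of all four notes but would have to infer the dynamics of the two middle notes from their neighbors. The same pattern could in principle be applied to pitch or timing instead of velocity, which is closer in spirit to masked language modeling or image inpainting.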