Monday, May 4, 09:30 – 13:00

AI x Audio: From Research to Production

AI has changed the way we process and create audio, especially music. This opens up new possibilities and enables products that could not have been envisioned a few years ago. In this industry session, we want to give an overview of Sony’s activities in this field.

We start this session with an introduction to music source separation. Sony has been active in AI-based source separation since 2013, and our systems have repeatedly won international evaluation campaigns. In recent years, we have successfully integrated this technology into a number of products, which we will also introduce.

Recently, INRIA released, in collaboration with Sony, open-unmix, an open-source implementation of our music source separation system. open-unmix is available for both NNabla and PyTorch.
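As a quick illustration of how the PyTorch variant is meant to be used, the following minimal sketch loads a pretrained separator via torch.hub and applies it to a stereo mixture. The 'umxhq' entry point and the (batch, channels, samples) input convention are taken from the project's README; treat the exact names and shapes as assumptions to verify against the repository.

    # Minimal sketch: source separation with open-unmix (PyTorch).
    # The 'umxhq' hub entry point follows the sigsep/open-unmix-pytorch
    # README; verify names and shapes against the current repository.
    import torch

    # Load a separator pretrained on the MUSDB18-HQ dataset.
    separator = torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq')

    # Dummy stereo mixture: (batch, channels, samples) at 44.1 kHz.
    audio = torch.rand((1, 2, 44100 * 5))

    # One estimate per target (vocals, drums, bass, other);
    # the output has shape (batch, targets, channels, samples).
    estimates = separator(audio)
    print(estimates.shape)

In practice, the random tensor would be replaced by an actual mixture loaded with an audio library such as torchaudio.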

Finally, in this first part, we will briefly introduce the NNabla open-source project. NNabla is Sony’s deep learning library, which we actively develop worldwide. We will give a brief overview of its main features and compare it to other popular deep learning frameworks. We will highlight its focus on network compression and speed, which makes it a good choice for audio and music product development and prototyping.
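To give a flavor of working with NNabla, here is a minimal, self-contained sketch that builds and evaluates a small feed-forward network with the library's public Python API (nnabla.functions and nnabla.parametric_functions). The layer sizes, parameter names, and the spectral-mask framing are illustrative assumptions, not details from the talk.

    # Minimal sketch: a tiny network in NNabla, Sony's deep learning library.
    # Layer sizes and parameter names are illustrative only.
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F
    import nnabla.parametric_functions as PF

    # Input variable: a batch of 8 spectral frames with 257 bins each.
    x = nn.Variable((8, 257))

    # Two affine layers; PF.* creates and registers parameters by name.
    h = F.relu(PF.affine(x, 128, name='fc1'))
    y = F.sigmoid(PF.affine(h, 257, name='mask'))  # e.g., a spectral mask

    # Feed data and run a forward pass through the graph.
    x.d = np.random.randn(8, 257).astype(np.float32)
    y.forward()
    print(y.d.shape)  # (8, 257)

NNabla also ships C and C++ runtimes for deployment, which is part of why the talk highlights compression and speed for product development.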

In the second part of the session, we will present our activities on music creation, where we envision technologies that could shape music in the years to come. Using deep learning-based approaches, we develop tools that enhance a composer’s creativity and augment their capabilities. In our talk, we will briefly present our research activities, including details about the underlying machine learning models. For these tools to be relevant, we rely on close collaboration with artists from Sony Music Entertainment, which can sometimes be tricky: we often experience a gap between scientific research and the music industry on many levels, such as timeliness or profitability. Hence, the presentation will also address our efforts to bridge that gap.

Presenter bio: Stefan Uhlich received the Dipl.-Ing. and PhD degrees in electrical engineering from the University of Stuttgart, Germany, in 2006 and 2012, respectively. From 2007 to 2011, he was a research assistant at the Chair of System Theory and Signal Processing, University of Stuttgart, where he worked in the area of statistical signal processing, focusing especially on parameter estimation theory and methods. Since 2011, he has been with the Sony Stuttgart Technology Center, where he works as a Principal Engineer on problems in music source separation, speech enhancement, and deep neural network compression.

Presenter bio: Stefan Lattner is a research associate at Sony CSL Paris, where he works on transformation and invariance learning with artificial neural networks. Using this paradigm, he targets rhythm generation (e.g., DrumNet) and is also involved in music information retrieval, audio generation, and recommendation. He obtained his doctorate in the area of music structure modeling from the Johannes Kepler University in Linz, Austria.

Presenter bio: Cyran Aouameur is an assistant researcher at Sony CSL. A graduate of the IRCAM-organized ATIAM Master’s program, he joined CSL two years ago. Passionate about urban music since childhood, he has been focusing on developing AI-based solutions that let artists quickly design unique drum sounds and rhythms, which he considers to be elements of prime importance. He is now partly responsible for communication with artists, seeking to help the research and music industry worlds understand each other.
