Monday, 4 May, 09:30 – 13:00
AI x Audio: From Research to Production
AI has changed the way we process and create audio, especially music. This opens new possibilities and enables products that were unimaginable only a few years ago. In this industry session, we give an overview of Sony’s activities in this field.
We start this session with an introduction to music source separation. Sony has been active in AI-based source separation since 2013, and our systems have repeatedly won international evaluation campaigns. In recent years, we have successfully integrated this technology into a number of products, which we will also introduce.
Recently, INRIA, in collaboration with Sony, released open-unmix, an open-source implementation of our music source separation approach. open-unmix is available for both NNabla and PyTorch.
Finally, in this first part, we will briefly introduce the NNabla open-source project. NNabla is Sony’s deep learning library, which we are actively developing worldwide. We will give a brief overview of its main features and compare it to other popular deep learning frameworks. We will highlight its focus on network compression and speed, which makes it a good choice for prototyping and developing audio and music products.
In the second part of the session, we will present our activities in music creation, where we envision technologies that could shape music in the years to come. Using deep learning-based approaches, we develop tools that enhance composers’ creativity and augment their capabilities. In our talk, we briefly present our research activities, including details about the underlying machine learning models. For these tools to be relevant, we rely on close collaboration with artists from Sony Music Entertainment, which can sometimes be tricky. Indeed, we often experience a gap between scientific research and the music industry on many levels, such as timeliness and profitability. Hence, the presentation will also address our efforts to bridge that gap.
Presenter bio: Mototsugu Abe is a Senior General Manager and Chief Distinguished Researcher at the R&D Center of Sony Corporation. As a researcher, he specializes in audio signal processing, intelligent sensing and pattern recognition. As a manager, he supervises fundamental technology R&D in the information technology field, including video, image, audio, speech, natural language, communication, RF, robotics, sensing and machine learning technologies. He received a Ph.D. in engineering from the University of Tokyo in 1999 and has been with Sony Corporation since then. From 2003 to 2004, he was a visiting scholar at Stanford University, working with Prof. Julius O. Smith III.
Presenter bio: Marc Ferras received the B.S. degree in computer science, the M.S. degree in telecommunications, and the European Master in Language and Speech from the Universitat Politecnica de Catalunya (UPC), Spain, in 1999 and 2005, respectively. He received his Ph.D. degree from Université Paris-Sud XI, France, in 2009, researching the use of automatic speech recognition in speaker recognition tasks. Since then, he has held two postdoctoral positions, one at the Tokyo Institute of Technology, Japan (2009-2011), and one at the Idiap Research Institute, Switzerland (2011-2016), both focused on automatic speech and speaker recognition. He is currently a Senior Engineer at Sony’s Stuttgart Technology Center, working on speech recognition technology.
Presenter bio: Stefan Lattner is a research associate at Sony CSL Paris, where he works on transformation and invariance learning with artificial neural networks. Using this paradigm, he targets rhythm generation (e.g., DrumNet) and is also involved in music information retrieval, audio generation, and recommendation. He obtained his doctorate in music structure modeling from the Johannes Kepler University in Linz, Austria.
Presenter bio: Cyran Aouameur is an assistant researcher at Sony CSL. A graduate of the Ircam-organized ATIAM Master’s program, he joined CSL two years ago. Passionate about urban music since childhood, he has been focusing on developing AI-based solutions that let artists quickly design unique drum sounds and rhythms, which he considers elements of top importance. He is now partly responsible for communication with the artists, seeking to help the research and music industry worlds understand each other.