Deep Learning Systems for Musical Audio Generation

  • Oore, Sageev S (PI)

Project: Research project

Project Details

Description

Imagine you are creating a soundtrack for a video, and you have a machine learning (ML)-driven music creation tool/assistant. Your ML assistant can generate sounds and musical clips, but there is a problem: it can neither follow instructions nor figure out what needs to be done. For an assistant to be helpful to humans, there must be a way to direct what it does and how it does it.

My research programme concerns ML in the sphere of audio and music generation, and this proposal is concerned with imbuing generative models for music and audio with powerful controls (e.g. to enable effective ML-based tools). Specifically, I plan to explore this in two main musical contexts: (1) generating sequences of distinct notes (e.g. keypresses on a piano), and (2) generating raw audio, i.e. soundwaves, one “measurement” at a time (where even the lowest-quality sounds have at least 16,000 such measurements per second).
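To make concrete why raw-audio generation means modelling extremely long sequences, here is a minimal sketch (assuming NumPy; the 16 kHz rate is the low end mentioned above, and the tone generator is purely illustrative) that counts the samples in a short clip:

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples ("measurements") per second; the low end cited above

def sine_wave(freq_hz: float, duration_s: float, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Generate a pure tone as a sequence of raw audio samples in [-1, 1]."""
    t = np.arange(int(sr * duration_s)) / sr  # sample times in seconds
    return np.sin(2 * np.pi * freq_hz * t)

# Even a 3-second clip is a sequence of 48,000 values that a
# sample-level generative model would have to produce one at a time.
clip = sine_wave(440.0, 3.0)
print(len(clip))  # 48000
```

A full song at CD quality (44.1 kHz, several minutes) runs to tens of millions of samples, which is what makes the sequence-modelling problem fundamental rather than incidental.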

Finely controlling generation in these domains is hard for reasons including:

(1) Our vocabulary to describe sounds is limited and ill-defined. We can hear that this violin sounds “richer” than that one, or that this speech has a “more articulated rhythm” than that one, but we may not know how to quantify these qualities.

(2) Most data does not come with these labels.

The ML challenges implied by these difficulties are fundamental ones: learning underlying structure in large, minimally labelled datasets of extremely long sequences.

So how do we approach the task of control? We first notice what people do: a teacher tells a student, “play it this way,” providing a related example; that single example becomes immediately helpful, since the student already has a wordless mental map of sounds, and the example becomes an analogy that points to a spot on that map.

I want to find techniques to allow control over generative models in such ways. This involves: (I) Learning good maps (~disentangled latent representations) of sound. For example, moving along one direction might mean more rhythmic in some way. (II) Learning these from mainly unlabelled data, making effective use of rare labelled examples (~semi-supervised learning).
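As a purely illustrative sketch of point (I), suppose a sound's latent code is a vector in which one coordinate has (hypothetically) been disentangled to mean “more rhythmic”; control by example then reduces to moving along that direction. The encoder and decoder are omitted, and every name below is an assumption for illustration, not part of any actual system:

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 8
rhythm_direction = np.zeros(latent_dim)
rhythm_direction[0] = 1.0  # assume axis 0 of the learned "map" encodes rhythmicity

z = rng.normal(size=latent_dim)               # latent code of some sound
z_more_rhythmic = z + 0.5 * rhythm_direction  # nudge it along the rhythm axis

# In a real system, a decoder would render z_more_rhythmic as audio;
# here we just confirm that only the targeted coordinate changed.
delta = z_more_rhythmic - z
```

The hard research problems are exactly the ones this sketch assumes away: learning a space where such a direction exists and is meaningful, and identifying it from mostly unlabelled data.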

This is important because:

1) Some of the ML problems inherent to this task are fundamental, so their solutions will be fundamental as well.

2) Controlling generative models can allow them to be helpful:

  • to artists, as they will provide effective creativity support to the creative economy;

  • to amateur musicians, because such tools can easily have great educational value;

  • to health. Rhythmic music can help motor rehabilitation; imagine a tireless and adaptive music generator designed specifically for rehabilitation. Psychiatric diagnoses are sometimes based on non-verbal speech qualities; imagine controlling speech generation by examples: “use a voice like Person A, with an accent like Person B, and with the prosody of Person C,” and in doing so helping with training and removing potential bias effects.

Status: Active
Effective start/end date: 1/1/20 → …

Funding

  • Natural Sciences and Engineering Research Council of Canada: US$21,855.00

ASJC Scopus Subject Areas

  • Music
  • Artificial Intelligence