Artificial Intelligence for Music

January 09, 2019

Artificial Intelligence (AI) refers to computer programs designed to learn from the information fed to them, rather than simply follow fixed instructions. We can then use such a program to create new material based on what it has learned. AI is becoming more and more prevalent in today’s technology industry. It allows greater automation of complex tasks, and lets technology learn from its users and adapt to their needs. Now, it’s being taught how to compose music. Completely original, never-before-heard music.

How do we teach a computer to compose music?

Machine Learning is a subfield of computer science that focuses on teaching a computer to perform an action, rather than explicitly programming it to do so. A popular method of machine learning is an Artificial Neural Network – a very complicated program that I’ll try my best to explain. Neural Networks use a structure of interconnected artificial neurons, or nodes, that pass information to each other to try to establish useful patterns. They have an input and an output. You can teach a neural network by feeding it data similar to what you want it to produce. Initially, the AI will output something completely random and not particularly useful. This useless output is then tested to see how similar it is to its goal. Where parts are close, the network leaves the connections between its nodes largely as they are; where parts are wrong, it adjusts those connections to reduce the error. This learning, testing, learning, testing process happens very quickly, but it often takes several hundred iterations before you notice any real progress. This is the main computing method used to create AI music.
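
The learning and testing loop described above can be sketched in a few lines of Python. This is only a toy with a single artificial neuron, not how BachBot or WaveNet are built. The names `predict` and `train`, the learning rate, and the example data are all assumptions made for illustration. The neuron starts with random weights, is tested against the examples, and is nudged in proportion to its error (a common approach called gradient descent).

```python
import random

def predict(weights, bias, inputs):
    """Output of a single linear neuron: a weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def train(examples, iterations=500, learning_rate=0.05, seed=0):
    """Repeatedly test the neuron against the examples and nudge its weights."""
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    # Start from random weights, so the first outputs are not useful.
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    bias = rng.uniform(-1, 1)
    for _ in range(iterations):
        for inputs, target in examples:
            # Test: how far is the output from the goal?
            error = predict(weights, bias, inputs) - target
            # Learn: move each weight a little in the direction that reduces the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical training data that follows the rule: target = 2*a + 1*b
examples = [((0, 0), 0), ((1, 0), 2), ((0, 1), 1), ((1, 1), 3), ((2, 1), 5)]
weights, bias = train(examples)

# After training, the neuron should handle an input it has never seen.
guess = predict(weights, bias, (3, 2))
```

After a few hundred passes over the examples, `guess` should land very close to 8, the value the hidden rule gives for the unseen input (3, 2). Real music networks have many thousands of nodes, but the loop has the same shape.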

MIDI and Neural Networks

There are two different ways to train an AI on music, each needing a different kind of learning material as input. The first is based around MIDI data, a term that will be familiar to most musicians and people involved in music production. MIDI stands for ‘Musical Instrument Digital Interface’ and is a smart way of describing musical notes and all their characteristics in a digital format that a computer can interpret. For example, the MIDI data for a C Major chord would be made of 3 notes, each having a start time, an end time, an individual pitch value (C, E, or G), and a velocity value (this controls how loud each note is). There are a lot more attributes that can be used to describe a note in far greater detail, but those are the basics – and most likely all that is used to teach an AI. A whole piece of music can be represented by MIDI data, and this data can be presented to a computer program.
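
To make the C Major example concrete, here is a rough sketch of how those three notes might look as data in Python. The `Note` class and the timing in seconds are assumptions for illustration; a real MIDI file stores note-on and note-off events with timing in ticks. The pitch numbers are standard, though: in MIDI, middle C is note 60, and each step of 1 is a semitone.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int      # MIDI note number, 0-127 (60 is middle C)
    velocity: int   # how hard the note is played, 0-127
    start: float    # start time in seconds (illustrative; MIDI files use ticks)
    end: float      # end time in seconds

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(pitch):
    """Name of a MIDI pitch, ignoring which octave it is in."""
    return NOTE_NAMES[pitch % 12]

# A C Major chord: three notes that start and end together.
c_major = [
    Note(pitch=60, velocity=90, start=0.0, end=1.0),  # C
    Note(pitch=64, velocity=90, start=0.0, end=1.0),  # E
    Note(pitch=67, velocity=90, start=0.0, end=1.0),  # G
]

names = [note_name(n.pitch) for n in c_major]
```

A whole piece is then just a longer list of these notes, which is the kind of compact sequence a neural network can learn from.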

Because we can represent music in a small, purely digital way with MIDI, we can easily feed it to a neural network to learn from. The MIDI-composing AI ‘BachBot’ was born at the University of Cambridge as a research project led by Feynman Liang. BachBot’s goal is to generate harmonised chorales indistinguishable from Bach’s own work. The AI is trained on Bach’s chorales; given a melody in MIDI form, it will generate a series of harmonies similar to those you would hear played on a harpsichord in the 1700s. You can listen to BachBot’s version of ‘Twinkle, Twinkle, Little Star’, here:

Audio and Neural Networks

The second way to teach an AI about music is to skip traditional composition entirely. No MIDI. Instead, the Neural Network is given audio data to learn from. Audio files are often many times larger than MIDI files, so training a neural network on them takes a lot longer, but the output is pure audio data rather than MIDI. It also means that any sound can be used to train the AI, not just music. You could teach it how to speak, for example, by giving it an audiobook to listen to.
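
A quick back-of-the-envelope calculation shows how big the size gap can be. The figures below are assumptions for illustration: CD-quality mono audio (44,100 samples per second, 2 bytes per sample), a tune with four notes per second, and roughly 8 bytes of MIDI data per note (a note-on and a note-off event with timing).

```python
SAMPLE_RATE = 44_100    # samples per second (CD quality)
BYTES_PER_SAMPLE = 2    # 16-bit mono audio
BYTES_PER_NOTE = 8      # assumption: note-on + note-off events, ~4 bytes each

def audio_bytes(seconds):
    """Size of uncompressed mono audio of the given length."""
    return int(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)

def midi_bytes(n_notes):
    """Rough size of the MIDI note data for a number of notes."""
    return n_notes * BYTES_PER_NOTE

seconds = 60
notes_per_second = 4    # assumption: a fairly busy melody

audio_size = audio_bytes(seconds)                    # about 5.3 MB
midi_size = midi_bytes(seconds * notes_per_second)   # about 2 KB
ratio = audio_size / midi_size
```

Under these assumptions, one minute of raw audio is a few thousand times larger than the same minute described as notes, which is why an audio-based network has so much more data to chew through.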

Google’s DeepMind researchers have created WaveNet, an AI that does incredible things with audio. WaveNet learns from a dataset of audio that is similar to the result you are aiming for. So, if you want WaveNet to play bass guitar, you make it learn from bass guitar recordings, or for piano, you make it listen to piano recordings. It will then generate some audio, and test it against the dataset of target audio. As with the MIDI networks, it will then make some changes to try to get closer to the target. WaveNet is very impressive. Here you can listen to a very short clip of some piano music it generated:

Notes VS Waves

Deep learning music applications can be divided into two categories depending on the kind of input they learn from. BachBot uses note sequences in the form of MIDI data, and WaveNet uses raw audio. There are pros and cons to both methods. Note sequences are faster to generate (minutes or hours), and the resulting music can be edited, but the output can only be as complex as the training data. Raw audio takes a very long time to generate (hours or days), and the output cannot be edited, but the result can be as complex as you can imagine.

There isn’t a clear winner when it comes to making music. Both methods are still early in development and have a way to go before they become a viable replacement for the human composer. Eventually, perhaps they will become so advanced and fast that there will be no point in composing music yourself.

Other uses

Machine Learning and Neural Networks have many more uses, some far more practical and useful than making strange-sounding music. AI is being used for image recognition and voice recognition right now. What’s really amazing is that the more we interact with these technologies, the better they get.

Google Maps’ amazing Street View has learned to recognise people’s faces and car number plates. The service will blur all faces and identifiable material in the images to keep your life private while mapping out the entire country. The more pictures we give it to look at, the better it gets at picking out a face it should blur.

Amazon’s voice assistant, Alexa, uses a complex machine learning system to get better and better at recognising the commands that you speak to her. Every time you say something to her, it is tested, and she adapts. Alexa will be making her way into households over the next few years to assist you in your everyday life. Whether you want to order some more cat food, dim the lights in the dining room for a romantic meal, or play some of your favourite tunes when your friends are visiting, all of this can be achieved without the need to look at a screen or tap any buttons; just ask Alexa to do it for you.

At Bayan, we are working on including Alexa voice commands in our multi-room speaker systems. We want you to be able to say “Alexa, play my Cooking Jams playlist on my Bayan Speaker in the kitchen”, and for exactly that to happen. No buttons, no need to rummage in your pocket for your phone, no hassle.

Artificial Intelligence seemed like a thing of the distant future a decade ago, but it’s already making its way into your everyday routine – even if you don’t realise it. As H.G. Wells put it in ‘Empire of the Ants’, and later Kent Brockman, “I for one welcome our new [computer] overlords.”

This post was written by Jack Chapman.