[Music]
Cool, thank you. Just so I know what level to speak at: raise your hands if you know who Bach is. Great. Raise your hand if you know what a neural network is. Oh, this is the perfect crowd, awesome. If you don't know, don't worry, I'm going to cover the very basics of both. So let's talk about Bach. I'm going to play you some music.
[Music]
Now, what you just heard is what's known as a chorale. There are four parts to it (soprano, alto, tenor, bass) playing at the exact same time, and there's very regular phrasing structure, where you have the beginning of a phrase, the termination of a phrase, followed by the next phrase. Except that wasn't Bach. Rather, that was a computer algorithm called BachBot, and that was one sample out of its outputs. If you don't believe me, it's on SoundCloud; it's called "sample one", go listen for yourself.
So instead of talking about Bach today, I'm going to talk to you about BachBot. Hi, my name is Feynman, and it's a pleasure to be here in Amsterdam. Today we'll talk about BachBot: automatic stylistic composition using long short-term memory.
So then, a bit of background about myself. I'm currently a software engineer at Gigster, where I work on interesting automation problems regarding taking contracts, dividing them into subcontracts, and then freelancing them out. The work on BachBot was done as part of my master's thesis, which I did at the University of Cambridge with Microsoft Research Cambridge. In line with the track here, I do not have a PhD, and I can still do machine learning. So this is a fact: you can do machine learning without a PhD.
For those of you who just want to know what's going to happen and then get out of here because it's not interesting, here is the executive summary. I'm going to talk to you about how to train, end to end (starting from dataset preparation all the way to model tuning and deployment), a deep recurrent neural network for music. This neural network is capable of polyphony: multiple simultaneous voices at the same time. It's capable of automatic composition, generating a composition completely from scratch, as well as harmonization: given some fixed parts, such as the soprano line of the melody, generate the remaining supporting parts. This model learns music theory without being told to do so, providing empirical validation of what music theorists have been using for centuries. And finally, it's evaluated on an online musical Turing test, where out of 1,700 participants only nine percent were able to distinguish actual Bach from BachBot.
When I set off on this research, there were three primary goals. The first question I wanted to answer was: what is the frontier of computational creativity? Now, creativity is something we take to be innately human, innately special; in some sense, computers ought not to be able to replicate this about us. Is this actually true? Can we have computers generate art that is convincingly human?
The second question I wanted to answer was: how much has deep learning impacted automatic music composition? Now, automatic music composition is a special field. It has been dominated by symbolic methods, which utilize things like formal grammars or context-free grammars, such as this parse tree. We saw connectionist methods in the late 1980s and early 1990s; however, they have since fallen in popularity, and most recent systems have used symbolic methods. With the work here, I wanted to see: can the advances in deep learning over the last ten years be transferred over to this particular problem domain?
And finally, the last question I wanted to look at is: how do we evaluate these generative models? I mean, we've seen in the previous talk a lot of models that generate art; we look at it, and as the author we say, oh, that's convincing, oh, that's beautiful. And great, that might be a perfectly valid use case, but it's not sufficient for publication. To publish something, we need to establish a standardized benchmark, and we need to be able to evaluate all of our models against it, so we can objectively say which model is better than the other.
Now, if you're still here, I'm assuming you're interested. This is the outline. We'll start with a quick primer on music theory, giving you just the basic terminology you need to understand the remainder of this presentation. We'll talk about how to prepare a dataset of Bach chorales. We'll then give a primer on recurrent neural networks, which is the actual deep learning model architecture used to build BachBot. We'll talk about the BachBot model itself: the tips, tricks, and techniques that we used in order to train it, have it run successfully, as well as deploy it. And then we'll show the results. We'll show how this model is able to capture statistical regularities in Bach's musical style, and we'll provide (we won't prove, but we'll provide) very convincing evidence that music theory does have empirical justification. And finally, I'll show the results of the musical Turing test, which was our proposed evaluation methodology for saying: yes, this model has solved our research goal; the task of automatically composing convincing Bach chorales is more closed than open of a problem as a result of BachBot. And if you're a hands-on type of learner, we've containerized the entire deployment. So if you go to my website here, I have a copy of the slides, which have all of these instructions: you run these eight lines of code and it runs this entire pipeline right here, where it takes the chorales, preprocesses them, puts them into a data store, trains the deep learning model, samples the deep learning model, and produces outputs that you can listen to.
Let's start with basic music theory. Now, when people think of music, this is usually what you think about: you've got these bar lines, you've got notes, and these notes are at different horizontal and vertical positions. Some of them have interesting ties, some of them have dots, and there's this interesting little weird hat-looking thing. We don't need all of this; we need three fundamental concepts. The first is pitch. Pitch is often referred to as how low or how high a note is. So if I play this, we can distinguish that some notes are lower and some notes are higher in frequency, and that corresponds to the vertical axis here: as the notes sound ascending, they appear ascending on the bar lines. The second attribute we need is duration, and this is really how long a note is. So this one note, these two notes, these four, and these eight all have equal total duration, but each group is a halving of the one before. So if we take a listen:
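The halving relationship he describes can be checked numerically. A quick sketch (durations measured in quarter-note units; the values are illustrative, not from the talk):

```python
# One whole note, two halves, four quarters, eight eighths:
# each group halves the note length but spans the same total duration.
whole = 4.0  # a whole note, in quarter-note units
groups = [[whole], [whole / 2] * 2, [whole / 4] * 4, [whole / 8] * 8]
totals = [sum(g) for g in groups]
print(totals)  # [4.0, 4.0, 4.0, 4.0]
```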
The general intuition is: the more bars there are on these tails, the faster the notes appear. With just those two concepts, this is starting to make a little bit more sense. This right here is twice as fast as this note, we can see this note is higher than this note, and you can generalize this to the remainder of the piece. But there's still this funny hat-looking thing. We'll get to the hat in a sec, but with pitch and duration we can rewrite the music like so. Rather than representing it using notes, which may be kind of cryptic, we show it here as a matrix, where on the x-axis we have time (the duration), and on the y-axis we have pitch (how high or low in frequency that note is). And what we've done is we've taken the symbolic representation of music and turned it into a digital, computable format that we can train models on.
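The time-by-pitch matrix he describes is essentially a piano roll. A minimal sketch (the note values here are hypothetical, chosen only to illustrate the layout):

```python
import numpy as np

# Rows = 128 MIDI pitches, columns = sixteenth-note time steps.
n_steps = 16  # one 4/4 measure at sixteenth-note resolution
roll = np.zeros((128, n_steps), dtype=np.int8)

# Hypothetical fragment: (midi_pitch, start_step, duration_in_steps).
notes = [(60, 0, 4), (64, 4, 4), (67, 8, 4), (72, 12, 4)]
for p, start, dur in notes:
    roll[p, start:start + dur] = 1

print(roll.shape)       # (128, 16)
print(int(roll.sum()))  # 16 active cells: four notes, four steps each
```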
Back to the hat-looking thing: this is called a fermata, and Bach used it to denote the ends of phrases. We had originally set off on this research completely neglecting fermatas, and we found that the phrases generated by the model just kind of wandered; they never seemed to end, there was no sense of resolution or conclusion, and that was unrealistic. But by adding these fermatas, all of a sudden the model turned around, and we suddenly found realistic phrasing structure. Cool, and that's all the music you need to know; the rest of it is machine learning.
Now, the biggest part of a machine learning engineer's job is preparing their datasets. This is a very painful task: usually you have to scour the internet or find some standardized dataset that you train and evaluate your models on, and usually these datasets have to be preprocessed and massaged into a format that's amenable for learning. For us it was no different. Fortunately, Bach's works have over the years been transcribed into (excuse my German) the Bach-Werke-Verzeichnis; BWV is how I've been referring to this corpus. It contains about 438 harmonizations of Bach chorales, and conveniently it is available through a software package called music21. This is a Python package that you can just pip install and then import, and now you have an iterator over a collection of music.
The first preprocessing step we did: we took the original music here and did two things: we transposed it, and then we quantized it in time. You can notice the transposition by looking at these accidentals right here, these two funny little backwards-or-forwards B's (flats), and then they're absent over here; furthermore, that note has shifted up by half a line. That's a little hard to see, but it's happening. And the reason why we did this is we didn't want to learn key signature. The key signature is usually something decided by the author before the piece has even begun to be composed, and so key signature itself can be injected as a preprocessing step, where we sample over all the keys Bach did use. So we removed key signature from the equation through transposition, and I'll justify why that's an okay thing to do in the next slide. This first measure is a progression of five notes written in C major, and then what I did in the next measure is I just moved it up by five whole steps.
[Music]
so yeah the pitch did change it’s
so yeah the pitch did change it’s relatively higher it’s absolutely higher
relatively higher it’s absolutely higher
relatively higher it’s absolutely higher on all accounts
on all accounts
on all accounts but the relations between the notes
but the relations between the notes
but the relations between the notes didn’t change and the sensation the the
didn’t change and the sensation the the
didn’t change and the sensation the the motifs that the music is bringing out
motifs that the music is bringing out
motifs that the music is bringing out those still remain fairly constant even
those still remain fairly constant even
those still remain fairly constant even after transposition quantization that
after transposition quantization that
after transposition. Quantization, however, is a different story. If I go back to the slides, you'll notice quantization takes this thirty-second note and turns it into a sixteenth note; by removing that second bar, we've distorted time. Is that a problem? It's not perfect, but it's a very minor problem. Over here I've plotted a histogram of all of the durations inside of the chorale corpus, and this quantization affects only 0.2% of all the notes that we're training on. The reason we do it is that by quantizing in time we get discrete representations in both time and pitch. Working on a continuous time axis is a problem: computers are discrete and are unable to operate on a continuous representation, so it has to be quantized into a digital format somehow.
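The quantization step just described can be sketched in a few lines. This is illustrative only, not BachBot's actual code; the grid size and function name are my own.

```python
# Sketch of duration quantization: snap note durations (measured in
# quarter-note lengths) to the nearest sixteenth-note grid point.
# Illustrative only; not BachBot's actual implementation.
import math

GRID = 0.25  # a sixteenth note, in quarter-note lengths

def quantize(duration: float) -> float:
    """Round a duration to the nearest multiple of the grid,
    rounding halves up so a thirty-second note becomes a sixteenth."""
    return math.floor(duration / GRID + 0.5) * GRID

print(quantize(0.125))  # thirty-second note distorted up to 0.25
print(quantize(1.0))    # a quarter note passes through unchanged: 1.0
```

Only the rare durations shorter than the grid get distorted, which is why the damage is limited to about 0.2% of the notes.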
a digital format somehow the last challenge polyphony so polysemy is the
challenge polyphony so polysemy is the
challenge polyphony so polysemy is the presence of multiple simultaneous voices
presence of multiple simultaneous voices
presence of multiple simultaneous voices so far the examples that I’ve shown you
so far the examples that I’ve shown you
so far the examples that I’ve shown you you’ve just heard a single voice playing
you’ve just heard a single voice playing
you’ve just heard a single voice playing at any given time but a Corral has four
at any given time but a Corral has four
at any given time but a Corral has four voices the soprano the alto the tenor
voices the soprano the alto the tenor
voices the soprano the alto the tenor the bass and so here’s a question for
the bass and so here’s a question for
the bass and so here’s a question for you if I have four voices and they can
you if I have four voices and they can
you if I have four voices and they can each represent 128 different pitches
each represent 128 different pitches
each represent 128 different pitches that’s the constraint in MIDI
that’s the constraint in MIDI
that’s the constraint in MIDI representation of music how many
representation of music how many
representation of music how many different chords can I construct very
different chords can I construct very
different chords can I construct very good yes 128 ^ 4 that’s correct
good yes 128 ^ 4 that’s correct
good yes 128 ^ 4 that’s correct I put a Big O because some like some
I put a Big O because some like some
I put a Big O because some like some like you can rearrange the ordering but
like you can rearrange the ordering but
like you can rearrange the ordering but more or less yeah that’s correct and why
more or less yeah that’s correct and why
more or less yeah that’s correct and why is this a problem well this is the
is this a problem well this is the
is this a problem well this is the problem because most of these chords are
problem because most of these chords are
problem because most of these chords are actually never seen especially after you
actually never seen especially after you
actually never seen especially after you transposed a C major a minor in fact
transposed a C major a minor in fact
transposed a C major a minor in fact looking at the data set we can see that
looking at the data set we can see that
looking at the data set we can see that just the first 20 chords or 20
just the first 20 chords or 20
just the first 20 chords or 20 notes rather occupy almost 90% of the
notes rather occupy almost 90% of the
notes rather occupy almost 90% of the entire dataset so if we were to
entire dataset so if we were to
entire dataset so if we were to represent all of these we would have a
represent all of these we would have a
represent all of these we would have a ton of symbols in our vocabulary which
ton of symbols in our vocabulary which
ton of symbols in our vocabulary which we had never seen before the way we deal
we had never seen before the way we deal
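The arithmetic behind that question is worth seeing once:

```python
# Why a chord-level vocabulary explodes: 128 MIDI pitches per voice,
# four simultaneous voices.
MIDI_PITCHES = 128
VOICES = 4

chords = MIDI_PITCHES ** VOICES
print(chords)  # 268435456 possible chords, vs. only 128 note symbols
```

Over a quarter of a billion possible chord symbols, almost all of which never occur in the corpus.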
we had never seen before the way we deal with this problem is by serializing so
with this problem is by serializing so
with this problem is by serializing so that is instead of representing all four
that is instead of representing all four
that is instead of representing all four notes as an individual symbol we
notes as an individual symbol we
notes as an individual symbol we represent each individual note as a
represent each individual note as a
represent each individual note as a symbol itself and we serialized in
symbol itself and we serialized in
symbol itself and we serialized in soprano alto tenor bass order and so
soprano alto tenor bass order and so
soprano alto tenor bass order and so what you end up getting is a reduction
what you end up getting is a reduction
what you end up getting is a reduction from 128 to the 4th all possible chords
from 128 to the 4th all possible chords
from 128 to the 4th all possible chords into just 128 possible pitches now this
into just 128 possible pitches now this
into just 128 possible pitches now this may seem a little unjustified but this
may seem a little unjustified but this
may seem a little unjustified but this is actually done all the time with
is actually done all the time with
is actually done all the time with sequence processing if you took like
sequence processing if you took like
sequence processing if you took like take a look at traditional on language
take a look at traditional on language
take a look at traditional on language models you can represent them either at
models you can represent them either at
models you can represent them either at the character level or at the word level
the character level or at the word level
the character level or at the word level similarly you can represent music either
similarly you can represent music either
similarly you can represent music either at the note level or at the chord level
at the note level or at the chord level
at the note level or at the chord level after serializing the the data looks
after serializing the the data looks
after serializing the the data looks like this we have assembled a noting the
like this we have assembled a noting the
like this we have assembled a noting the start of a piece and this is used to
start of a piece and this is used to
start of a piece and this is used to initialize our model we then have the
initialize our model we then have the
initialize our model we then have the four chords soprano alto tenor bass
four chords soprano alto tenor bass
four chords soprano alto tenor bass followed by a delimiter indicating the
followed by a delimiter indicating the
followed by a delimiter indicating the end of this frame and time has advanced
end of this frame and time has advanced
end of this frame and time has advanced one in the future followed by another
one in the future followed by another
one in the future followed by another soprano alto tenor bass we also have
soprano alto tenor bass we also have
soprano alto tenor bass we also have these funny-looking dot things which I
these funny-looking dot things which I
these funny-looking dot things which I came up with to denote the self firmata
came up with to denote the self firmata
came up with to denote the self firmata so that we can encode when the end of a
so that we can encode when the end of a
so that we can encode when the end of a phrases in our input training data after
phrases in our input training data after
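The encoding just described can be sketched as follows. The token spellings here (START, the "|||" delimiter, the "(.)" fermata dot) are illustrative stand-ins I chose for the sketch, not BachBot's actual symbols.

```python
# Sketch of serializing chorale frames into a flat token stream.
# Token spellings are illustrative, not BachBot's actual symbols.

def encode(frames):
    """frames: list of ((soprano, alto, tenor, bass) MIDI pitches,
    fermata flag) pairs, one per time step."""
    tokens = ["START"]              # unique start-of-piece symbol
    for satb, fermata in frames:
        for pitch in satb:          # fixed soprano-alto-tenor-bass order
            tokens.append(str(pitch))
        if fermata:
            tokens.append("(.)")    # marks the end of a phrase
        tokens.append("|||")        # frame delimiter: time advances one step
    return tokens

stream = encode([((67, 64, 60, 48), False),
                 ((69, 65, 62, 50), True)])
print(stream)
```

Note how each chord becomes four one-note symbols plus a delimiter, which is exactly the 128^4-to-128 vocabulary reduction.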
phrases in our input training data after all of our pre-processing our final
all of our pre-processing our final
all of our pre-processing our final corpus looks like this there’s only 108
corpus looks like this there’s only 108
corpus looks like this there’s only 108 symbols left so not a hundred all
symbols left so not a hundred all
symbols left so not a hundred all hundred 28 pitches are used in Bach’s
hundred 28 pitches are used in Bach’s
hundred 28 pitches are used in Bach’s works and there’s about I would say four
works and there’s about I would say four
works and there’s about I would say four hundred thousand total where we split
hundred thousand total where we split
hundred thousand total where we split three hundred and eighty thousand or
three hundred and eighty thousand or
three hundred and eighty thousand or three hundred and eighty thousand into a
three hundred and eighty thousand into a
three hundred and eighty thousand into a training set and forty thousand into a
training set and forty thousand into a
training set and forty thousand into a validation set we split between training
validation set we split between training
validation set we split between training and validation in order to prevent
and validation in order to prevent
and validation in order to prevent overfitting we don’t want to just
overfitting we don’t want to just
overfitting we don’t want to just memorize box Corral’s rather we want to
memorize box Corral’s rather we want to
memorize box Corral’s rather we want to be able to produce very similar samples
be able to produce very similar samples
be able to produce very similar samples which are not exact identical and that’s
which are not exact identical and that’s
which are not exact identical and that’s it with that you have the training set
it with that you have the training set
it with that you have the training set and it’s encapsulated by the first three
and it’s encapsulated by the first three
and it’s encapsulated by the first three commands on that slide I showed earlier
commands on that slide I showed earlier
commands on that slide I showed earlier with Bach
with Bach
with Bach make data set Bach bot extract
make data set Bach bot extract
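The "extract vocabulary" step amounts to mapping each distinct symbol in the serialized corpus to an integer id. A minimal sketch (function and token names are mine, not BachBot's):

```python
# Sketch of vocabulary extraction: assign every distinct symbol in
# the serialized corpus a stable integer id. Illustrative only.

def extract_vocab(tokens):
    symbols = sorted(set(tokens))               # deterministic ordering
    return {sym: i for i, sym in enumerate(symbols)}

corpus = ["START", "60", "64", "67", "|||", "60", "65", "69", "|||"]
vocab = extract_vocab(corpus)
encoded = [vocab[t] for t in corpus]            # corpus as integer ids
print(len(vocab), encoded)
```

After this step the network only ever sees small integers, one per symbol, which is why the final vocabulary of 108 symbols matters.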
make data set Bach bot extract vocabulary the next step is to train the
vocabulary the next step is to train the
vocabulary the next step is to train the recurrent neural network to talk about
recurrent neural network to talk about
recurrent neural network to talk about recurrent neural networks let’s break
recurrent neural networks let’s break
recurrent neural networks let’s break the word down recurrent neural network
the word down recurrent neural network
the word down recurrent neural network I’m going to start with neuro neural
I’m going to start with neuro neural
I’m going to start with neuro neural neural just means that we have very
neural just means that we have very
neural just means that we have very basic building blocks called neurons
basic building blocks called neurons
basic building blocks called neurons which look like this they take a
which look like this they take a
which look like this they take a d-dimensional input x1 XD these are
d-dimensional input x1 XD these are
d-dimensional input x1 XD these are numbers like 0.9 0.2 and they’re all
numbers like 0.9 0.2 and they’re all
numbers like 0.9 0.2 and they’re all added together with a linear combination
added together with a linear combination
added together with a linear combination so what you end up getting is this
so what you end up getting is this
so what you end up getting is this activation Z which is just the sum of
activation Z which is just the sum of
activation Z which is just the sum of these inputs weighted by WS so if a
these inputs weighted by WS so if a
these inputs weighted by WS so if a neuron really cares about say X 2 W 2 W
neuron really cares about say X 2 W 2 W
neuron really cares about say X 2 W 2 W 1 and the rest will be zeros and so this
1 and the rest will be zeros and so this
1 and the rest will be zeros and so this lets the neuron preferentially select
lets the neuron preferentially select
lets the neuron preferentially select which of its inputs that cares more
which of its inputs that cares more
which of its inputs that cares more about and allows to specialize for
about and allows to specialize for
about and allows to specialize for certain parts of its input this
certain parts of its input this
certain parts of its input this activation is passed through this X
activation is passed through this X
activation is passed through this X shaped thing called an on called an
shaped thing called an on called an
shaped thing called an on called an activation function commonly a sigmoid
activation function commonly a sigmoid
activation function commonly a sigmoid but all it does is it introduces a
but all it does is it introduces a
but all it does is it introduces a non-linearity into the network and
non-linearity into the network and
non-linearity into the network and allows you to explore expressive on the
allows you to explore expressive on the
allows you to explore expressive on the types of functions you can approximate
types of functions you can approximate
types of functions you can approximate and we have the output called Y you take
and we have the output called Y you take
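A single neuron, as described, fits in a couple of lines (weights here are arbitrary, chosen to show the "cares only about x2" case):

```python
# One neuron: weighted sum of inputs passed through a sigmoid.
import math

def neuron(xs, ws):
    z = sum(x * w for x, w in zip(xs, ws))  # activation z: weighted sum
    return 1.0 / (1.0 + math.exp(-z))       # sigmoid squashes z into (0, 1)

# A neuron that "cares" only about x2: w2 = 1, the rest zero.
y = neuron([0.9, 0.2, 0.7], [0.0, 1.0, 0.0])
print(y)  # sigmoid(0.2), roughly 0.55
```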
and we have the output called Y you take these neurons you stack them
these neurons you stack them
these neurons you stack them horizontally and you get what’s called a
horizontally and you get what’s called a
horizontally and you get what’s called a lair so here I’m just showing four
lair so here I’m just showing four
lair so here I’m just showing four neurons in this layer three neurons in
neurons in this layer three neurons in
neurons in this layer three neurons in this layer two neurons on this top layer
this layer two neurons on this top layer
this layer two neurons on this top layer and I represented the network like this
and I represented the network like this
and I represented the network like this here we take the input X so this bottom
here we take the input X so this bottom
here we take the input X so this bottom part we multiply by a matrix now because
part we multiply by a matrix now because
part we multiply by a matrix now because we’ve replicated the neurons
we’ve replicated the neurons
we’ve replicated the neurons horizontally and what w’s represents the
horizontally and what w’s represents the
horizontally and what w’s represents the weights we pass it through this sigmoid
weights we pass it through this sigmoid
weights we pass it through this sigmoid activation function to get these first
activation function to get these first
activation function to get these first layer outputs this is recursively done
layer outputs this is recursively done
layer outputs this is recursively done through all the layers until you get to
through all the layers until you get to
through all the layers until you get to the very top where we have the final
the very top where we have the final
the very top where we have the final outputs of the model the W’s here the
outputs of the model the W’s here the
outputs of the model the W’s here the weights those are the parameters of the
weights those are the parameters of the
weights those are the parameters of the network and these are the things that we
network and these are the things that we
network and these are the things that we need to learn in order to train the
need to learn in order to train the
need to learn in order to train the neural network great
neural network great
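The recursive layer-by-layer forward pass can be sketched like this (weights are arbitrary made-up numbers, purely for illustration):

```python
# A feed-forward pass: each layer is a weight matrix; the network
# applies matrix-multiply + sigmoid recursively, bottom to top.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(xs, W):
    # W is a list of weight rows, one row per neuron in the layer
    return [sigmoid(sum(x * w for x, w in zip(xs, row))) for row in W]

def forward(xs, layers):
    for W in layers:
        xs = layer(xs, W)   # each layer's output feeds the next
    return xs

W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # 2 inputs -> 3 hidden
W2 = [[1.0, -1.0, 0.5]]                      # 3 hidden -> 1 output
print(forward([1.0, 2.0], [W1, W2]))
```

Training means adjusting the numbers in W1 and W2; everything else stays fixed.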
neural network great we know that feed-forward neural
we know that feed-forward neural
we know that feed-forward neural networks now let’s introduce the word
networks now let’s introduce the word
networks now let’s introduce the word recurrent recurrent just means that the
recurrent recurrent just means that the
recurrent recurrent just means that the previous input or the previous hidden
previous input or the previous hidden
previous input or the previous hidden states are used in the next time step
states are used in the next time step
states are used in the next time step the prediction so what I’m showing here
the prediction so what I’m showing here
the prediction so what I’m showing here is again if you just pay attention to
is again if you just pay attention to
is again if you just pay attention to this input area
this input area
this input area and this layer right here and this
and this layer right here and this
and this layer right here and this output this part right here is the same
output this part right here is the same
output this part right here is the same thing as this thing right here however
thing as this thing right here however
thing as this thing right here however we’ve added this funny little loop
we’ve added this funny little loop
we’ve added this funny little loop coming back with this is electrical
coming back with this is electrical
coming back with this is electrical engineering notation for a unit time
engineering notation for a unit time
engineering notation for a unit time delay and what this is saying is take
delay and what this is saying is take
delay and what this is saying is take the hidden state from time T minus 1 and
the hidden state from time T minus 1 and
the hidden state from time T minus 1 and also include it as input into the next
also include it as input into the next
also include it as input into the next into the prime T predictions in
into the prime T predictions in
into the prime T predictions in equations it looks like this
equations it looks like this
equations it looks like this the current hidden state is equal to the
the current hidden state is equal to the
the current hidden state is equal to the act or the previous inputs plus the free
act or the previous inputs plus the free
act or the previous inputs plus the free or an activation of the previous inputs
or an activation of the previous inputs
or an activation of the previous inputs waited plus the the weighted activations
waited plus the the weighted activations
waited plus the the weighted activations of the previous hidden states and the
of the previous hidden states and the
of the previous hidden states and the outputs is only a function of just the
outputs is only a function of just the
outputs is only a function of just the current hidden states we can take this
current hidden states we can take this
current hidden states we can take this loop right here
loop right here
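In code, that recurrence is one small step function. I use scalar weights here for readability (real networks use the weight matrices just described); the numbers are arbitrary:

```python
# One step of the basic recurrence: the new hidden state depends on
# the current input AND the previous hidden state; the output depends
# only on the current hidden state. Scalar weights for readability.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_step(x_t, h_prev, w_xh=0.5, w_hh=0.9, w_hy=1.0):
    h_t = sigmoid(w_xh * x_t + w_hh * h_prev)  # the unit-time-delay loop
    y_t = sigmoid(w_hy * h_t)                  # output from hidden state only
    return h_t, y_t

h = 0.0
for x in [1.0, 0.0, 1.0]:   # a short input sequence
    h, y = rnn_step(x, h)
    print(h, y)
```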
loop right here oh sorry before I go there um this is
oh sorry before I go there um this is
oh sorry before I go there um this is called a Elmen type recurrent neural
called a Elmen type recurrent neural
called a Elmen type recurrent neural network this memory cell is very basic
network this memory cell is very basic
network this memory cell is very basic it’s just doing the exact same thing a
it’s just doing the exact same thing a
it’s just doing the exact same thing a normal neural network would do it turns
normal neural network would do it turns
normal neural network would do it turns out there’s some problems with just
out there’s some problems with just
out there’s some problems with just using the basic architecture and so the
using the basic architecture and so the
using the basic architecture and so the architecture that the field has been
architecture that the field has been
architecture that the field has been converging towards is known as long
converging towards is known as long
converging towards is known as long short-term memory
short-term memory
short-term memory it looks really complicated it’s not you
it looks really complicated it’s not you
it looks really complicated it’s not you take the inputs and the hidden states
take the inputs and the hidden states
take the inputs and the hidden states and you put them into three spots right
and you put them into three spots right
and you put them into three spots right here the inputs an input gate a forget
here the inputs an input gate a forget
here the inputs an input gate a forget gate and output gate and the point of
gate and output gate and the point of
gate and output gate and the point of adding all this art complexity is to
adding all this art complexity is to
adding all this art complexity is to solve a problem known as the vanishing
solve a problem known as the vanishing
solve a problem known as the vanishing gradient problem where this constant
gradient problem where this constant
gradient problem where this constant error carousel of the hidden state being
error carousel of the hidden state being
error carousel of the hidden state being fed back to itself over and over and
fed back to itself over and over and
fed back to itself over and over and over results in signals converging
over results in signals converging
over results in signals converging toward zero or diverging to infinity
toward zero or diverging to infinity
toward zero or diverging to infinity this is fortunately this is usually
this is fortunately this is usually
this is fortunately this is usually available as just a black box
available as just a black box
available as just a black box implementation in most software packages
implementation in most software packages
implementation in most software packages you just specify I want to use an LS TM
you just specify I want to use an LS TM
you just specify I want to use an LS TM and all of this is abstracted away from
and all of this is abstracted away from
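Even though you'd normally use the black box, the three gates are simple to write down. Here is a scalar sketch with arbitrary, untrained weights, just to show what each gate does:

```python
# Scalar sketch of one LSTM step: three sigmoid gates control what
# enters, what is kept, and what is exposed from the cell memory.
# Weights are arbitrary; this is not a tuned implementation.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev):
    i = sigmoid(0.5 * x + 0.5 * h_prev)        # input gate: admit new info?
    f = sigmoid(0.5 * x + 0.5 * h_prev + 1.0)  # forget gate: keep old memory?
    o = sigmoid(0.5 * x + 0.5 * h_prev)        # output gate: expose the cell?
    c_tilde = math.tanh(0.5 * x + 0.5 * h_prev)  # candidate cell update
    c = f * c_prev + i * c_tilde               # gated memory update
    h = o * math.tanh(c)                       # gated output
    return h, c

h, c = 0.0, 0.0
for x in [1.0, 1.0, 0.0]:
    h, c = lstm_step(x, h, c)
print(h, c)
```

The additive `f * c_prev` path is what keeps gradients from vanishing the way they do in the plain Elman cell.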
and all of this is abstracted away from you now here if you squint you can kind
you now here if you squint you can kind
you now here if you squint you can kind of see that the memory cell that I’ve
of see that the memory cell that I’ve
of see that the memory cell that I’ve shown previously where we have the
shown previously where we have the
shown previously where we have the inputs the hidden States hidden facing
inputs the hidden States hidden facing
inputs the hidden States hidden facing back to itself to generate an output I
back to itself to generate an output I
back to itself to generate an output I distract it away like this and I’ve
distract it away like this and I’ve
distract it away like this and I’ve stacked it up on top of each other so
stacked it up on top of each other so
stacked it up on top of each other so rather than just having the outputs come
rather than just having the outputs come
rather than just having the outputs come out of this H right here I’ve actually
out of this H right here I’ve actually
out of this H right here I’ve actually made it the inputs to get another memory
made it the inputs to get another memory
made it the inputs to get another memory cell
cell
cell this is where the word deep comes from
this is where the word deep comes from
this is where the word deep comes from deep networks are just networks that
deep networks are just networks that
deep networks are just networks that have a lot of layers and by stacking I
have a lot of layers and by stacking I
have a lot of layers and by stacking I get to use the word deep inside of my
get to use the word deep inside of my
get to use the word deep inside of my deep LS TM model but I’ll show you later
deep LS TM model but I’ll show you later
deep LS TM model but I’ll show you later that I’m not just doing it for the
that I’m not just doing it for the
that I’m not just doing it for the buzzword depth actually matters as well
buzzword depth actually matters as well
buzzword depth actually matters as well see in results another operation that’s
see in results another operation that’s
see in results another operation that’s important for LS CMS is unrolling and
important for LS CMS is unrolling and
important for LS CMS is unrolling and what unrolling does is it takes this
what unrolling does is it takes this
what unrolling does is it takes this unit time delay and it just replicates
unit time delay and it just replicates
unit time delay and it just replicates the LS TM units over time so rather than
the LS TM units over time so rather than
the LS TM units over time so rather than show in this delay like this I’ve taken
show in this delay like this I’ve taken
show in this delay like this I’ve taken it I’ve shown the the – once hidden unit
it I’ve shown the the – once hidden unit
it I’ve shown the the – once hidden unit passing state into the the t hidden unit
passing state into the the t hidden unit
passing state into the the t hidden unit passing stages the T plus first hidden
passing stages the T plus first hidden
passing stages the T plus first hidden unit your input is a variable length and
unit your input is a variable length and
unit your input is a variable length and to train the network what you do is you
to train the network what you do is you
to train the network what you do is you expand this graph you unroll the lsdm so
expand this graph you unroll the lsdm so
expand this graph you unroll the lsdm so the same length as your variable length
the same length as your variable length
the same length as your variable length input and in order to get these
input and in order to get these
input and in order to get these predictions up at the top great we know
predictions up at the top great we know
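Unrolling is just a loop that replicates the cell once per input symbol, threading the hidden state through. A minimal sketch, reusing a toy scalar step:

```python
# Unrolling in code: one copy of the recurrent step per time step,
# with the hidden state threaded through, for a variable-length input.
import math

def step(x, h):
    return math.tanh(0.5 * x + 0.9 * h)   # toy recurrent cell

def unrolled(xs, h0=0.0):
    h = hiddens = None
    h, hiddens = h0, []
    for x in xs:              # the graph grows to match the input length
        h = step(x, h)
        hiddens.append(h)     # one prediction slot per time step
    return hiddens

print(len(unrolled([1.0, 0.0, 1.0, 1.0])))  # 4: matches the input length
```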
predictions up at the top great we know all we need to know about music and rnns
all we need to know about music and rnns
all we need to know about music and rnns let’s move on to a Bach bot have Bach
let’s move on to a Bach bot have Bach
let’s move on to a Bach bot have Bach Bach works to Train Bach bot we apply
Bach works to Train Bach bot we apply
Bach works to Train Bach bot we apply sequential prediction criteria now I’ve
sequential prediction criteria now I’ve
sequential prediction criteria now I’ve stolen this from Andre carpet thieves
stolen this from Andre carpet thieves
stolen this from Andre carpet thieves github but the principles are the same
github but the principles are the same
github but the principles are the same suppose we’re given the input characters
suppose we’re given the input characters
suppose we’re given the input characters hello and we want to model it using a
hello and we want to model it using a
hello and we want to model it using a recurrent neural network the training
recurrent neural network the training
recurrent neural network the training criteria is given the current input
criteria is given the current input
criteria is given the current input character and the previous hidden state
character and the previous hidden state
character and the previous hidden state predicts the next character so notice
predicts the next character so notice
predicts the next character so notice down here I have a CH and I’m trying to
down here I have a CH and I’m trying to
down here I have a CH and I’m trying to predict e I’ve e and I’m trying to
predict e I’ve e and I’m trying to
predict e I’ve e and I’m trying to predict L I’ve L and I’m trying to
predict L I’ve L and I’m trying to
predict L I’ve L and I’m trying to predict L and I have Allen I’m trying to
predict L and I have Allen I’m trying to
predict L and I have Allen I’m trying to predict oh if we take this analogy to
predict oh if we take this analogy to
predict oh if we take this analogy to music I have all of the notes I’ve seen
music I have all of the notes I’ve seen
music I have all of the notes I’ve seen up until this point in time and I’m
up until this point in time and I’m
up until this point in time and I’m trying to predict the next note I can
trying to predict the next note I can
trying to predict the next note I can iterate this process forwards to
iterate this process forwards to
iterate this process forwards to generate compositions the criteria we
generate compositions the criteria we
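The training pairs for "hello" can be written out mechanically:

```python
# The sequential prediction criterion on "hello": pair each input
# character with the character that follows it.
def training_pairs(text):
    return [(text[i], text[i + 1]) for i in range(len(text) - 1)]

print(training_pairs("hello"))
# [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```

For music, replace characters with the note symbols from the serialized corpus and the same construction applies.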
generate compositions the criteria we want to use is and so the output layer
want to use is and so the output layer
want to use is and so the output layer here is actually a probability
here is actually a probability
here is actually a probability distribution sorry so take in the
distribution sorry so take in the
distribution sorry so take in the previous slide and now I put it on top
previous slide and now I put it on top
previous slide and now I put it on top of my unrolled Network so given the
of my unrolled Network so given the
of my unrolled Network so given the initial hidden state which we just
initial hidden state which we just
initial hidden state which we just initialized all zeroes because we have a
initialized all zeroes because we have a
initialized all zeroes because we have a unique start symbol used to initialize
unique start symbol used to initialize
unique start symbol used to initialize our pieces and the RNN dynamics so this
our pieces and the RNN dynamics so this
our pieces and the RNN dynamics so this is the probability distribution over the
is the probability distribution over the
is the probability distribution over the next state given the current state
next state given the current state
next state given the current state so this YT is
so this YT is
so this YT is for that and it’s a function of the
for that and it’s a function of the
for that and it’s a function of the currents the current input XT as well as
currents the current input XT as well as
currents the current input XT as well as the previous hidden states from t minus
the previous hidden states from t minus
the previous hidden states from t minus 1 we need to choose the r and n
1 we need to choose the r and n
1 we need to choose the r and n parameters so these weight matrices the
parameters so these weight matrices the
parameters so these weight matrices the weights of all the connections between
weights of all the connections between
weights of all the connections between all the neurons in order to maximize
all the neurons in order to maximize
all the neurons in order to maximize this probability right here the
this probability right here the
this probability right here the probability of the real Bach chorale so
probability of the real Bach chorale so
probability of the real Bach chorale so down here we have all the notes of the
down here we have all the notes of the
down here we have all the notes of the real Bach chorale and up here we have
real Bach chorale and up here we have
real Bach chorale and up here we have the next notes of this of those in an
the next notes of this of those in an
the next notes of this of those in an ideal world if we just initialize it
ideal world if we just initialize it
ideal world if we just initialize it with some Bach chorale it’ll just
with some Bach chorale it’ll just
with some Bach chorale it’ll just memorize and return the remainder and
memorize and return the remainder and
memorize and return the remainder and that will that will do great on this
that will that will do great on this
that will that will do great on this prediction criteria but that’s not
prediction criteria but that’s not
prediction criteria but that’s not exactly what we want but nevertheless
exactly what we want but nevertheless
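"Maximize the probability of the real chorale" is usually implemented as minimizing the negative log-likelihood of the true next symbols. A sketch with made-up per-step probabilities:

```python
# The objective in miniature: the probability of a whole sequence is
# the product of per-step probabilities, so maximizing it equals
# minimizing the sum of negative log-probabilities. Numbers made up.
import math

def sequence_nll(step_probs):
    """step_probs: probability the model assigned to the TRUE next
    symbol at each time step."""
    return -sum(math.log(p) for p in step_probs)

confident = sequence_nll([0.9, 0.8, 0.95])  # model tracks Bach closely
uncertain = sequence_nll([0.1, 0.2, 0.05])  # model is usually wrong
print(confident, uncertain)  # the confident model has much lower loss
```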
exactly what we want but nevertheless once we have this criteria the way that
once we have this criteria the way that
once we have this criterion, the model is trained using the chain rule from calculus, where we take partial derivatives. Up here we have an error signal: I know the real note that Bach used, and this is the note my model is predicting. They're a little bit different, so how do I change the parameters (this weight matrix between the hidden state and the outputs, this weight matrix between the previous hidden state and the current hidden state, and this weight matrix between the hidden state and the inputs)? How do I wiggle those to make this output up here closer to what Bach actually produced? This training criterion can be formalized by taking gradients using calculus and iterating, an optimization known as stochastic gradient descent. Applied to neural networks, it's an algorithm called backpropagation, or backpropagation through time if you want to get nitty-gritty, because we've unrolled the neural network over time. But again, this is an abstraction that need not concern you, because it's usually provided for you as a black box inside common frameworks such as TensorFlow and Keras.
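The training step described here (take the error between the model's prediction and the real Bach note, get partial derivatives via the chain rule, and wiggle the weights downhill) can be sketched in a few lines. This is a hand-rolled toy, not BachBot's code: the single weight matrix W, the fixed vector x, and the squared-error loss are illustrative stand-ins for the real network and criterion.

```python
import numpy as np

W = np.zeros((4, 3))                     # toy weight matrix (hidden state -> outputs)
x = np.array([0.5, -1.0, 2.0])           # toy hidden-state vector
target = np.array([1.0, 0.0, 0.0, 0.0])  # one-hot stand-in for "the note Bach actually used"

def loss(W):
    # squared error between prediction and target, standing in for the real criterion
    return 0.5 * np.sum((W @ x - target) ** 2)

for _ in range(100):
    error = W @ x - target     # the error signal at the output
    grad = np.outer(error, x)  # partial derivative dL/dW, via the chain rule
    W -= 0.1 * grad            # gradient descent: wiggle the weights downhill

# after 100 updates the prediction has moved very close to the target
```

A framework like TensorFlow or Keras computes grad for you (that is the backpropagation black box); the update rule itself is the same.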
such as tensor flow and caris we now have all we now have the Bach bot model
have all we now have the Bach bot model
have all we now have the Bach bot model but there’s a couple parameters that we
but there’s a couple parameters that we
but there’s a couple parameters that we need to look at I haven’t told you
need to look at I haven’t told you
need to look at I haven’t told you exactly how deep Bach bot is nor have I
exactly how deep Bach bot is nor have I
exactly how deep Bach bot is nor have I told you how big these layers are before
told you how big these layers are before
told you how big these layers are before we start when optimizing models this is
we start when optimizing models this is
we start when optimizing models this is this is a very important learning and
this is a very important learning and
this is a very important learning and it’s probably obvious by now GPUs are
it’s probably obvious by now GPUs are
it’s probably obvious by now GPUs are very important for rapid experimentation
very important for rapid experimentation
very important for rapid experimentation I did a quick benchmark and I found that
I did a quick benchmark and I found that
I did a quick benchmark and I found that a GPU delivers an 8x perform
a GPU delivers an 8x perform
a GPU delivers an 8x perform speed up making my training time goes
speed up making my training time goes
speed up making my training time goes down from 256 minutes down to just 28
down from 256 minutes down to just 28
down from 256 minutes down to just 28 minutes so if you want to iterate
minutes so if you want to iterate
minutes so if you want to iterate quickly getting a GPU will save you
quickly getting a GPU will save you
quickly getting a GPU will save you April like will make you eight times
April like will make you eight times
April like will make you eight times more productive did I just put the word
Did I just put the word "deep" onto my neural network because it was a good buzzword? It turns out no: depth actually matters. What I'm showing you here are the training losses as well as the validation losses as I change the depth. The training loss is how well my model is doing on the training data set, which I let it see and tune its parameters on; the validation loss is how well my model is doing on data that I didn't let it see, in other words how well it generalizes beyond just memorizing its inputs. What we notice is that with just one layer the validation error is quite high; increasing to two layers gets you down here; three gets you this red curve, which is as low as it goes; and if you keep going to four, it goes back up. Should this be surprising? It shouldn't, because as you add more layers you're adding more expressive power. Notice that with four layers you're doing just as well as the red curve on the training set, but because your model is now so expressive, you're memorizing the inputs, and so you generalize more poorly.
A similar story can be told about the hidden state size, that is, how wide those memory cells are, how many units we have in them. As we increase the hidden state size, we get improvements in generalization, from this blue curve all the way down to 256 hidden units, this green curve. After that we see the same kind of behavior: the training error goes lower and lower, but because you're memorizing the inputs, because your model is now too powerful, your generalization error actually gets worse.
Finally, LSTMs. They're pretty complicated; the reason I introduced them is that they're actually critical for performance. The basic Elman-type recurrent neural network, which just reuses the standard recurrent architecture for the memory cell, is shown here in this green curve, and it actually doesn't do too badly. But by using a long short-term memory you get this yellow curve at the very bottom: it does the best of all the memory-cell architectures we looked at. Gated recurrent units are a simpler alternative to LSTMs; they haven't been used as much, so there's less literature about them, but on this task they also appear to do quite well.
Cool. After all of this experimentation and all of this manual grid search, we finally arrived at a final architecture where notes are first embedded into real numbers, a 32-dimensional real vector rather, and then a three-layer stacked long short-term memory recurrent neural network processes these note sequences over time. We trained it using standard gradient descent with a couple of tricks. We use this thing called dropout, with a setting of 30%, which means that on the connections between subsequent layers we randomly turn 30% of the neurons off. That seems a little bit counterintuitive; why might you want to do that? It turns out that by turning off neurons during training you force the neurons to learn more robust features that are independent of each other. If those connections were always reliably available, neurons could learn to depend on each other's output, and you would end up with correlated features, where two neurons are actually learning the exact same feature. With dropout, as we'll show in the next slide, generalization improves as we increase this number, up to a certain point.
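Dropout, as described (randomly turning 30% of the neurons off during training), can be sketched directly. This is the common "inverted dropout" formulation, an illustrative sketch rather than the exact layer BachBot used: survivors are scaled up by 1/keep so the expected activation is unchanged, and at test time every neuron stays on.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.3, training=True):
    """Randomly zero out `rate` of the units; scale the survivors to compensate."""
    if not training:
        return activations                       # at test time, every neuron stays on
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep  # True for the ~70% of units we keep
    return activations * mask / keep

h = np.ones(1000)     # a layer of activations, all firing at 1.0
dropped = dropout(h)  # roughly 30% of entries are now exactly 0
```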
We also use something called batch normalization, which basically takes your data and re-centers it around zero and rescales the variance, so that you don't have to worry about floating-point overflows or underflows. And we use 128-time-step truncated backpropagation through time, again something your optimizer will handle for you. At a high level, rather than unrolling the network over the entire input sequence, which could be tens of thousands of notes long, we only unroll it 128 steps and truncate the error signals; we basically say that after 128 time steps, whatever you do over here is not going to affect the future too much.
too much here’s my promise slide about
too much here’s my promise slide about drop out counter-intuitively as we turn
drop out counter-intuitively as we turn
drop out counter-intuitively as we turn that as we start dropping out or turning
that as we start dropping out or turning
that as we start dropping out or turning off random neurons or random neuron
off random neurons or random neuron
off random neurons or random neuron connections we actually generalize
connections we actually generalize
connections we actually generalize better we see that without drop out the
better we see that without drop out the
better we see that without drop out the model actually starts to overfit
model actually starts to overfit
model actually starts to overfit dramatically you know it gets better at
dramatically you know it gets better at
dramatically you know it gets better at generalizing that it gets worse and
generalizing that it gets worse and
generalizing that it gets worse and worse and worse at generalizing because
worse and worse at generalizing because
worse and worse at generalizing because it’s got so many connections it can
it’s got so many connections it can
it’s got so many connections it can learn so much you turn to and drop out
learn so much you turn to and drop out
learn so much you turn to and drop out up to 0.3 you get this purple curve at
up to 0.3 you get this purple curve at
up to 0.3 you get this purple curve at the bottom where you’ve turned just to
the bottom where you’ve turned just to
the bottom where you’ve turned just to the right amount so that the features
the right amount so that the features
the right amount so that the features the model of learning are robust they
the model of learning are robust they
the model of learning are robust they can generalize independently of other
can generalize independently of other
can generalize independently of other features and if you turn it up too high
features and if you turn it up too high
features and if you turn it up too high then now you’re dropping up so much
then now you’re dropping up so much
then now you’re dropping up so much you’re injecting more noise than
you’re injecting more noise than
you’re injecting more noise than regularizing your model you actually
regularizing your model you actually
regularizing your model you actually don’t generalize that well and the story
don’t generalize that well and the story
don’t generalize that well and the story on the training side is also consistent
on the training side is also consistent
on the training side is also consistent as we increase dropout you do strictly
as we increase dropout you do strictly
as we increase dropout you do strictly worse on training and that makes sense
worse on training and that makes sense
worse on training and that makes sense too because this isn’t generalization
too because this isn’t generalization
too because this isn’t generalization this is just how well can the model
this is just how well can the model
this is just how well can the model memorize its input data and if you turn
memorize its input data and if you turn
memorize its input data and if you turn inputs off you will memorize this good
Great. With the trained model we can do many things: we can compose and we can harmonize. The way we compose is the following. We have the hidden states, the inputs, and the model weights, and we can use the model weights to form a predictive distribution: what is the probability of my current note given all of the previous notes I've seen? From this probability distribution we've just written down, we pick out a note according to how that distribution is parameterized; up here, one note might have the highest weight. After we sample, we just set x_t equal to whatever we sampled and treat it as truth: we assume that whatever the output was is now the input for the next time step, and we iterate this process forwards. So, starting with no notes at all, you begin from the start symbol and keep sampling until you sample the end symbol. In that way we're able to generate novel automatic compositions.
that way we’re able to generate novel automatic compositions harmonization is
automatic compositions harmonization is
automatic compositions harmonization is actually a generalization of composition
actually a generalization of composition
actually a generalization of composition in composition what we basically did was
in composition what we basically did was
in composition what we basically did was I got a start symbol fill in the rest
I got a start symbol fill in the rest
I got a start symbol fill in the rest harmonization is where you say I’ve got
harmonization is where you say I’ve got
harmonization is where you say I’ve got the melody I’ve got the baseline or I’ve
the melody I’ve got the baseline or I’ve
the melody I’ve got the baseline or I’ve got these certain notes fill in the
got these certain notes fill in the
got these certain notes fill in the parts that I didn’t specify and for this
parts that I didn’t specify and for this
parts that I didn’t specify and for this we actually proposed a suboptimal
we actually proposed a suboptimal
we actually proposed a suboptimal strategy so I’m going to let alpha
strategy so I’m going to let alpha
strategy so I’m going to let alpha denote the stuff that we’re given so it
denote the stuff that we’re given so it
denote the stuff that we’re given so it alpha could be like 1 3 7 the points in
alpha could be like 1 3 7 the points in
alpha could be like 1 3 7 the points in time where the notes are fixed and the
time where the notes are fixed and the
time where the notes are fixed and the privatization problem is we need to
privatization problem is we need to
privatization problem is we need to choose the notes that aren’t fixed or we
choose the notes that aren’t fixed or we
choose the notes that aren’t fixed or we subdues the input the sequence X 1 to X
subdues the input the sequence X 1 to X
subdues the input the sequence X 1 to X also we need to choose the entire
also we need to choose the entire
also we need to choose the entire composition such that the notes that
composition such that the notes that
composition such that the notes that we’re given X alpha are already fixed
we’re given X alpha are already fixed
we’re given X alpha are already fixed and so our decision variables are the
and so our decision variables are the
and so our decision variables are the things that are not in alpha and we need
things that are not in alpha and we need
things that are not in alpha and we need to maximize this probability
to maximize this probability
to maximize this probability distribution my kind of greedy solution
distribution my kind of greedy solution
distribution my kind of greedy solution which I’ve received a lot of criticism
which I’ve received a lot of criticism
which I’ve received a lot of criticism for is okay you’re at this point in time
for is okay you’re at this point in time
for is okay you’re at this point in time just sample the the most likely thing at
just sample the the most likely thing at
just sample the the most likely thing at the next point in time the reason why
the next point in time the reason why
the next point in time the reason why this gets criticized is because if you
this gets criticized is because if you
this gets criticized is because if you greedily choose without looking at what
greedily choose without looking at what
greedily choose without looking at what influence this decision now could impact
influence this decision now could impact
influence this decision now could impact on your future you might choose
on your future you might choose
on your future you might choose something that just doesn’t make any
something that just doesn’t make any
something that just doesn’t make any sense in the future harmonic context but
sense in the future harmonic context but
sense in the future harmonic context but may sound really good right now it’s
may sound really good right now it’s
may sound really good right now it’s kind of like thinking it’s kind of like
kind of like thinking it’s kind of like
kind of like thinking it’s kind of like acting without thinking about the
acting without thinking about the
acting without thinking about the consequences of your action but the
consequences of your action but the
consequences of your action but the testament to how well this actually
testament to how well this actually
testament to how well this actually performs is not what could it how bad
performs is not what could it how bad
performs is not what could it how bad could it be theoretically it’s actually
could it be theoretically it’s actually
could it be theoretically it’s actually how well does it do empirically is this
how well does it do empirically is this
how well does it do empirically is this still convincing and we’ll find out soon
But before we go there, let's uncover the black box. I've been talking about neural networks as just this thing you can optimize: throw data at it and it'll learn. Let's take a look inside and see what's actually going on. What I've done here is take the various memory cells of my recurrent neural network and unroll them over time, so on the x-axis you see time, and on the y-axis I'm showing you the activations of all of the hidden units: this is neuron number one through neuron number 256 in the first hidden layer, and similarly this is neuron number one through neuron number 256 in the second hidden layer. Do you see any pattern there? I don't. I mean, I kind of do: there's this little smear right here, and it seems to show up everywhere, as well as right here. But there's not too much intuitive sense I can make out of this image, and this is a common criticism of deep neural networks: they're like black boxes where we don't know how they really work on the inside, but they seem to do awfully well.
As we get closer to the output, things start to make a little more sense. I previously showed the hidden units of the first and second layers; now I'm showing the third layer, as well as a linear combination of the third layer, and finally the outputs of the model. As you get towards the end, you start seeing this little dotty pattern, which almost looks like a piano roll. If you remember the representation of music I showed earlier, where we had time on the x-axis and pitch on the y-axis, this looks awfully similar to that, and it isn't surprising either: recall we trained the neural network to predict the next note given all the previous notes. If the network were doing perfectly, we would expect to see the input, just delayed by a single time step, so it's unsurprising that we see something that resembles the input. But it's not quite exactly the input: sometimes we see multiple predictions at one point in time, and this is really representing the uncertainty inside our predictions, so if I represented the probability distribution, we're not just
probability distribution we’re not just
probability distribution we’re not just saying the next note is then is this
saying the next note is then is this
saying the next note is then is this rather we’re saying we’re pretty sure
rather we’re saying we’re pretty sure
rather we’re saying we’re pretty sure than that next note is this with this
than that next note is this with this
than that next note is this with this probability but it could also be this
probability but it could also be this
probability but it could also be this with this probability that probability I
with this probability that probability I
with this probability that probability I called this the probabilistic piano roll
called this the probabilistic piano roll
called this the probabilistic piano roll I don’t know if that’s standard
I don’t know if that’s standard
I don’t know if that’s standard terminology here’s one of my most
terminology here’s one of my most
terminology here’s one of my most interesting insights that I found from
interesting insights that I found from
interesting insights that I found from this model it appears to actually be
this model it appears to actually be
this model it appears to actually be learning music theory concepts so what
learning music theory concepts so what
learning music theory concepts so what I’m showing here is some input that I
I’m showing here is some input that I
I’m showing here is some input that I provided to the model and here I picked
provided to the model and here I picked
provided to the model and here I picked out some neurons and oh no these neurons
out some neurons and oh no these neurons
out some neurons and oh no these neurons are randomly selected so I didn’t just
are randomly selected so I didn’t just
are randomly selected so I didn’t just go and I fished for the ones that
go and I fished for the ones that
go and I fished for the ones that like that rather I just ran a random
like that rather I just ran a random
like that rather I just ran a random number generator got eight of them out
number generator got eight of them out
number generator got eight of them out and then I handed them off to my music
and then I handed them off to my music
and then I handed them off to my music dearest collaborator and I was like hey
dearest collaborator and I was like hey
dearest collaborator and I was like hey is there anything there and here’s the
is there anything there and here’s the
is there anything there and here’s the end here’s the notes he made for me
end here’s the notes he made for me
end here’s the notes he made for me he said that neuron 64 this one and
he said that neuron 64 this one and
he said that neuron 64 this one and layer one neuron 138 this one they
layer one neuron 138 this one they
layer one neuron 138 this one they appear to be picking out perfect
appear to be picking out perfect
appear to be picking out perfect Cadence’s with root position chords in
Cadence’s with root position chords in
Cadence’s with root position chords in the tonic key more music theory than I
the tonic key more music theory than I
the tonic key more music theory than I can understand but if I look up here
can understand but if I look up here
can understand but if I look up here it’s like that shape right there on the
it’s like that shape right there on the
it’s like that shape right there on the piano roll looks like that shape on the
piano roll looks like that shape on the
piano roll looks like that shape on the piano roll looks like that shape on the
piano roll looks like that shape on the
piano roll looks like that shape on the piano roll interesting neuron layer one
piano roll interesting neuron layer one
piano roll interesting neuron layer one or neuron 151 I believe that is this one
or neuron 151 I believe that is this one
or neuron 151 I believe that is this one a minor Cadence’s ending phrases two and
a minor Cadence’s ending phrases two and
a minor Cadence’s ending phrases two and four no that’s this one sorry and and
four no that’s this one sorry and and
four no that’s this one sorry and and again I look up here okay yeah that kind
again I look up here okay yeah that kind
again I look up here okay yeah that kind of chord right there looks kind of like
of chord right there looks kind of like
of chord right there looks kind of like that chord right there they seem to be
that chord right there they seem to be
that chord right there they seem to be specializing to picking out specific
specializing to picking out specific
specializing to picking out specific types of chords okay so it’s learning
types of chords okay so it’s learning
types of chords okay so it’s learning Roman numeral analysis and tonics and
Roman numeral analysis and tonics and
Roman numeral analysis and tonics and root position chords and Cadence’s and
root position chords and Cadence’s and
root position chords and Cadence’s and the last one where one neuron eighty
the last one where one neuron eighty
the last one where one neuron eighty seven and layer two neuron 37 I believe
seven and layer two neuron 37 I believe
seven and layer two neuron 37 I believe that’s this one in this one they’re
that’s this one in this one they’re
that’s this one in this one they’re picking out I six chords I have no idea
picking out I six chords I have no idea
picking out I six chords I have no idea what that means
So, I showed you automatic composition at the beginning of the presentation, when I took some BachBot music and claimed it was Bach. I'll now show you what harmonization sounds like, using the suboptimal strategy that I proposed. We take a melody such as

[Music]

and we tell the model: this has to be the soprano line, what are the others likely to be? That's kind of convincing; it's almost like a baroque C major chord progression. What's really interesting, though, is that we can not only harmonize simple melodies like that, we can also take popular tunes such as this

[Music]

and generate a novel baroque harmonization of what Bach might have done had he heard "Twinkle, Twinkle, Little Star" during his lifetime.
Now I'm going off on the track of "oh, this is my model, it looks so good, it sounds so realistic", which is exactly what I was criticizing at the beginning of the talk. My third research goal was actually: how can we determine a standardized way to quantitatively assess the performance of generative models? For this particular task, and one which I recommend for all of automatic composition, that means a subjective listening experiment.

So what we did is we built bachbot.com, and it looks like this. It's got a splash page, and it's kind of trying to go viral: it asks, can you tell the difference between Bach and a computer? (They used to say man versus machine.) The interface is simple: you're given two choices, one of them is Bach and one of them is BachBot, and you're asked to distinguish which one was the actual Bach. We put this up on the Internet.
We got around nineteen hundred participants from all around the world. Participants tended to be within the eighteen-to-forty-five age group, and we got a surprisingly large number of expert users who decided to contribute. We defined an expert as a researcher (someone who has published) or a teacher (someone with professional accreditation as a music teacher); advanced as someone who has studied in a degree program for music; and intermediate as someone who plays an instrument.

Here's how they did. I've coded these with S, A, T, B to represent the part that was asked to be harmonized: this one is "given the alto, tenor, and bass, harmonize the soprano"; this one was "given the soprano, harmonize the middle two and the bass"; and this is "compose everything, I'm going to give you nothing". This is the result that I've been quoting this entire talk: participants were only able to distinguish Bach from BachBot 7% better than random chance. But there are some other interesting findings in here.
Well, I guess this isn't too surprising: if you delete the soprano line, then BachBot is off to create a convincing melody, and it doesn't do too well; whereas if you delete the bass line, BachBot does a lot better. I think this is actually a consequence of the way I chose to deal with polyphony, in the sense that I serialized the music from soprano to alto to tenor to bass. By the time BachBot got to figuring out what the bass note might be, it had already seen the soprano, alto, and tenor notes within that time instant, and so it already had a very strong harmonic context for what note might sound good.
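The serialization I'm describing can be sketched like this: within each time instant the four simultaneous notes are flattened soprano-first, so the bass token is always predicted with the other three parts already in context. The pitch numbers are arbitrary, just for illustration.

```python
def serialize(chords, order=(0, 1, 2, 3)):
    """Flatten (S, A, T, B) chords into one token stream.

    order=(0, 1, 2, 3) emits soprano first within each time instant;
    order=(3, 2, 1, 0) would emit bass first instead.
    """
    return [chord[part] for chord in chords for part in order]

# Two time instants of (soprano, alto, tenor, bass) MIDI pitches.
chords = [(72, 67, 64, 48), (71, 67, 62, 43)]
satb = serialize(chords)                      # soprano-first stream
btas = serialize(chords, order=(3, 2, 1, 0))  # bass-first stream
```

Swapping in the bass-first order is exactly the kind of reordering you would need in order to test whether the bass advantage comes from serialization order.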
context about what note might sound good whereas if I whereas when I’ve got the
whereas if I whereas when I’ve got the
whereas if I whereas when I’ve got the soprano note Bach watt has no idea what
soprano note Bach watt has no idea what
soprano note Bach watt has no idea what the alto tenor bass note might be and so
the alto tenor bass note might be and so
the alto tenor bass note might be and so just going to make a random guess that
just going to make a random guess that
just going to make a random guess that could be totally out of place to
could be totally out of place to
could be totally out of place to validate this hypothesis which is a work
validate this hypothesis which is a work
validate this hypothesis which is a work left for the future you could serialize
left for the future you could serialize
left for the future you could serialize in a different order such as bass tenor
in a different order such as bass tenor
in a different order such as bass tenor Alto soprano you could run this
Alto soprano you could run this
Alto soprano you could run this experiment again and you can see and you
experiment again and you can see and you
experiment again and you can see and you would expect to see it go down like this
would expect to see it go down like this
would expect to see it go down like this if the hypothesis is true and
if the hypothesis is true and
if the hypothesis is true and differently if not here I’ve taken the
differently if not here I’ve taken the
differently if not here I’ve taken the exact same plot from the previous plot
exact same plot from the previous plot
exact same plot from the previous plot except I’ve now broken it down by music
except I’ve now broken it down by music
except I’ve now broken it down by music experience unsurprisingly
experience unsurprisingly
experience unsurprisingly you kind of see this curve where people
you kind of see this curve where people
you kind of see this curve where people are doing or doing better as they get
are doing or doing better as they get
are doing or doing better as they get more experienced so the novices are like
more experienced so the novices are like
more experienced so the novices are like almost only three percent better where
almost only three percent better where
almost only three percent better where the experts are sixteen percent better
the experts are sixteen percent better
the experts are sixteen percent better they probably know Bach they’ve got it
they probably know Bach they’ve got it
they probably know Bach they’ve got it memorized so they can tell the
memorized so they can tell the
memorized so they can tell the difference but the interesting one is
difference but the interesting one is
difference but the interesting one is here the experts do significantly worse
here the experts do significantly worse
here the experts do significantly worse than random chance when getting when
than random chance when getting when
than random chance when getting when comparing Bach versus Bach bought bass
comparing Bach versus Bach bought bass
comparing Bach versus Bach bought bass harmonizations I actually don’t have a
harmonizations I actually don’t have a
harmonizations I actually don’t have a good reason why but it’s surprising to
good reason why but it’s surprising to
good reason why but it’s surprising to me it seems that the experts think block
me it seems that the experts think block
me it seems that the experts think block bot is more convincing than actual Bach
bot is more convincing than actual Bach
bot is more convincing than actual Bach so in conclusion I’ve presented a deep
so in conclusion I’ve presented a deep
so in conclusion I’ve presented a deep long short term memory generative model
long short term memory generative model
long short term memory generative model for composing completing and generating
for composing completing and generating
for composing completing and generating polyphonic music and this model isn’t
polyphonic music and this model isn’t
polyphonic music and this model isn’t just like research that I’m talking
just like research that I’m talking
just like research that I’m talking about that no one ever gets to use it’s
about that no one ever gets to use it’s
about that no one ever gets to use it’s actually open source it’s on my github
actually open source it’s on my github
actually open source it’s on my github and moreover Google’s Google brains
and moreover Google’s Google brains
and moreover Google’s Google brains magenta project has actually integrated
magenta project has actually integrated
magenta project has actually integrated it already into Google magenta so if you
it already into Google magenta so if you
it already into Google magenta so if you use the
use the
use the polyphonic recurrent neural network
polyphonic recurrent neural network
polyphonic recurrent neural network model at magenta and the tensor flow
model at magenta and the tensor flow
model at magenta and the tensor flow projects you’ll be using the bok-bok
projects you’ll be using the bok-bok
projects you’ll be using the bok-bok model the model appears to learn music
model the model appears to learn music
model the model appears to learn music theory without any prior knowledge we
theory without any prior knowledge we
theory without any prior knowledge we didn’t tell it this is a chord this is
didn’t tell it this is a chord this is
didn’t tell it this is a chord this is the cadence this is a tonic it just
the cadence this is a tonic it just
the cadence this is a tonic it just decided to figure that out on its own in
decided to figure that out on its own in
decided to figure that out on its own in order to optimize performance on an
order to optimize performance on an
order to optimize performance on an automatic composition task to me this
automatic composition task to me this
automatic composition task to me this suggests that music theory with all of
suggests that music theory with all of
suggests that music theory with all of its rules and all of its formalisms
its rules and all of its formalisms
its rules and all of its formalisms actually is useful for for comp
actually is useful for for comp
actually is useful for for comp composing in fact it’s so useful that a
composing in fact it’s so useful that a
composing in fact it’s so useful that a machine trained to optimize compose
machine trained to optimize compose
machine trained to optimize compose composition decided to specialize on
composition decided to specialize on
composition decided to specialize on these concepts finally we conducted the
these concepts finally we conducted the
these concepts finally we conducted the largest musical Turing test to date with
largest musical Turing test to date with
largest musical Turing test to date with 1,700 participants only 7% of which
1,700 participants only 7% of which
1,700 participants only 7% of which performed better than random chance
Obligatory note for my employer: we do freelance outsourcing, so if you need a development team, let me know. Other than that, thank you so much for your attention; it was a pleasure speaking to you all.

[Applause]