Audio Based Melody Generation Using MusicVAE

By Aaron Basch, Chris Carson, and Lucas Bastos

One of the most frustrating situations a music producer can run into is a creative block. Whether the task is starting a new project or building on an established idea, a creative block can derail your workflow and sap your inspiration for an extended period. This is even worse for people who make music as their primary source of income. Plenty of websites, programs, and plug-ins will randomly generate melodies, but few, if any, can take an audio input and produce new but similar results. Because our tool starts from audio the user chooses, its output should be closer to what they are looking for than that of a purely random generator. And because the results are delivered as MIDI files, they can be edited in any DAW, so producers are not stuck with the melody we provide and have complete freedom to do whatever they want with it. If implemented well, this could be a great tool to help music producers find inspiration and overcome creative blocks.

Because our program is only as good as the input we feed it, much of our effort went into the audio-to-pianoroll function. Our first approach was to find an existing solution and adapt it to our specifications. After trying a few programs from GitHub, we settled on the one that best suited our purpose. It combines librosa's implementation of pYIN pitch detection with a Hidden Markov Model that predicts note probabilities. While this worked reasonably well, two issues, timing and pitch, would have hindered the effectiveness of the overall program: the resulting MIDI was not constrained to any discernible time grid, and its pitch values were often inaccurate.

We therefore designed our own audio-to-pianoroll function, which is simpler but much better at capturing the essence of the melody and sticks to a predictable time grid. For each eighth-note time division, it takes the most frequent note estimated by a pretrained CREPE model, and it treats divisions whose pitch estimates have a low average confidence as silence. This method makes one important assumption about the audio input: the selected start and stop times mark exactly where an 8-bar melody begins and ends. We felt this assumption was reasonable because the program encourages the user to choose a specific segment, and because the melody-mixer model performs best with more "regular" melodies. Under that assumption, this method far outperformed the existing tools we tried.

We also attempted to detect the key of the song and correct out-of-scale notes, but this corrupted the many songs users might choose that do not fall into standard major or minor modes, so we abandoned it.
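The eighth-note quantization step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not our exact implementation: it assumes frame-level arrays of times (seconds), frequencies (Hz), and confidences, such as the time, frequency, and confidence arrays returned by the crepe package's crepe.predict(audio, sr). The 4/4 meter, the 0.5 confidence threshold, and the function names audio_to_pianoroll and hz_to_midi are hypothetical choices for illustration.

```python
import numpy as np


def hz_to_midi(freq_hz):
    """Map frequencies in Hz to the nearest MIDI note numbers (A4 = 440 Hz = 69)."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    return np.rint(69 + 12 * np.log2(freq_hz / 440.0)).astype(int)


def audio_to_pianoroll(times, freqs, confidences, duration_s,
                       n_bars=8, cells_per_bar=8, conf_threshold=0.5):
    """Quantize frame-level pitch estimates onto an eighth-note grid.

    Assumes the clip spans exactly n_bars of 4/4 (the start/stop
    assumption above). Returns one MIDI pitch per cell, or None for silence.
    conf_threshold is a hypothetical value and would need tuning.
    """
    times = np.asarray(times, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    confidences = np.asarray(confidences, dtype=float)

    n_cells = n_bars * cells_per_bar  # 8 bars of 4/4 -> 64 eighth notes
    cell_len = duration_s / n_cells   # seconds per eighth note
    # Grid cell of each frame, clamped so a frame at the very end stays in range.
    cell_idx = np.minimum((times // cell_len).astype(int), n_cells - 1)

    roll = []
    for cell in range(n_cells):
        in_cell = cell_idx == cell
        voiced = in_cell & (freqs > 0)
        # Empty cells and cells with low average confidence are treated as rests.
        if not voiced.any() or confidences[in_cell].mean() < conf_threshold:
            roll.append(None)
            continue
        # Most frequent MIDI note in the cell; np.unique sorts, so ties go low.
        values, counts = np.unique(hz_to_midi(freqs[voiced]), return_counts=True)
        roll.append(int(values[np.argmax(counts)]))
    return roll
```

Taking the most frequent note rather than an average pitch means a brief glitch or a slide into a note is outvoted by the frames where the pitch is stable, which is what keeps the output on a clean grid.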

The extracted melodies are passed as a JSON object to a separate melody-mixer JavaScript file, where they are mixed using MusicVAE and its interpolate function. The JSON object is first parsed into the format MusicVAE expects, and the melodies are then mixed together one by one using interpolate. One of the resulting interpolations is selected at random and written to a MIDI file with JSMidi, which the user can then download.
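One possible shape for the JSON handed to the mixer is sketched below. It is an illustration under assumptions, not our exact format: the field names follow the quantized NoteSequence format used by magenta.js (notes with pitch, quantizedStartStep, and quantizedEndStep, plus quantizationInfo.stepsPerQuarter and totalQuantizedSteps), while the 16th-note grid (two MusicVAE steps per eighth-note cell), the merging of repeated pitches into held notes, and the top-level "melodies" key are hypothetical choices.

```python
import json


def pianoroll_to_note_sequence(roll, steps_per_cell=2, steps_per_quarter=4):
    """Convert a per-cell pitch list (None = rest) into a quantized,
    NoteSequence-style dict.

    Assumption: each roll cell is an eighth note, stretched to
    steps_per_cell 16th-note steps so that stepsPerQuarter is 4.
    Consecutive identical pitches are merged into one held note.
    """
    notes = []
    current, start = None, 0
    # Append a rest sentinel so the final note is always closed.
    for cell, pitch in enumerate(list(roll) + [None]):
        if pitch != current:
            if current is not None:
                notes.append({
                    "pitch": current,
                    "quantizedStartStep": start * steps_per_cell,
                    "quantizedEndStep": cell * steps_per_cell,
                })
            current, start = pitch, cell
    return {
        "notes": notes,
        "quantizationInfo": {"stepsPerQuarter": steps_per_quarter},
        "totalQuantizedSteps": len(roll) * steps_per_cell,
    }


def melodies_to_json(rolls):
    """Serialize several pianorolls into one JSON string for the mixer script.

    The top-level "melodies" key is a hypothetical choice.
    """
    return json.dumps({"melodies": [pianoroll_to_note_sequence(r) for r in rolls]})
```

On the JavaScript side, a string like this could be parsed with JSON.parse and its entries passed to MusicVAE's interpolate method, assuming the sequence length matches what the chosen checkpoint expects.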

The entire end-to-end program can also be run through a web interface with a clean, minimalist design, which lets users avoid the command line entirely. The web interface runs on the Express framework and can be quickly deployed to a website or simply run on localhost. Its inner workings are simple: the landing page asks the user to upload audio tracks in .wav format, and the back end then runs them through the pipeline described above. Finally, the user can listen to the new track and download it.


An example of two input melodies with their extracted MIDI, followed by the mix of the two. All three are played with an in-browser MIDI player.

Input A:

Input B:
