
Designing AI VSTs for Generative Audio Workstations

DAW plugins have been approaching a saturation point over the past decade. Reskinned VSTs seem to outnumber innovative music software by a margin of ten to one. How many new delay and reverb plugins do we need? Generative AI and machine learning models are bringing a wave of novelty into the music software ecosystem. But there's a problem: most of these applications lack the plugin UIs we've grown accustomed to.

These new models are built, summarized in papers on Arxiv, promoted on Twitter, and circulated through a small community of AI enthusiasts. Only a few AI models have been productized and marketed to musicians. 

Meanwhile, instant-song generators like Boomy and Soundraw are dominating public perceptions of AI music. These browser apps have been backed by venture capital and go after the royalty-free music market. Their marketing appeals to content creators who would normally shop for audio at Artlist, Epidemic, and SoundStripe. These browser apps aren't designed for digital audio workflows.

The majority of AI music models are housed in open source Github repositories. Loading a model on your personal computer can be time and resource intensive, so they tend to be run on cloud services like Hugging Face or Google Colab instead. Some programmers are generous enough to share access to one of these spaces. 

To overcome this problem, companies will need to start hiring designers and programmers who can make AI models more accessible. 

In this article, I'll share a few exclusive interviews that we held with AI VST developers at Neutone and Samplab. I was also fortunate to speak with Voger, a prolific plugin design company, to get their take on emerging trends in this space. Readers who want to learn more about AI models and innovative plugins will find a list at the end of this piece.

Early efforts to build AI music devices in Max for Live

Ableton Max For Live is currently one of the few plugin formats that easily loads AI models directly in a DAW. The device shown below, shared by Kev at The Collabage Patch, leverages MusicGen‘s buffer feature to generate AI music in Ableton.  

This next Max For Live device is called Text2Synth and was programmed by Jake (@Naughtttt), an acquaintance I met through Twitter. He created an AI text prompt system that generates wavetable synths based on descriptions of the sound you're looking for.

Generative audio workstations: Designing UX/UI for AI models

The idea of a generative audio workstation (GAW) was popularized by Samim Winiger, CEO at generative AI music company Okio. His team is actively building a suite of generative music software and we hope to see more from them in the coming year.

A handful of AI DAW plugins are already available commercially. AudioCipher published a deep-dive on generative audio workstations last month that explored some of the most popular AI tools in depth. Instead of regurgitating that list, I’ve interviewed software developers and a design team to hear what they have to say about this niche. To kick things off, let’s have a look at Samplab.

Samplab 2: Creating and marketing an AI Audio-to-MIDI VST

Samplab is an AI-powered audio-to-MIDI plugin that offers stem separation, MIDI transcription, and chord recognition. Once the audio is transcribed, users can adjust notes on a MIDI piano roll to change the individual pitch values of the raw audio. Their founder Gian-Marco Hutter has become a close personal friend over the past year and I was excited to learn more about his company’s backstory.

1. Can you tell us about your background in machine learning and how Samplab got started?

We are two friends who met during our studies in electrical engineering and focused on machine learning while pursuing our master's degrees. Thanks to co-founder Manuel Fritsche's research in the field, we were able to receive support from the Swiss government through a grant aimed at helping academic projects become viable businesses. Through this program we could do research on our technology while being employed at the university.

2. How far along were you with development when you decided to turn Samplab into a commercial product?

We explored various applications of AI in music production to find out where we could provide the most value. At first, we launched a simple VST plugin and gave it out for free to get some people to test it. To our surprise, it was picked up by many blogs and spread faster than we anticipated. From that point on, we knew that we were onto something and built more advanced features on top of the free tier.

3. What were some of the biggest design challenges you faced while building Samplab 2?

It's hard to pinpoint a single challenge, as we had to overcome so many. Our AI model can't be run locally on a computer in a reasonable time, so we decided to provide an online service and run the frontend in a VST plugin or a standalone desktop app. Making the backend run efficiently and keeping the user interface clean and simple were both quite challenging tasks.

4. What Samplab features are you most proud of?

When working with polyphonic material, our audio-to-MIDI is at the cutting edge and it still keeps improving. We're proud of our ability to change single notes, even in complex audio, and completely rearrange what's played in samples. We're also super happy every time we get an email from a user giving us positive feedback.

Check out Samplab’s website here

Neutone: A plugin hub for 3rd party AI music models

Next up is Neutone’s CTO, Andrew Fyfe. We reached out to learn more about their company’s mission and design philosophy.

The Neutone plugin acts as a hub for in-house and third-party AI models in the music and audio space. Typically these services would be hosted in a web browser or run locally on an advanced user's machine. To create a hub like this, Neutone's design team needed to work closely with each model and expose its core features to make sure they run properly within any DAW.

1. Neutone has taken a unique approach to AI music software, acting as a hub for multiple AI models. What were some of the key considerations when designing the “main view” that displays the list of models?

When considering the model browser design, we wanted to make it easy for users to stay up to date with newly released AI models. Accessibility to cutting-edge AI tech for artists is one of the major pillars of Neutone so we focused on making the workflow simple and intuitive! One of the biggest advantages of Neutone is that the browser hooks into our online model repository so we can keep it up to date with the latest models we release.

Users download modules straight into the plugin, sort of like Output’s Arcade but for AI models instead of sample libraries! We also wanted to make sure that we credit the AI model creators appropriately, so this information is upfront and visible when navigating the browser. Additional search utilities and tags allow users to easily navigate the browser panel to find the model they are looking for.

2. When users make their selection, they see different interfaces depending on the core functionality of that model. How does your team arrive at the designs for each screen? For example, do you have in-house UX/UI designers that determine the most important features to surface and then design around that? Or do the model’s creators tend to serve up wireframes to you to expand upon?

We worked closely with our UI/UX designer and frontend team to put together the most generalized, intuitive, and slick-looking interface possible for our use case. We wanted to create an interface that could adapt to various kinds of neural audio models. It needed to be able to communicate these changes to the user but keep the user interaction consistent. The knobs that we expose update their labels based on the control parameters of the equipped model.

We use our Neutone SDK to enforce some requirements on model developers for compatibility with our plugin. Artists and researchers can leverage our Python SDK to wrap and export their neural audio models to load into the plugin at runtime. We also warmly accept submissions from our community so that we can host them on our online repo for all to access!
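For readers curious about what "wrapping and exporting" a neural audio model involves, here is a minimal sketch of the general pattern: a PyTorch module that processes audio buffers is compiled with TorchScript and saved to disk so that a host application can load it at runtime without Python. The class, file name, and toy processing logic are my own illustrative choices; the actual Neutone SDK adds its own base classes, parameter metadata, and export helpers on top of this idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAudioEffect(nn.Module):
    """Toy waveform-to-waveform model: blends the input with a smoothed copy of itself."""

    def __init__(self):
        super().__init__()
        # A learnable wet/dry mix that a plugin host could expose as a knob.
        self.mix = nn.Parameter(torch.tensor(0.5))

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio shape: (channels, samples)
        smoothed = F.avg_pool1d(audio.unsqueeze(0), kernel_size=9, stride=1, padding=4).squeeze(0)
        return self.mix * smoothed + (1.0 - self.mix) * audio

model = TinyAudioEffect().eval()

# TorchScript produces a self-contained artifact that C++ plugin hosts can load at runtime.
scripted = torch.jit.script(model)
scripted.save("tiny_audio_effect.pt")  # hypothetical file name
```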

3. Were there any special considerations when designing the color palette and aesthetic of the app? I noticed that each model interface feels coherent with the others, so I'm wondering if you have brand guidelines in place to ensure the app maintains a unified feel.

From Neutone's conception, we imagined it not only as a product but as an impactful brand that could hold its own amongst the other audio software brands out there. We are pioneering new kinds of technology and making neural audio more accessible to creators, so we wanted to marry this bold mission with a bold brand and product identity. We worked closely with our graphic designer to achieve our current look and feel and established brand guidelines to maintain visual consistency across our website and product range. The Neutone platform is only the beginning, and we have many more tools for artists coming soon!

Check out the Neutone website here

Voger & UI Mother: 15 years of music plugin interface design

To round things out, I spoke with Nataliia Hera of Voger Design. She’s an expert in UX/UI design for DAW plugins. Her interfaces have been used by major software products including Kontakt, Propellerhead’s Reason, and Antares. So of course we had to get her take on the future of design for AI VSTs.

Nataliia is also the founder of UI Mother, a music construction kit marketplace for plugin developers who want to assemble high-quality interfaces without paying the usual premium for custom designs.

1. I’m excited to understand your design philosophy. But first, can you share some of the companies you’ve built plugin interfaces for?

We’ve worked with many small and big companies: Korg, Slate Digital, Sonimus, East West Sounds, GPU Audio and more. Propellerhead Reason and Blackstar are some of the brightest examples. 

2. I read that you started the company with your husband more than ten years ago. Can you tell us about your own design background? What inspired the two of you to begin creating interfaces and building a music business together?

We’re self-taught in design, starting from scratch 15 years ago with no startup funds or educational resources, especially in audio plugin design, which was relatively new. We pieced together knowledge from various sources, learning by studying and analyzing designs, not limited to audio.

Our journey began when Vitaly tried using the REAPER DAW and saw the chance to improve its graphics, which weren’t appealing compared to competitors. Creating themes for REAPER garnered positive user feedback and prompted Vitaly to explore professional audio design, given the industry’s demand.

As for me, I lacked a design background initially but learned over the years, collaborating with my husband, an art director. We successfully revamped REAPER themes, alongside plugin and Kontakt library GUIs, leading to a growing portfolio. About six months later, we established our design agency, filling a need for quality audio program design.

Today, I've acquired skills in graphic design, branding, and product creation processes, thanks to our extensive library of knowledge and years of practice.

3. What does your design process look like when starting with a new client? For example, do they typically give you a quick sketch to guide the wireframe? I noticed that your portfolio included wireframes and polished designs. How do you work together to deliver a final product that matches the aesthetic they’re looking for?

It's interesting: from the outside, the client sees the process as simply working with a design agency. I've compared our process with those of agency owners I know, and with my own experience ordering design services. The client gives us data as it is. Usually that means a rough sketch, a list of controls, and sometimes an example of the style they want to achieve.

The process usually looks like this:

  • We research the features and how to design them, in terms of both layout and style.
  • Then, we create a mood board and an interactive layout.
  • Then, it's time for style creation.
  • At the delivery stage, we prepare files for implementation into code, animation, etc.
  • We package everything into artwork or an animated clip for the website or a video.

Everything related that I do and implement helps us improve the result. For example, we introduced an internal standard for each stage to reduce the risk of the client getting a result they didn't expect.

The same is true of KPIs and all our other business instruments. All of them are focused on getting the client everything they need, on time. I say "need," but there is a difference between what a client needs and what a client wants. That is another story, the typical client's creative process, you know. We even have a kind of client scale where we anticipate the risks of creating difficult, creative work that needs time, iteration, and sorting through variants, often under conditions of conflicting or unchanging data in the technical brief. It is normal!

4. Voger designs more than just flat user interfaces. Your portfolio includes some beautiful 3D-modeled interfaces. Are these used to promote the applications on websites and advertisements? What software do you use?

3D technology is very important for creating the most expensive and attractive designs in the audio sphere. This applies to promos and presentations, but primarily to the UI itself. Most of the attractive, photo-realistic audio plugin designs you have seen over the last 20 years were created either partially or completely with 3D tools, because only 3D can give us believable shadows, lights, and materials for objects.

Plausibility and photo-realism are essential for audio plugin UIs. Firstly, people are used to seeing and using familiar, legendary hardware devices (synthesizers, drum machines, effects, and others). All of these devices have recognizable, unique design styles and ways of being used (familiar examples include the Juno 106, TR-808, and LA-2A).

We all live in the real world, so even an ordinary flat UI tries to imitate the real world with shadows, simulated light, and real materials like wood, plastic, or metal (by the way, that is the reason the flat UI in iOS 7 failed, but that is material for another interview). There is actually no real difference between programs: any modern, capable 3D editor can create a cool audio design. But we have been using only Blender 3D all these years. It is free, stable, doesn't cause problems, and we can always rely on it (which cannot be said, for example, about Adobe and similar companies).

5. If a client wants to build a physical synth, are you able to create 3D models that they can hand off to a manufacturer?

Theoretically, yes, why not? Practically, it all depends on the project, and there are many co-factors we cannot influence, as clients have not given us such an opportunity yet. I am talking about the engineering side, with inner components like the types of faders, potentiometers, transformers, chip placement, and so on. We understand clients in such situations; we are not engineers who design schematics, but an exterior design made of five types of knobs and three types of faders available on eBay cannot be called extremely complex. Still, we have experience creating a couple of such hardware devices. Unfortunately, we cannot disclose them because of NDAs, as they are still in production, but one will appear very soon, and you will definitely hear about it.

6. There’s recently been a shift toward artificial intelligence in generative music software. These AI models typically run on services like Hugging Face, Google Colab, or on a skilled user’s personal computer. There are some amazing services out there that have no interface at all. I think these engineers don’t know where to go to turn their software into usable product interfaces.

Firstly, we’re a long way from achieving true AI (maybe 20 or even 40 years away). Right now, our “AI” isn’t quite the cool, creative intelligence we imagine. It’s more like trying to spread butter with a flint knife. If AI-generated designs flood the market, they’ll all look alike since they’re based on the same internet data. Soon, people will want something new, and we’ll start the cycle again. The idea of becoming AI’s foundation is intriguing. We’d love to see what the VOGER design, averaged over time by a neural model, looks like. Will it be light or dark? My art director is grinning; he knows something! 

7. I noticed that you have a second company called UI Mother that sells music interface construction kits. Can you talk about how the service differs from Voger? What is the vision for this company?

It's simple. Here we are talking about a more democratic and budget-friendly way of getting a cool design for audio software than VOGER. Imagine developers who want to try creating a design independently but don't have enough money to pay a professional team of designers, or who risk hiring unknown freelancers who might let them down at an unexpected moment. These developers can get professional, high-quality graphics in the form of a UI kit optimized for audio programs, covering many nuances of this sphere, like frame-by-frame animation, filmstrips, different screen sizes, etc. These clients are often independent developers or experimentalists. Sometimes they are small companies that need to test an idea or create a simple, even free, product.

Check out Voger Design to see more of their work and UI Mother to explore their VST UI construction kits. 

What kind of AI models would benefit from a plugin interface?

To wrap up this article, let’s have a look at some of the most interesting AI models in circulation and try to identify programs that don’t have an interface yet. I’ve organized them by category and as you’ll see, there’s a broad spread of options for the new plugin market:

  • Text prompt systems: We've seen several text-to-music and text-to-audio models emerge this year. The big ones are MusicGen, MusicLM, Stable Audio, Chirp, and Jen-1, but none of them offers a DAW plugin.
    • AudioCipher launched a text-to-MIDI plugin in 2020 and is currently on version 3. The algorithm generates melodies and chord progressions, with rhythm automation and layers of randomization. It's not running on an AI model currently, but it will in the future as generative text-to-MIDI models mature. Our team is partnered with a developer who is solving this. In the meantime, version 4.0 is due for release before the end of 2023, and existing customers get free lifetime version upgrades.
    • WavTool’s GPT-4 powered AI DAW operates in a web browser but doesn’t offer a desktop plugin yet.
    • We previously published a MIDI.ORG interview with Microsoft’s Pete Brown regarding the Muzic initiative at Microsoft Research. Their MuseCoCo text-to-MIDI model does not have a user interface.
    • The music editing model InstructME supports text prompts requesting individual instrument layers to be added, removed, extracted and replaced.
  • Audio-to-MIDI: As we already pointed out, Samplab is currently the most prominent audio-to-MIDI provider and the only VST on the market. The Spotify AI service Basic Pitch also offers this functionality, but only in a web browser.
  • AI Melody Generators: There are already several VSTs solving melody generation, usually through randomization algorithms. However, we were able to identify over a dozen AI melody generation models in the public domain.
    • In August 2023, a melody generator called Tunesformer was announced.
    • ParkR specializes in jazz solo generation and has been maintained to the present day.
    • Some models, like the barely-pronounceable H-EC2-VAE, take in chord progressions and generate melodies to fit them.
  • AI Chord Generators: Chord progression generators and polyphonic composing programs are also very popular points of focus among AI model developers.
    • A diffusion model called Polyffusion, announced in July 2023, generates single track polyphonic MIDI music.
    • A second model called LHVAE also came out in July and builds chords around existing melodies.
    • SongDriver‘s model builds entire instrumental arrangements around a melody.
  • AI bass lines: There's a fine line between writing melodies and bass lines, but they are arguably different enough to warrant their own section. SonyCSL created the BassNet model to address this niche.
  • Compose melodies for lyrics: At the moment, Chirp is the best lyric-to-song generator on the market, but other models like ROC have attempted to solve the same problem.
  • Parameter inference for synths: Wondering which knobs to twist in order to get that trademark sound from your favorite artist? The AI model Syntheon identifies wavetable parameters of sampled audio to save audio engineers time.
  • Music information retrieval: Machine learning has long been used to analyze tracks and pull out important details like tempo, beats, key signature, chords, and more.
  • AI Voices: Text to speech has been popular for a long time. Music software companies like Yamaha have started building AI voice generators for singing and rapping. The neural vocoder model BigVSAN uses “slicing adversarial networks” to improve the quality of these AI voices.
  • AI Drumming: Songwriters who need a percussion track but struggle with writing drum parts could benefit from automation.
    • Generative rhythmic tools like VAEDER, created by Okio's product manager Julian Lenz and supervised by Behzad Haki.
    • The JukeDrummer model was designed to detect beats in an audio file and generate drums around it.
    • Nutall's model focused specifically on creating MIDI drum patterns.
    • Drum Loop AI created a sequencer interface that runs Google Magenta on its backend and exports MIDI or audio files.
    • For plugin user interface inspiration, check out our article on Drum VSTs.
  • Stem Separation: There are several popular stem separators on the market already, like VocalRemover’s Splitter AI. In the future, it would be helpful for plugins to offer this service directly in the audio workstation.
  • Timbre and style transfer: An ex-Google AI audio programmer, Hanoi Hantrakul, was hired by TikTok to create a high resolution instrument-to-instrument timbre transfer plugin called MAWF. The plugin is available in beta and currently transfers to four instruments. Neutone supports similar features including Google’s DDSP model. A second model called Groove2Groove attempts style transfer across entire tracks.
  • Audio for Video: This article has focused on DAWs, but AI music models could also be designed to plug into video editors.
    • The award-winning sound design DAW Audio Design Desk recently launched a generative AI company called MAKR with a text-to-music service called SoundGen that will load in their video editor and integrate with other video editors like Final Cut Pro. Ultimately, this could lead to a revolution in AI film scoring.
    • A vision-to-audio model called V2A Mapper turns videos into music, analyzing the on-screen events and generating sound effects with a popular open source model called AudioLDM.
    • AI music video software like Neural Frames uses AI stem separation and transients to modulate generative imagery.
  • Emotion-to-Audio: AI-assisted game design researcher Antonio Liapis published an Arxiv paper earlier this year detailing a system that models emotions using valence, energy, and tension. His system focuses on horror soundscapes for games. Berkeley researchers went so far as to record audio directly from the brain, and at AudioCipher we speculated about how this could be used to record dream songs.
  • Improvised collaboration: One of Google Magenta’s earliest efforts was called AI Duet. This concept has existed in the public domain for decades, like the computer-cello duet scene in the movie Electric Dreams. We’ve penned a full article outlining scenarios and existing software that could provide an AI bandmate experience.

This list covers the majority of AI music model categories that we've encountered. I'm sure there are plenty of others out there, and there may even be entire categories that I missed. The field is evolving on a monthly basis, which is part of what makes it so exciting.

About the author: Ezra Sandzer-Bell is the founder of AudioCipher Technologies and serves as a software marketing consultant for generative music companies. 

Microsoft Adds MIDI 2.0, Researches AI Text-to-MIDI in 2023

The MIDI Association has enjoyed an ongoing partnership with Microsoft, collaborating to ensure that MIDI software and hardware play nicely with the Windows operating system. All of the major operating systems companies are represented equally in the MIDI Association, and participate in standards development, best practices, and more to help ensure the user experience is great for everyone.

As an AI music generator enthusiast, I’ve taken a keen interest in Microsoft Research (MSR) and their machine learning music branch, where experiments about music understanding and generation have been ongoing.

It’s important to note that this Microsoft Research team is based in Asia and enjoys the freedom to experiment without being bound to the product roadmaps of other divisions of Microsoft. That’s something unique to MSR, and gives them incredible flexibility to try almost anything. This means that their MIDI generation experiments are not necessarily an indication of Microsoft’s intention to compete in that space commercially.

That being said, Microsoft has integrated work from their research team in the past, adding derived features to Office, Windows, and more, so it’s not out of the question that these AI MIDI generation efforts might some day find their way into a Windows application, or they may simply remain a fun and interesting diversion for others to experiment with and learn from.

The Microsoft AI Music research team, operating under the name Muzic, started publishing papers in 2020 and has shared over fourteen projects since then. You can find their GitHub repository here.

The majority of Muzic’s machine learning efforts have been based on understanding and generating MIDI music, setting them apart from text-to-music audio generation services like Google’s MusicLM, Meta’s MusicGen, and OpenAI’s Jukebox.

On May 31st, Muzic published a research paper on their first ever text-to-MIDI application, MuseCoco. Trained on a reported 947,659 Standard MIDI files (a file format which includes MIDI performance information) across six open source datasets, the developers found that it significantly outperformed the music generation capabilities of GPT-4 (source).

It makes sense that MuseCoco would outperform GPT-4, having been trained specifically on musical attributes in a large MIDI dataset. Details of the GPT-4 prompt techniques were included in Figure 4 of the MuseCoco paper, shown below. The developers requested output in ABC notation, a shorthand form of musical notation for computers.

Text to MIDI prompting with GPT-4
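To make the ABC-notation approach concrete, here is a small sketch of how you might prompt GPT-4 for a melody in ABC notation and convert the reply into a Standard MIDI File. It assumes the openai Python client (v1+), the music21 library, and an OPENAI_API_KEY in your environment; the prompt wording is my own rather than the exact prompt from the MuseCoco paper, and the model's ABC output is not guaranteed to parse cleanly on every attempt.

```python
from openai import OpenAI
from music21 import converter

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write an 8-bar melody in D minor as valid ABC notation. "
    "Include the X, T, M, L, and K header fields and return nothing but the tune."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
abc_text = response.choices[0].message.content

# music21 can parse ABC data and write it back out as a Standard MIDI File.
score = converter.parse(abc_text, format="abc")
score.write("midi", fp="gpt4_melody.mid")
```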

I have published my own experiments with GPT-4 music generation, including code snippets that produce MIDI compositions and save the MIDI files locally using Node.js with the MidiWriter library. I also shared some thoughts about AutoGPT music generation, exploring how AI agents might self-correct and expand upon the short duration of GPT-4 MIDI output.

Readers who don't have experience with programming can still explore MIDI generation with GPT-4 through a browser DAW called WavTool. The application includes a chatbot that understands basic instructions about MIDI and can translate text commands into MIDI data within the DAW. I speak regularly with their founder Sam Watkinson, and we anticipate some big improvements in the coming months.

Unlike WavTool, there is currently no user interface for MuseCoco. As is common with research projects, users clone the repository locally and then use bash commands in the terminal to generate MIDI data. This can be done either on a dedicated Linux install, or on Windows through the Windows Subsystem for Linux (WSL). There are no publicly available videos of the service in action and no repository of MIDI output to review.

You can explore a non-technical summary of the full collection of Muzic research papers to learn more about their efforts to train machine learning models on MIDI data.

Although non-musicians often associate MIDI with .mid files, MIDI is much larger than just the Standard MIDI File format. It was originally designed as a way to communicate between two synthesizers from different manufacturers, with no computer involved. Musicians use MIDI extensively for controlling and synchronizing everything from synthesizers and sequencers to lighting and even drones. It is one of the few standards that has stood the test of time.

Today, there are different toolkits and APIs, USB, Bluetooth, and Networking transports, and the new MIDI 2.0 standard which expands upon what MIDI 1.0 has evolved to do since its introduction in 1983.

MIDI 2.0 updates for Windows in 2023

While conducting research for this article, I discovered the Windows music dev blog where it just so happens that the Chair of the Executive Board of the MIDI Association, Pete Brown, shares ongoing updates about Microsoft’s MIDI and music efforts. He is a Principal Software Engineer in Windows at Microsoft and is also the lead of the MIDI 2.0-focused Windows MIDI Services project. 

I reached out to Pete directly and was able to glean the following insights.

Q: I understand Microsoft is working on MIDI updates for Windows. Can you share more information?

A: Thanks. Yes, we’re completely revamping the MIDI stack in Windows to support MIDI 2.0, but also add needed features to MIDI 1.0. It will ship with Windows, but we’ve taken a different approach this time, and it is all open source so other developers can watch the progress, submit pull requests, feature requests, and more. We’ve partnered with AMEI (the Japan equivalent of the MIDI Association) and AmeNote on the USB driver work. Our milestones and major features are all visible on our GitHub repo and the related GitHub project.

Q: What is exciting about MIDI 2.0?

A: There is a lot in MIDI 2.0 including new messages, profiles and properties, better discovery, etc., but let me zero in on one thing: MIDI 2.0 builds on the work many have done to extend MIDI for greater articulation over the past 40 years, extends it, and cleans it up, making it more easily used by applications, and with higher resolution and fidelity. Notes can have individual articulation and absolute pitch, control changes are no longer limited to 128 values (0-127), speed is no longer capped at the 1983 serial 31,250bps, and we’re no longer working with a stream of bytes, but instead with a packet format (the Universal MIDI Packet or UMP) that translates much better to other transports like network and BLE. It does all this while also making it easy for developers to migrate their MIDI 1.0 code, because the same MIDI 1.0 messages are still supported in the new UMP format.

At NAMM, the MIDI Association showcased a piano with the plugin software running in Logic under macOS. Musicians who came by and tried it out (the first public demonstration of MIDI 2.0, I should add) were amazed by how much finer the articulation was, and how enjoyable it was to play.
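To give a rough feel for the resolution jump Pete describes, the toy sketch below up-scales a 7-bit MIDI 1.0 controller value (0-127) into a 32-bit range by repeating its bit pattern, so that 0 and 127 land exactly on the new minimum and maximum. This is only an illustration of the idea; the MIDI 2.0 specification defines its own min-center-max scaling rules for protocol translation.

```python
def upscale_7_to_32(value: int) -> int:
    """Illustrative bit-repetition up-scaling of a 7-bit value to 32 bits.

    0 maps to 0x00000000 and 127 maps to 0xFFFFFFFF; this is similar in spirit to,
    but not a verbatim implementation of, the MIDI 2.0 translation rules.
    """
    if not 0 <= value <= 127:
        raise ValueError("MIDI 1.0 data bytes are 7-bit (0-127)")
    result = 0
    bits_filled = 0
    while bits_filled < 32:
        shift = 32 - bits_filled - 7
        result |= (value << shift) if shift >= 0 else (value >> -shift)
        bits_filled += 7
    return result

for v in (0, 1, 64, 127):
    print(f"MIDI 1.0 value {v:3d} -> MIDI 2.0 range {upscale_7_to_32(v):#010x}")
```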

Q: When will this be out for customers?

A: At NAMM 2023, we (Microsoft) had a very early version of the USB MIDI 2.0 driver out on the show floor in the MIDI Association booth, demonstrating connectivity to MIDI 2.0 devices. We have hardware and software developers previewing bits today, with some official developer releases coming later this summer and fall. The first version of Windows MIDI Services for musicians will be out at the end of the year. That release will focus on the basics of MIDI 2.0. We’ll follow on with updates throughout 2024.

Q: What happens to all the MIDI 1.0 devices?

A: Microsoft, Apple, Linux (ALSA Project), and Google are all working together in the MIDI association to ensure that the adoption of MIDI 2.0 is as easy as possible for application and hardware developers, and musicians on our respective operating systems. Part of that is ensuring that MIDI 1.0 devices work seamlessly in this new MIDI 2.0 world.

On Windows, for the first release, class-compliant MIDI 1.0 devices will be visible to users of the new API and seamlessly integrated into that flow. After the first release is out and we’re satisfied with performance and stability, we’ll repoint the WinMM and WinRT MIDI 1.0 APIs (the APIs most apps use today) to the new service so they have access to the MIDI 2.0 devices in a MIDI 1.0 capacity, and also benefit from the multi-client features, virtual transports, and more. They won’t get MIDI 2.0 features like the additional resolution, but they will be up-leveled a bit, without breaking compatibility. When the MIDI Association members defined the MIDI 2.0 specification, we included rules for translating MIDI 2.0 protocol messages to and from MIDI 1.0 protocol messages, to ensure this works cleanly and preserves compatibility.

Over time, we’d expect new application development to use the new APIs to take advantage of all the new features in MIDI 2.0.

Q: How can I learn more?

A: Visit https://aka.ms/midirepo for the Windows MIDI Services GitHub repo, Discord link, GitHub project backlog, and more. You can also follow along on my MIDI and Music Developer blog at https://devblogs.microsoft.com/windows-music-dev/ . To learn more about MIDI 2.0, visit https://midi.org .

If you enjoyed this article and want to explore similar content on music production, check out AudioCipher's reports on AI Bandmates, Sonic Branding, Sample Managers, and the latest AI Drum VSTs.

Turning MIDI Melodies Into Full Songs with Meta’s MusicGen

The popularity of generative AI software has reached an all-time high this year, with music lagging behind other mediums like image and text. Nevertheless, two applications dropped in May and June 2023 that marked a major improvement in the technology. It probably comes as no surprise that the companies behind these apps are Google and Facebook-Meta.

One of Google’s research teams published a paper in early January 2023, describing a generative AI music app called MusicLM. The paper detailed a product that could turn text prompts into songs. But perhaps more impressively, it could also take in a melody and incorporate that tune into its final output. Some demos in the paper featured humming and whistling, combined with written descriptions of attributes like genre and instrument, to output a song with that tune, in that style.

When Google launched their MusicLM beta app in May 2023, it included the text prompt feature but lacked the option to upload a melodic condition. This was a bit disappointing to those of us who had been eagerly awaiting the experience of turning our musical ideas into the genre of our choice.

Fortunately, just one month later, Meta released their own music generator called MusicGen. As if responding to Google and one-upping them, Meta included the melodic audio input feature that Google omitted from their beta app.

In this article I’ll share a quick overview of how MIDI generation fits into the picture, along with tips about how to get started with your own experiments. 

Current limitations in AI MIDI generation

To date, even the most high profile AI MIDI melody generators have been underwhelming. OpenAI decommissioned their MIDI generation app MuseNet in December 2022, right after the launch of ChatGPT. Google offers a DAW plugin suite called Magenta Studio that includes MIDI generation, but it simply doesn’t deliver the quality that any of us would have hoped for.

Experimentally minded folks might have some fun using ChatGPT music prompts to generate MIDI melodies. WavTool is a browser app that supports the ability to do this within a DAW, but it takes a great deal of trial and error to create a good melody. In many cases, you could have composed something yourself in a shorter period of time. This comes down to the fact that large language models are not trained on music composition, despite having a solid grasp of music theory concepts.

AudioCipher’s text-to-MIDI generation VST is another option you may have already explored. It lets you control key signature, chord extensions, and rhythm automation. However, the plugin does not use artificial intelligence. Users encode words and phrases into the MIDI tracks as a source of creative inspiration. The algorithm draws from a classical tradition practiced by both spies and composers, called musical cryptography.
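As an aside, the core idea behind musical cryptography is easy to sketch: map letters onto pitches in a chosen scale and write the result out as MIDI. The toy cipher below is my own illustration using the mido library, not AudioCipher's actual mapping or rhythm logic.

```python
import mido

C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # C4 D4 E4 F4 G4 A4 B4

def word_to_notes(word: str, scale=C_MAJOR) -> list:
    """Toy cipher: 'a' -> first scale degree, 'b' -> second, wrapping around the scale."""
    return [scale[(ord(ch) - ord("a")) % len(scale)] for ch in word.lower() if ch.isalpha()]

def notes_to_midi(notes: list, path: str, ticks_per_note: int = 480) -> None:
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for note in notes:
        track.append(mido.Message("note_on", note=note, velocity=80, time=0))
        track.append(mido.Message("note_off", note=note, velocity=0, time=ticks_per_note))
    mid.save(path)

notes_to_midi(word_to_notes("audiocipher"), "cipher_melody.mid")
```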

Suffice it to say, each of these options has pushed the game forward, but none of them has perfected the MIDI song generation experience. Instead of waiting around for AI MIDI generators to get better, I propose using Meta's MusicGen application in combination with an audio-to-MIDI converter. We'll get into that next.

Turning your MIDI melodies into full songs

To get started, create a MIDI melody in your DAW and export it as an audio file. It’s best to use a sine wave or a clean instrument without any effects. Once the audio file is ready, upload it to MusicGen and include a text prompt that describes the type of music you want to generate.
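If you would rather script this step than use the Hugging Face demo, the sketch below shows roughly how melody conditioning works with Meta's open source audiocraft library. The checkpoint name and function calls follow the audiocraft README at the time of writing, but the library changes quickly, so treat the details as assumptions and the file names as placeholders.

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the melody-conditioned MusicGen checkpoint (a multi-gigabyte download on first run).
model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=15)  # seconds of audio to generate

# The MIDI melody you bounced to a clean wav file in your DAW.
melody, sample_rate = torchaudio.load("my_melody.wav")

descriptions = ["warm lo-fi hip hop with dusty drums and mellow Rhodes piano"]
wav = model.generate_with_chroma(descriptions, melody[None], sample_rate)

# Writes my_melody_generated.wav with loudness normalization.
audio_write("my_melody_generated", wav[0].cpu(), model.sample_rate, strategy="loudness")
```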

I've created a video demo (shown above) with AudioCipher's text-to-MIDI melody generator and MusicGen. We created a short MIDI track, exported it as a WAV file, and then fed it into the melody condition input on Hugging Face. From there, we were able to use text prompts to turn the same tune into 15 different genres of music.

To learn more, see this article on how to use MusicGen for music production, including suggestions on the best prompts to use with the app. I’ve also included an important tip for managing your Hugging Face account settings, to avoid accidentally racking up a large bill! 

Convert MusicGen audio back into MIDI

Now that you've seen how MusicGen works and may have even created an audio file of your own, the last step is to pass that file back through a polyphonic audio-to-MIDI converter like Samplab 2, Basic Pitch, or Melodyne.

A word of advice: MusicGen produces a lot of noise, so if you have noise reduction software, I recommend using it before passing the file through a MIDI converter. Noise tends to be misinterpreted as tonal content, so cleaning it up will save you time later.
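If you want to handle that cleanup in code rather than in a dedicated plugin, one option is spectral gating with the open source noisereduce library, sketched below. The file names are placeholders, and folding the track to mono is simply a convenience for the MIDI conversion step.

```python
import soundfile as sf
import noisereduce as nr

audio, rate = sf.read("musicgen_output.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # fold stereo to mono; pitch detection doesn't need stereo

# Spectral gating: estimate a noise profile from the signal itself and attenuate it.
cleaned = nr.reduce_noise(y=audio, sr=rate)

sf.write("musicgen_output_clean.wav", cleaned, rate)
```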

Here are the three best audio-to-MIDI converters that I've found:

Samplab 2 is my favorite option for audio-to-MIDI because it detects and separates instrument layers before transcribing each one into MIDI. MusicGen tends to add drum layers to tracks even when you ask it not to. Samplab will separate those drums out, so you can isolate tonal instruments like piano, guitar, and bass. The app is available as a DAW plugin and standalone app, with drag-to-MIDI capabilities.

Basic Pitch is a free alternative to Samplab that was built by Spotify and runs in your browser. It mashes everything together in a single piano roll, so I would only recommend using it for single-instrument audio files. If the track is too complex, Basic Pitch will omit a large part of the music, while simultaneously adding excessive rhythmic articulations due to noise and effect layers.
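Basic Pitch is also available as an open source Python package, so the browser step can be scripted. A minimal sketch, assuming the pip-installable basic-pitch library and a placeholder file name:

```python
from basic_pitch.inference import predict

# Run Spotify's Basic Pitch model on an audio file; midi_data is a PrettyMIDI object.
model_output, midi_data, note_events = predict("musicgen_output_clean.wav")

midi_data.write("musicgen_output.mid")
print(f"Transcribed {len(note_events)} note events")
```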

Melodyne 5 is a high-quality application that supports single-instrument polyphonic MIDI conversion only. It won't separate instruments into their own tracks, but it handles solo piano and guitar very well. You get what you pay for, and to be blunt, Melodyne is expensive. So if you already have Melodyne, go ahead and try it out with this workflow. Otherwise, Samplab is probably your best bet.

There you have it. Once you've converted the MusicGen audio file into MIDI, you can pull it into your DAW and clean things up further in the piano roll. You'll have an expanded arrangement based on the initial MIDI idea, and now you can add your own virtual instruments and sound design to tighten up the quality.

This might seem like a lot of information, but the whole process takes about two minutes, from creating an audio file in MusicGen to passing it through an audio-to-MIDI converter. You may need to spend more time fine-tuning your text prompt to get the sound that you're after. MIDI cleanup in the DAW will also require a little work. But hey, it is what it is.

I hope this primer has given you some food for thought and an entry point to deepening your AI music discovery process. These workflows might become obsolete in the coming year as the technology continues to improve. For now, this is one of the best methods I’ve found for developing a MIDI melody and turning it into a full song with artificial intelligence. Visit our site to find this complete guide to AI music apps in 2023.

The Evolution of MIDI Software and Hardware in 2023

The MIDI association hosted a live roundtable discussion last week, featuring 2022 MIDI Award winners Krishna Chetan (Pitch Innovations), Henrik Langer (Instrument of Things), Markus Ruh (MusiKraken), and John Dingley (Digigurdy). Amongst the winning products you’ll find the Somi-1 MIDI controller, a motion-sensor wearable that converts users’ body movements into MIDI data. MusiKraken’s MIDI controller construction kit similarly tracks your hands, face, voice and device rotations. 

The warm reception toward these mixed reality music products underscores a greater trend towards immersion, novelty, and disruption that’s persisted into 2023. 

The MIDI medium is the message

Changes to the way we create music can have an impact on the types of music we produce.

Media theorist Marshall McLuhan famously said that the medium is the message, meaning that our tools reveal something about our collective mindset toward the problems at hand. Conventional MIDI keyboard designs are inherited from the classical piano, for example, and with them comes the assumption that musicians should press buttons and keys to make music. Next-generation controllers like MusiKraken and Somi-1 reimagine the controller landscape by turning human bodies into expressive instruments.

Streaming platforms like Spotify and Apple Music offer a second example of unconscious bias that comes baked into our technology. Artists are required to release songs with a fixed duration, arranged into singles, EPs and albums. This model for music is inherited from legacy formats in the recording industry. As a result, they exclude modern formats like adaptive music and may be limiting our pace of musical innovation.

YouTube differs from other music streaming platforms with its support of continuous music in a 24/7 streaming format. Extreme AI artists Dadabots took advantage of this opportunity by publishing a number of infinite music videos, like the infinite metal example shown below. WarpSound offers adaptive AI music experiences on YouTube, empowering fans to cast votes and impact the music during their YouTube livestream. These kinds of experiments are only possible because the medium supports them.

Toward more immersive music making experiences 

MIDI software is often experienced through an LCD screen, but that could soon change with the rising popularity of virtual and mixed reality hardware.

Earlier this month, Spatial Labs announced their upcoming AR DAW called Light Field. It’s not available commercially but you can watch a demo of their prototype below. Like a laser keyboard, the interface is projected onto a hard surface and users can interact with UI elements to sequence beats, chop samples, and more.

Virtual reality music games and DAWs are another domain where music creation has evolved. Experiences like Virtuoso VR, LyraVR, Instrument Studio VR, SoundStage, SYNTHSPACE and Electronauts have the power to change our ideas about what a digital audio workstation should be and how we should interact with them. 

BREAKTHROUGHS IN MIDI COMPOSING SOFTWARE DURING 2023

Artificial intelligence has had a major impact on the creative arts this year. The popularity of text-to-image generators has coincided with a parallel trend in MIDI software. AudioCipher published its third version of a text-to-MIDI generator this year. The app turns words into melodies and chord progressions based on parameters like key signature, chord extensions, and rhythm automation. You can watch a demo below.

The text-to-music trend has continued to gain traction this year. Riffusion paved the way for text-to-song in December 2022, with Google's MusicLM following suit in May 2023. Riffusion and MusicLM don't compose in MIDI. They generate low-fidelity audio clips replete with sonic artifacts, but they're nevertheless a step forward.

Most people hear AI Music and think of AI voice generators, due to the recent popularity of AI songs that imitate mainstream artists. An AI Drake song called Heart on my Sleeve reached more than 20,000,000 streams in April and May. Universal Music Group has made a public statement denouncing this practice.

Earlier today, rapper Ice Cube made a public statement calling AI music demonic and threatening to sue anyone who used his voice. Meanwhile, other artists like Grimes and Holly Herndon have sought to come up with models for consensual licensing of their voices.

So far, there has been very little discussion over the tens of millions of music clips used by Google to train MusicLM. As the owner of YouTube, Google has the right to train on the clips in its database. Many of these songs are protected by copyright and were uploaded by everyday users without the original artist's consent.

This intro to Google’s AI music datasets outlines their training architecture in more detail and addresses some of the ethical concerns at play, as more companies seek to train on instrumental tracks to build their AI models. 

THE RISE OF AI DAWS IN 2023

Digital audio workflows can have a steep learning curve for beginners, but artificial intelligence may soon remove that barrier to entry.

WavTool, a browser-based AI DAW, comes equipped with a GPT-4 assistant that takes actions on any element in the workstation. Users can summon the chatbot and ask it to add audio effects, build wavetables, and even compose MIDI ideas. A full demo of the software is shown below.

The AI assistant understands prescriptive commands like “add a new MIDI instrument track with a square wave”.

Vague requests like "write a catchy melody" yield less satisfying results. In many instances, a prompt like that will generate a major scale that repeats twice. Rich descriptions like "write a syncopated melody composed of quarter, eighth, and sixteenth notes in the key of C minor" deliver marginally better results.

The AI text-to-MIDI problem could eventually be solved by AI agents, a special class of text generation that breaks down an initial goal into a series of subtasks. During my own experiments with AutoGPT music, I found that the AI agent could reason its way through the necessary steps of composing, including quality-assurance checks along the way.

For an AI agent to actually be useful in this context, someone would need to develop the middleware to translate those logical steps into MIDI. WavTool is positioned to make these updates, but it would require a well-trained MIDI composition model, something that even the teams behind OpenAI's MuseNet and Google's Magenta have not achieved to a satisfactory degree.

POLYPHONIC AUDIO-TO-MIDI CONVERSION SOFTWARE

For years, Melodyne has been the gold standard for monophonic audio-to-MIDI transcription. In June 2022, a free Spotify AI tool called Basic Pitch went live, delivering polyphonic audio-to-MIDI within a web browser.

A second company called Samplab has since delivered its own plugin and desktop app this year, with more features than Basic Pitch. Both tools have pushed updates to their code as recently as this month, indicating that improvements to polyphonic MIDI transcription will be ongoing through 2023.

Suffice it to say that MIDI has remained highly relevant in the era of artificial intelligence. With so many innovations taking place, we're excited to see who comes out on top in this year's MIDI Innovation Awards!

MIDI Association Member Newzik Goes Beyond Sheet Music


Newzik is a Paris-based MIDI Association member with a unique approach to notation. They offer a range of notation products that focus on classical orchestral scores and are used by a number of world-renowned orchestras and ensembles. One of the most interesting parts of Newzik's technology is their AI-driven OMR (Optical Music Recognition) engine.


Newzik’s AI-driven Optical Music Recognition technology, Maestria.

Newzik now offers this bridge between two worlds, the world of paper scores and the world of digital scores, with our newly released AI-driven Optical Music Recognition technology, Maestria.

Take a picture of your sheet music, press play, and listen to the music – enjoy a living, dynamic partner instead of dry ink on paper.

Transpose it, play it faster or slower, turn off your part and play along with the accompaniment only, share it in real time with colleagues and friends working on the same piece wherever they are, enrich it with annotations, audio and video files – enjoy a living, dynamic partner instead of dry and dusty notes on paper.

What are we doing? A powerful AI-driven engine (in fact, the most advanced in existence) analyzes the image of your sheet music and transforms it into a digital format with a reliability of nearly 100%.

We open this new vibrant world of scores to you in Newzik, fully adaptive, fully interactive, fully collaborative – while fully protecting copy, performance and exploitation rights.

And it's not only the music that is transformed, but also all of the metadata: composer, title, tempi, dynamic indications, fingerings, bowings, breathing marks. The full range of information until now petrified in black dots of ink on paper is brought to life.

by Newzik


Sync large ensembles with multiple devices using Newzik Web 

Collaboration is at the center of Newzik's approach. They offer both iPad and iPhone apps. In particular, many orchestras and ensembles have moved to using iPads at rehearsals and performances, and using Newzik apps allows large groups to share and exchange markings and notational details. There is also a desktop application.

All of these different applications can be shared via Newzik Web, an online platform that allows you to view your Newzik account and your digital sheet music from any compatible web browser and on any computer.

Newzik Web is an easy way to share sheet music and work in collaboration with other musicians: the “projects” section is a collective space on the platform that allows several users to work together on a musical project.



Newzik Education, Support and Pricing

There is a dedicated website for support and a blog section with articles on how to get the most out of Newzik.  

You can get started with Newzik for free and an annual subscription is only $29. 


AudioCipher V3: The Word-to-MIDI Melody and Chord Progression Generator

MIDI Association partner AudioCipher Technologies has just published version 3.0 of their melody and chord progression generator plugin. Type in a word or phrase and AudioCipher will automatically generate MIDI files for any virtual instrument in your DAW. AudioCipher helps you overcome creative block with the first-ever text-to-MIDI VST for music producers.

Chord generator plugins have been a hallmark of the MIDI effects landscape for years. Software like Captain Chords, Scaler 2, and ChordJam are some of the most popular in the niche. Catering to composers, these apps tend to feature music theory notation concepts like scale degrees and Roman numerals. They provide simple ways to apply chord inversions, sequence chords, and control the BPM. This lets users modify chord voicings and edit MIDI in the plugin before dragging it to a track.

AudioCipher offers similar controls over key signature, scale selection, chord selection, rhythm control, and chord/rhythm randomization. However, by removing in-app arrangement, users get a simplified interface that’s easier to understand and takes up less visual real estate in the DAW. Continue your songwriting workflow directly in the piano roll to perform the same actions that you would in a VST.

AudioCipher retails at $29.99 rather than the $49-99 price points of its competitors. When new versions are released, existing customers receive free software upgrades forever. Three versions have been published in the past two years. 

Difficulty With Chord Progressions

Beginner musicians often have a hard time coming up with chord progressions. They lack the skills to experiment quickly on a synth or MIDI keyboard. Programming notes directly into the piano roll is a common workaround, but it’s time consuming, especially if you don’t know any music theory and are starting from scratch.

Intermediate musicians may understand theory and know how to create chords, but struggle with finding a good starting point or developing an original idea.

Common chord progressions are catchy but run the risk of sounding generic. Pounding out random chords without respect for the key signature is a recipe for disaster. Your audience wants to hear that sweet spot between familiarity and novelty.

Most popular music stays in a single key and leverages chord extensions to add color. The science of extending a chord is not too complicated, but it can take time to learn.

Advanced musicians know how to play outside the constraints of a key, using modulation to prepare different chords that delight the listener. But these advanced techniques do require knowledge and an understanding of how to break the rules. It’s also hard to teach old dogs new tricks, so while advanced musicians have a rich vocabulary, they are at risk of falling into the same musical patterns.

These are a few reasons that chord progression generators have become so popular among musicians and songwriters today. 

AudioCipher’s Chord Progression Generator

Example of AudioCipher V3 generating chords and melody in Logic Pro X

Overthinking the creative process is a sure way to get frustrated and waste time in the DAW. AudioCipher was designed to disrupt ordinary creative workflows and introduce a new way of thinking about music. The first two versions of AudioCipher generated single-note MIDI patterns from words. Discovering new melodies, counter-melodies and basslines became easier than ever.

Version 3.0 continues the app’s evolution with an option to toggle between melody and chord generator modes. AudioCipher uses your word-to-melody cipher as a constant variable, building a chord upon each of the encrypted notes. Here’s an overview of the current features and how to use them to inspire new music.

AudioCipher V3.0 Features

  • Choose from 9 scales: the 7 traditional modes (Major, Minor, Dorian, Phrygian, Lydian, Mixolydian, and Locrian), harmonic minor, and the twelve-note chromatic scale.
  • Choose from six chord types including Add2, Add4, Triad, Add6, 7th chords, and 9ths.
  • Select the random chord feature to cycle through chord types. The root notes will stay the same (based on your cryptogram) but the chord types will change, while sticking to the notes in your chosen scale.
  • Control your rhythm output: Whole, Half, Quarter, Eighth, Sixteenth, and all triplet subdivisions.
  • Randomize your rhythm output: Each time you drag your word to a virtual instrument, the rhythm will be randomized with common and triplet subdivisions between half-note and eighth-note durations.
  • Combine rhythm and chord randomization together to produce an endless variety of chord progressions based on a single word or phrase of your choice. Change the scale to continue experimenting.
  • Use playback controls on the standalone app to audition your text before committing. Drag the MIDI to your software instrument to produce unlimited variation and listen back from within your DAW.
  • The default preset is in C major with a triad chord type. Use the switch at the top of the app to move between melody and chord generator modes.

How to Write Chord Progressions and Melodies with AudioCipher

Get the creative juices flowing with this popular AudioCipher V3 technique. You’ll combine the personal meaning of your words with the power of constrained randomness. Discover new song ideas rapidly and fine-tune the MIDI output in your piano roll to make the song your own.

  • Choose a root and scale in AudioCipher
  • Switch to the Chord Generator option
  • Select “Random” from the chord generator dropdown menu
  • Turn on “Randomize Rhythm” if you want something bouncy or select a steady rhythm with the slider
  • Type a word into AudioCipher that has meaning to you (try the name of something you enjoy or desire)
  • Drag 5-10 MIDI clips to your software instrument track
  • Choose a chord progression from the batch and try to resist making any edits at first

Next we’ll create a melody to accompany your chord progression.

  • Keep the same root and scale settings
  • Switch to Melody Generator mode
  • Create a new software instrument track, preferably with a lead instrument or a bass
  • Turn on “Randomize Rhythm” if it was previously turned off
  • Drag 5-10 MIDI clips onto this new software instrument track
  • Move the melodies up or down an octave to find the right pitch range to contrast your chords
  • Select the best melody from the batch

Adjust MIDI in the Piano Roll

Once you’ve found a melody and chord progression that inspires you, edit the MIDI directly in your piano roll. Quantize the chords and melody if the triplets feel too syncopated for your taste. Use sound design to achieve the instrument timbre you’re looking for, and experiment with additional effects like strumming and arpeggiating your chords to draw even more from your progressions.

With this initial seed concept in place, you can develop the rest of the song using whatever techniques you’d like. Return to AudioCipher to generate new progressions and melodies in the same key signature, or reference the circle of fifths for ideas on how to change key and still sound good. Play the chords and melody on a MIDI keyboard until you have your own ideas for the next section, then use your DAW to build on them until they become a full song.

Technical specs

AudioCipher is a 64-bit application that can be loaded either standalone or as a VST3 / AU (Audio Unit) plugin in your DAW of choice. Ableton Live, Logic Pro X, FL Studio, Reaper, Pro Tools, and GarageBand have been tested and confirmed to work. Installers are available for both macOS and Windows 10, with installation tutorials available on the website’s FAQ page.

A grassroots hub for innovative music software

Along with developing VSTs and audio sample packs, AudioCipher maintains an active blog that covers the most innovative trends in music software today. MIDI.org has covered AudioCipher’s partnerships with AI music software developers like MuseTree and the AI music video generator VKTRS.

AudioCipher’s recent articles dive into the cultural undercurrents of experimental music philosophy. One piece describes sci-fi author Philip K. Dick’s concept of “synchronicity music,” exploring the role of musicians within the simulation theory of his VALIS trilogy. Another article outlines the rich backstory of PlantWave, a device that uses electrodes to turn plants into MIDI music.

The blog also advocates for small, experimental software like Delay Lama, Riffusion, and Text To Song, sharing tips on how to access and use each of them. Grassroots promotion of these tools brings awareness to the emerging technology and spurs those developers to keep improving their apps.

Visit the AudioCipher website to learn more. 

3 Best AI Music Generators for MIDI Creation

A new generation of AI MIDI software has emerged over the past 5 years. Google, OpenAI, and Spotify have each published a free MIDI application powered by machine learning and artificial intelligence.

The MIDI Association has reported on innovations in this space before. Google’s AI Duet, their Music Transformer, and Massive Technologies’ AR Pianist all rely on MIDI to function properly. We’re beginning to see the emergence of browser and plugin applications linked to cloud services, running frameworks like PyTorch and TensorFlow.

In this article we’ll cover three important AI MIDI tools – Google Magenta Studio, OpenAI’s MuseNet, and Spotify’s Basic Pitch MIDI converter. 

Google Magenta Studio 

Google Magenta is a hub for music and artificial intelligence today. Anyone who uses a DAW and enjoys new plugins should check out the free Magenta Studio suite. It includes five applications. Here’s a quick overview of how they work:

  • Continue – Continue lets users upload a MIDI file and leverage Magenta’s music transformer to extend the music with new sounds. Keep your temperature setting close to 1.0-1.2, so that your MIDI output sounds similar to the original input but with variations.
  • Drumify – Drumify creates grooves based on the MIDI file you upload. They recommend uploading a single instrumental melody at a time to get the best results. For example, upload a bass line and it will try to produce a drum beat, in MIDI format, that complements it.
  • Generate – Maybe the closest tool in the collection to a ‘random note generator’, Generate uses a Variational Autoencoder (MusicVAE) that was trained on millions of melodies and rhythms.
  • Groove – This nifty tool takes a MIDI drum track and uses Magenta to modify the rhythm slightly, giving it a more human feel. So if your music was overly quantized or had been performed sloppily, Groove could be a helpful tool.
  • Interpolate – This app asks you for two separate MIDI melody tracks. When you hit generate, Magenta composes a melody that bridges them together (a rough code sketch follows this list).
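
For readers who want to experiment with the model family behind Generate and Interpolate directly, here is a rough Python sketch using the open-source magenta and note_seq packages. The config name, checkpoint path, and file names are assumptions based on Magenta’s published MusicVAE examples, so check the project’s documentation before running it.

```python
import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

# Assumed config/checkpoint names, based on Magenta's public MusicVAE examples.
CONFIG_NAME = "cat-mel_2bar_big"
CHECKPOINT = "cat-mel_2bar_big.ckpt"   # downloaded separately from the Magenta project

model = TrainedModel(
    configs.CONFIG_MAP[CONFIG_NAME],
    batch_size=4,
    checkpoint_dir_or_path=CHECKPOINT,
)

# "Generate": sample brand-new 2-bar melodies at a given temperature.
samples = model.sample(n=2, length=32, temperature=1.1)

# "Interpolate": morph between two existing MIDI melodies (placeholder file names).
seq_a = note_seq.midi_file_to_note_sequence("melody_a.mid")
seq_b = note_seq.midi_file_to_note_sequence("melody_b.mid")
bridge = model.interpolate(seq_a, seq_b, num_steps=5, length=32)

# Write everything back out as MIDI for your DAW.
for i, seq in enumerate(list(samples) + list(bridge)):
    note_seq.sequence_proto_to_midi_file(seq, f"musicvae_{i}.mid")
```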

The Magenta team is also responsible for Tone Transfer, an application that transforms audio from one instrument to another. It’s not a MIDI tool, but you can use it in your DAW alongside Magenta Studio.

OpenAI MuseNet 

MuseTree – Free Nodal AI Music Generator


OpenAI is a major player in the AI MIDI generator space. Their DALL·E 2 web application took the world by storm this year, creating stunningly realistic artwork and photographs in any style. But what you might not know is that they’ve created two major music applications, MuseNet and Jukebox.

  • MuseNet – MuseNet is comparable to Google’s Continue, taking in MIDI files and generating new ones. But users can constrain the MIDI output to parameters like genre and artist, introducing a new layer of customization to the process.
  • MuseTree – If you’re going to experiment with MuseNet, I recommend using this open source project MuseTree instead of their demo website. It’s a better interface and you’ll be able to create better AI music workflows at scale.
  • Jukebox – Published roughly a year after MuseNet, Jukebox focuses on generating audio files based on a set of constraints like genre and artist. The output is strange, to say the least. It does kind of work, but in other ways it doesn’t. The application can also be tricky to operate, requiring a Google Colab account and some patience troubleshooting the code when it doesn’t run as expected. 

Spotify Basic Pitch (Audio-to-MIDI)

Spotify’s Basic Pitch: Free Audio-To-MIDI Converter

Spotify is the third major contender in this AI music generator space. They acquired Soundtrap, a mobile-friendly music creation app that launched back in 2013, so they’re no stranger to music production tools. As for machine learning, Spotify already has a publicly available AI toolset that powers their recommendation engine.

Basic Pitch is a free browser tool that lets you upload any song as an audio file and convert it into MIDI. Basic pitch leverages machine learning to analyze the audio and predict how it should be represented in MIDI. Prepare to do some cleanup, especially if there’s more than one instrument in the audio. 
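
If you’d rather script the conversion than use the browser tool, Basic Pitch is also distributed as a pip-installable Python package. The snippet below is a minimal sketch based on that package’s documented inference call; the file names are placeholders.

```python
from basic_pitch.inference import predict

# Run Spotify's bundled Basic Pitch model on an audio file. The call returns
# the raw model output, a PrettyMIDI object, and a list of note events.
model_output, midi_data, note_events = predict("my_song.wav")  # placeholder path

# Save the transcription so it can be dragged into a DAW for cleanup.
midi_data.write("my_song_basic_pitch.mid")

# Peek at the first few detected notes (start time, end time, pitch, amplitude, ...).
for event in note_events[:5]:
    print(event)
```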

Spotify hasn’t published a MIDI generator like MuseNet or Magenta Studio’s Continue. But in some ways Basic Pitch is even more helpful, because it generates MIDI you can use right away, for a practical purpose. Learn your favorite music quickly!

 The Future of AI MIDI Generators

The consumer applications we’ve mentioned, like Magenta Studio, MuseTree, and Basic Pitch, will give you a sense of their current capabilities and limitations. For example, Magenta Studio and MuseTree work best when they are fed special types of musical input, like arpeggios or pentatonic blues melodies. 

Product demos often focus on the best use cases, but as you push these AI MIDI generators to their limits, the output becomes less coherent. That being said, there’s a clear precedent for future innovation and the race is on, amongst these big tech companies, to compete and innovate in the space.

Private companies, like AIVA and Soundful, are also offering AI music generation for licensing. Their user-friendly interfaces are built for social media content creators that want to license music at a lower cost. Users create an account, choose a genre, generate audio, and then download the original music for their projects.

Large digital content libraries have been acquiring AI music generator startups in recent years. Apple bought up a London company called AI Music in February 2022, while Shutterstock purchased Amper Music in 2020. This suggests a large upcoming shift in how licensed music is created and distributed.

At the periphery of these developments, we’re beginning to see robotics teams that have successfully integrated AI music generators into singing, instrument-playing, animatronic AI music robots like Shimon and Kuka. Built by the Center for Music Technology at Georgia Tech, Shimon has performed live with jazz groups and can improvise original solos thanks to the power of artificial intelligence. 

Stay tuned for future articles, with updates on this evolving software and robotics ecosystem. 

MIDI used to complete Beethoven’s 10th symphony

MIDI Association contributor Walter Werzowa was featured on CNN today (Dec 26, 2021)

One of the best things about the MIDI Association is the great people we get to meet and associate with. After all they don’t call it an association for nothing.  This year during May Is MIDI Month, we were putting together a panel on MIDI and music therapy and Executive Board member Kate Stone introduced us to Walter Werzowa. 

So we were pleasantly surprised today when one of Walter’s latest projects was featured on the Fareed Zakaria GPS show.


HealthTunes®  

We first got interested in Walter because of HealthTunes.org. HealthTunes® is an audio streaming service, founded by Walter in 2016, designed to improve physical and mental health. It uses binaural beats.

Binaural beats and isochronic tones are embedded within our music (the low humming sound some may hear), which are two different methods used for brain wave entrainment. Binaural beats work by using two slightly different frequency tones sent to each ear. Isochronic tones use a single tone with a consistent beat being turned off and on regularly. Your body automatically reacts to both binaural beats and isochronic tones with a physiological response allowing one’s brain to reach a more desired mental state by influencing brain wave activity.

by HealthTunes®
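
As a rough illustration of the binaural-beat idea described above, the sketch below renders a stereo file whose left and right channels differ by a few hertz. The carrier and beat frequencies are arbitrary example values, not HealthTunes’ settings, and this is a demonstration rather than a therapeutic recipe.

```python
import numpy as np
from scipy.io import wavfile

# Arbitrary example values, not HealthTunes' actual parameters.
SAMPLE_RATE = 44100
DURATION_S = 10.0
CARRIER_HZ = 200.0      # tone sent to the left ear
BEAT_HZ = 6.0           # perceived beat = difference between the two ears

t = np.linspace(0.0, DURATION_S, int(SAMPLE_RATE * DURATION_S), endpoint=False)
left = np.sin(2 * np.pi * CARRIER_HZ * t)
right = np.sin(2 * np.pi * (CARRIER_HZ + BEAT_HZ) * t)

# Interleave into a stereo array and write a 16-bit WAV file at a safe level.
stereo = np.stack([left, right], axis=1)
wavfile.write("binaural_6hz.wav", SAMPLE_RATE, (stereo * 32767 * 0.3).astype(np.int16))
```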



HealthTunes – Music for Health

HealthTunes® is a streaming audio service designed to improve your physical and mental health.


Musikvergnuegen – Audio Branding

We soon learned that Walter had done many things in his career, including memorable sonic branding themes from his company, Musikvergnuegen. Vergnuegen translates roughly as joy or fun, and it appears in the German word for amusement park, Vergnügungspark.

Almost everyone on the planet has heard his audio branding signatures. The Intel bong and the T-Mobile five-note theme are both brilliant examples of simple mnemonics that could easily be described as earworms.

By the way, the term earworm comes from the German Ohrwurm, coined over 100 years ago to describe the experience of a song stuck in the brain.

T Mobile.mp3


Beethoven’s “finally finalized” 10th Symphony 

But Walter’s latest project is perhaps his most impressive yet. He was part of a team of AI researchers and musicians that used AI to “finish” Beethoven’s unfinished 10th Symphony. How was MIDI involved? Like most AI music projects, the algorithm was trained on MIDI data of not only all of Beethoven’s completed symphonies, but all of his other works, as well as works from Beethoven’s contemporaries that he would have listened to and been influenced by. You can watch NBC’s Molly Hunter interview Walter, or just listen to the results of his work below.


Below is a link to the full Beethoven X symphony performance 



Beethoven X – The AI Project | MagentaMusik 360

Beethoven’s 10th Symphony streams for free on MagentaMusik 360 on October 9 from 7 p.m. The previously unfinished piece by Ludwig van Beethoven has now been completed with the help of an AI (artificial intelligence).


AR Pianist combines MIDI and AI to create virtual piano performances in your home

Massive Technologies releases major update to AR Pianist with new MIDI and Audio features

Massive Technologies’ (MT) newest AR Pianist update shows the unique power of combining MIDI data with AI and AR/VR, and it’s an incredibly engaging combination of new technologies.

They gave The MIDI Association the inside scoop on their new update to AR Pianist. 

One of the major new features is the ability to import MIDI files to create virtual performances. 


We’re excited to announce that a major update of AR Pianist is to be released on May 25th. We’ve been working on this update tirelessly for the past two years.

The update brings our AI technology to users’ hands, and gives them the ability to learn any song by letting the app listen to it once through the device microphone.

Using AI, the app will listen to the audio, extract notes being played, and then show you a virtual pianist playing that song for you with step by step instructions.

The app also uses machine learning and augmented reality to project the virtual avatar onto your real piano, letting you experience the performance interactively and from every angle.

Users can also record their piano performance using the microphone (or MIDI), and then watch their performance turn into a 3D / AR virtual concert. Users can share it as a video now, and to VR / AR headsets later this year.

The update also features songs and content by “The Piano Guys”, along with a licensed Yamaha “Neo” designer piano.

by Massive Technologies


A.I. Generates 3D Virtual Concerts from Sound: 

“To train the AI, we brought professionally trained pianists to our labs in Helsinki, where they were asked to simply play the piano for hours. The AI observed their playing through special hardware and sensors, and throughout the process the pianist and we would check the AI’s results and give it feedback or corrections. We would then take that feedback and use it as the curriculum for the AI for our next session with the pianist. We repeated that process until the AI results closely matched the human playing technique and style.”

by Massive Technologies

Massive Technologies used MIDI Association Member Google’s TensorFlow to train their AI model.

The technology’s main potential lies in music education, letting piano teachers create interactive virtual lessons for remote teaching. It also suits virtual piano concerts, as well as film or game creators who want to incorporate a super-realistic pianist into their scenes.


The key to it all is MIDI 

If you look at the work being done by Google, Yamaha, Massive Technologies, The Piano Guys, and others in the AI space, MIDI is central to all of those efforts.

Why? Because MIDI is the Musical Instrument Digital Interface; to connect music with AI and machine learning algorithms, you usually have to convert it into MIDI.

How Does AR Pianist work and what can you do with it? 

AR Pianist combines several of Massive Technologies’ proprietary systems:

  • Multi-pitch recognition

Massive Technologies’ in-house ML models can estimate pitch and extract chords from audio streams on the fly, in real time.

This allows you to convert audio files of solo piano recordings into MIDI data that the AI engine can analyze.  Of course you can also directly import MIDI data. 

  • Object pose estimation

Their proprietary models can estimate the 3D position and orientation of real instruments from a single photograph.

This allows you to point your mobile device’s camera at your 88-note keyboard. The app can then map your keyboard into 3D space for use with augmented reality.

  • Motion synthesis and 3D Animation Pipeline

MT developed new machine learning algorithms that can synthesize novel, kinematically accurate 3D musical performances from raw audio files, for use in education and AR/VR. Their tools can perform advanced full-body and hand inverse kinematics to fit the same 3D musical performance to different avatars.

This is the part that almost seems like magic. 

The app can take a MIDI or audio performance (the audio should be piano only), analyze it, and generate musically correct avatar performances with correct fingerings and hand positions, including complex hand crossovers like those often used in classical or pop music (think of the piano part from Bohemian Rhapsody).

  • Music notation rendering, in 3D

Massive Technologies has built a notation rendering engine that can display music scores in 3D and inside virtual environments, including AR/VR.

This allows you to see the notation for the performances. Because the data is essentially MIDI-like, you can slow the tempo down, set the app to wait for you to play the right note before moving forward, and use other practice techniques that are widely used in MIDI applications.


A.I. Plays Rachmaninoff Playing Himself (First Person View): 

A 1919 piano roll recording of Rachmaninoff himself playing his famous Prelude, reconstructed into 3D animation by Massive Technologies’ AI.

A virtual camera was attached to the virtual avatar’s head, where its movement is being driven by the AI, simulating eye gaze and anticipation. 


Massive Technologies is Fayez Salka (M.D., musician, software developer, and 3D artist) and Anas Wattar (BCom graduate from McGill University, software developer, and 3D artist).


AR Pianist is available on the Apple App Store and Google Play store.

The app is free to download and offers in-app purchases for libraries of songs. You can check out Jon Schmidt of The Piano Guys virtually demoing AR Pianist at any Apple retail store.


Check out the press release with details on the original app introduction


Google’s Music Transformer – State of the Art in Music AI?

Google has been doing amazing work in music AI and recently they posted demos created by their Music Transformer. The goal was to generate longer pieces of music that had more coherence because the model was using relative attention. 

We found that by using relative attention, which explicitly modulates attention based on how far apart two tokens are, the model is able to focus more on relational features. Relative self-attention also allows the model to generalize beyond the length of the training examples, which is not possible with the original Transformer model.

by  Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu and Douglas Eck.
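
To make the relative-attention idea a little more concrete, here is a tiny NumPy sketch of the general mechanism: attention scores get an extra bias that depends only on how far apart two positions are. It is a simplified illustration of the concept, not the Music Transformer’s actual memory-efficient implementation.

```python
import numpy as np

def relative_attention(q, k, v, rel_bias):
    """Toy scaled dot-product attention with a relative-position bias.

    q, k, v:  (seq_len, d) query/key/value matrices
    rel_bias: (2*seq_len - 1,) one bias value per possible token distance
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)

    # Add a bias that depends only on the distance (i - j) between tokens.
    idx = np.arange(seq_len)
    distance = idx[:, None] - idx[None, :]            # ranges over [-(L-1), L-1]
    scores = scores + rel_bias[distance + seq_len - 1]

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v

# Toy usage with random data standing in for note embeddings.
rng = np.random.default_rng(0)
L, D = 8, 16
out = relative_attention(rng.normal(size=(L, D)),
                         rng.normal(size=(L, D)),
                         rng.normal(size=(L, D)),
                         rng.normal(size=(2 * L - 1,)))
print(out.shape)  # (8, 16)
```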

The following three examples were created by Music Transformer, an attention-based neural network. We won’t even get into the question of who owns the copyright to these pieces of music, because it makes our head hurt. Remember, all of this comes from neural networks trained on MIDI files from the Yamaha e-Piano Competition, recorded on Yamaha Disklaviers.



relatively_jazz.mp3



classical_favourite_sample.mp3



transformer_nice.mp3

To explain how this relative attention works Google created a video displaying the relative attention as “arcs showing which notes in the past are informing the future.” 

There are other possibilities for Music Transformer. Here are two versions of Twinkle Twinkle Little Star.

Here we trained a Music Transformer model to map heuristically-extracted melody to performance, and then asked it to play the Twinkle Twinkle Little Star melody (with chords unspecified):

 by Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu and Douglas Eck.



twinkle_1.mp3



twinkle_3.mp3

In this next case, the AI model was given the chords to Hotel California.  You can see the core technology has tons of potential for helping musicians to be creative in the future.   Artificial Intelligence will soon be another tool in our creative palette. 



transformer_hotel_california.mp3

 For more technical details you can read the actual paper or go to the original blog post. 



Music Transformer: Generating Music with Long-Term Structure

Generating long pieces of music is a challenging problem, as music contains structure at multiple timescales, from millisecond timings to motifs to phrases t…

Google Magenta – Making Music with MIDI and Machine Learning

In January 2018, we covered Intel’s Keynote pre-show which prominently featured Artificial Intelligence and MIDI.  

But one of the leaders in the AI and machine learning field is Google. Their Magenta project has been doing a lot of research and experimentation in using machine learning for both art and music. The great thing about Google is that they share the details of their research on their website and even their code on GitHub.

Magenta is a research project exploring the role of machine learning in the process of creating art and music. Primarily this involves developing new deep learning and reinforcement learning algorithms for generating songs, images, drawings, and other materials

 


by magenta.tensorflow.org

In this article, we’ll review the latest Google music research projects and provide links to further information. We’ll also show how MIDI is fundamentally involved in many of these projects.  

MusicVAE is a hierarchical recurrent variational autoencoder for learning latent spaces for musical scores. It is actually not as complex as it sounds and Google does an incredible job of explaining it on their site. 

When a painter creates a work of art, she first blends and explores color options on an artist’s palette before applying them to the canvas. This process is a creative act in its own right and has a profound effect on the final work.

Musicians and composers have mostly lacked a similar device for exploring and mixing musical ideas, but we are hoping to change that. Below we introduce MusicVAE, a machine learning model that lets us create palettes for blending and exploring musical scores.

by magenta.tensorflow.org
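
The palette metaphor comes down to arithmetic on latent vectors: encode two musical ideas as points in latent space, blend the points, and decode the blend back into notes. The sketch below shows only the blending step, using stand-in NumPy vectors in place of a real encoder; latent-space models like MusicVAE commonly use spherical interpolation of this sort.

```python
import numpy as np

def lerp(z_a, z_b, t):
    """Straight-line blend between two latent codes."""
    return (1.0 - t) * z_a + t * z_b

def slerp(z_a, z_b, t):
    """Spherical blend, which keeps blended codes at a similar distance
    from the origin; latent-space models often interpolate this way."""
    omega = np.arccos(np.clip(
        np.dot(z_a / np.linalg.norm(z_a), z_b / np.linalg.norm(z_b)), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(z_a, z_b, t)
    return (np.sin((1.0 - t) * omega) * z_a + np.sin(t * omega) * z_b) / np.sin(omega)

# Stand-ins for the latent codes of two encoded melodies.
rng = np.random.default_rng(1)
z_melody_a, z_melody_b = rng.normal(size=512), rng.normal(size=512)

# Five evenly spaced blends: a palette stretching from melody A to melody B.
palette = [slerp(z_melody_a, z_melody_b, t) for t in np.linspace(0.0, 1.0, 5)]
print(len(palette), palette[0].shape)
```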

 


Beat Blender by Creative Lab.

Beat Blender uses MusicVAE. It lets you put 4 drum beats on the 4 corners of a square, then uses machine learning and latent spaces to generate a two-dimensional palette of drum beats that morph from one corner to another. You can manually select patterns with your mouse, and even draw a path to automate the progression of patterns across the palette. You can also select the “seeds” for the four corners, and Beat Blender will output MIDI (using Web MIDI), so you can use Beat Blender not only with its internal sounds but with any MIDI device connected to your computer.


Latent Loops, by Google’s Pie Shop

Latent Loops uses MusicVAE to auto-generate melodies. You can then put them on a timeline to build more complex arrangements and finally move them over to your DAW of choice. It also has MIDI output using Web MIDI.


Onsets and Frames: Dual-Objective Piano Transcription

Onsets and Frames is our new model for automatic polyphonic piano music transcription. Using this model, we can convert raw recordings of solo piano performances into MIDI.

by magenta.tensorflow.org

Although still not perfect,  Google has made significant progress in extracting MIDI data from polyphonic audio files. 

Here is the original audio input file.



moz331-ground.mp3

Here is the output from Google’s transcription.  



moz331-ours.mp3


Performance RNN – Generating Music with Expressive Timing and Dynamics

Performance RNN is a recurrent neural network designed to model polyphonic music with expressive timing and dynamics. Google fed the neural network recordings from the Yamaha e-Piano Competition dataset, which contains MIDI captures of over 1,400 performances by skilled pianists. The Performance RNN demo website has both MIDI input and MIDI output.

Our performance representation is a MIDI-like stream of musical events. Specifically, we use the following set of events:

  • 128 note-on events, one for each of the 128 MIDI pitches. These events start a new note.
  • 128 note-off events, one for each of the 128 MIDI pitches. These events release a note.
  • 100 time-shift events in increments of 10 ms up to 1 second. These events move forward in time to the next note event.
  • 32 velocity events, corresponding to MIDI velocities quantized into 32 bins. These events change the velocity applied to subsequent notes.

by magenta.tensorflow.org
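
Here is a hedged sketch of what that event vocabulary looks like in code. The constants mirror the four event families quoted above (388 tokens in total), while the helper functions and the example encoding of a single note are an illustration, not Magenta’s exact tokenizer.

```python
# Event vocabulary mirroring the four families quoted above.
NOTE_ON = 128          # events 0..127: start MIDI pitch p
NOTE_OFF = 128         # events 128..255: release MIDI pitch p
TIME_SHIFT = 100       # events 256..355: advance time by 10 ms .. 1 s
VELOCITY = 32          # events 356..387: set velocity bin 0..31
VOCAB_SIZE = NOTE_ON + NOTE_OFF + TIME_SHIFT + VELOCITY  # 388 tokens

def note_on(pitch: int) -> int:
    return pitch

def note_off(pitch: int) -> int:
    return NOTE_ON + pitch

def time_shift(ms: int) -> int:
    # 10 ms steps, capped at 1 second per event.
    steps = max(1, min(TIME_SHIFT, round(ms / 10)))
    return NOTE_ON + NOTE_OFF + (steps - 1)

def velocity(midi_velocity: int) -> int:
    # Quantize the 0..127 MIDI velocity range into 32 bins.
    return NOTE_ON + NOTE_OFF + TIME_SHIFT + (midi_velocity * VELOCITY // 128)

# Illustrative encoding of one middle C held for half a second at velocity 80.
tokens = [velocity(80), note_on(60), time_shift(500), note_off(60)]
print(VOCAB_SIZE, tokens)
```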

Here is an example of the output from the neural network. You can listen to more anytime here.



Neural Network Created Piano Performance.mp3


NSynth – Neural Audio Synthesis

Google trained this neural network with over 300,000 samples from commercially available sample libraries.

Unlike a traditional synthesizer which generates audio from hand-designed components like oscillators and wavetables, NSynth uses deep neural networks to generate sounds at the level of individual samples. Learning directly from data, NSynth provides artists with intuitive control over timbre and dynamics and the ability to explore new sounds that would be difficult or impossible to produce with a hand-tuned synthesizer.

by magenta.tensorflow.org

Google even developed a hardware interface to control NSynth.


Where is musical AI and Machine Learning headed?

We could well be on the edge of a revolution as big as the transition from the electronic era to the digital era that occurred in the years between 1980 and 1985 when MIDI was first born.  

In the next 3-5 years musical AI tools may well become standard parts of the modern digital studio. 

Yet somehow it seems that, like the softsynth revolution of the early 2000s, MIDI will once again be at the center of the next technology revolution.

Google Releases Song Maker with Web MIDI

Google Creative Lab, Use All Five, and Yotam Mann launched a new browser-based music sequencer called Song Maker. 

It’s a classic grid style sequencer and allows anyone to easily create simple grooves on the web.  You can even connect your MIDI keyboard or other controllers to input notes via Web MIDI. 

When you’re finished you can save your Song Maker groove, share it on Facebook and Twitter or even get the embed code to embed Song Maker on your website.  Check it out below!




MIDI and Music AI

Google’s Duet AI (with Web MIDI)

Google Creative Lab recently released A.I. Duet, an interactive AI music experiment that lets you use your laptop keyboard or a MIDI keyboard (using Chrome’s Web MIDI feature) to make music and experiment with artificial intelligence.

Duet was built by Yotam Mann and the Magenta and Creative Lab teams at Google using Tensorflow, Tone.js, and open-source tools from the Magenta project.

The cool thing about this project is that you can not only play music with it; it’s all open-source code, so if you are into coding you can grab the actual code and experiment with it. TensorFlow is an open-source software library for numerical computation using data flow graphs.

Tone.js is a Web Audio framework for creating interactive music in the browser. The architecture of Tone.js aims to be familiar to both musicians and audio programmers looking to create web-based audio applications. On the high-level, Tone offers common DAW (digital audio workstation) features like a global transport for scheduling events and prebuilt synths and effects. For signal-processing programmers (coming from languages like Max/MSP), Tone provides a wealth of high performance, low latency building blocks and DSP modules to build your own synthesizers, effects, and complex control signals.

by Yotam Mann

Yotam also worked on another really interesting AI musical experiment called the Infinite Drum Machine.

Last year at MoogFest, Google announced their plans for Magenta. Doug Eck explained that one of Magenta’s goals is to create an open-source tool to bring together artists and coders looking to make art and music in a collaborative space. As part of the initiative, Google will provide audio and video support, tools for MIDI users and platforms that will make it easier for artists to connect with machine learning models.

The Magenta project generated its first song (available below) after being fed only a few notes of input.



Google_-_Magenta_music_sample.0.mp3


Here is a link to all of the Google AI experiments.



A.I. Experiments

AI Experiments is a showcase for simple experiments that let anyone play with artificial intelligence and machine learning in hands-on ways, through pictures, drawings, language, music, and more.


Sony’s Flow Machines help us be more creative

Google is not the only company that has created musical artificial intelligence experiments. The goal of Sony’s Flow Machines is to “research and develop Artificial Intelligence systems able to generate music autonomously or in collaboration with human artists.”

Here is an example of Bach harmonization generated using deep learning. 

By turning music style into a computational object, Sony’s research project, funded by the European Research Council (ERC), can create songs in different styles. Here is a song generated by Flow Machines in the style of the Beatles.

So what does all this musical artificial intelligence have to do with MIDI? Most of these learning machines are fed MIDI as their input, because MIDI is the Musical Instrument Digital Interface. For example, the Magenta artificial intelligence engine was fed 8,000 MIDI files that the neural network analyzed for patterns.

For even more information about musical artificial intelligence, check out this excellent article from Hazel Cills (@hazelcills) on the MTV website.



Can AI Make Musicians More Creative? – MTV

Google and Sony want to change the way artists think about artificial intelligence