
Voice-to-MIDI: A More Intuitive Way to Interact with Music

The Problem: No Real-Time Solutions

Three years ago, while on tour with my band, Ritual Howls, I encountered a problem. After five grueling hours of sitting in a van on our way to the next gig, I had an idea for a drum beat but no way to capture it. All my gear was surgically packed away and out of reach. How could I capture my idea and turn it into workable music? 

Typically, when inspiration for new music strikes, I open a voice recorder app and record myself beatboxing a beat or mumbling a melody. Once I have access to a computer, I listen back to the recording and try to recreate it by moving MIDI notes into just the right place. This method preserves the idea, but it isn’t a real-time solution.

Being a musician and product designer, I began researching the marketplace for applications or plugins that could translate my beatboxing into usable MIDI notes. I experimented with a couple of products with audio-to-MIDI functionality, but nothing worked the way I envisioned. Some plugins even required special microphones to work effectively. I needed something that could work in any setting and with any microphone, like the one on my laptop.

I also tried DAW tools that convert audio into MIDI notes, but they didn’t give me the instant gratification of actually playing music. Further, without perfectly separated audio, they produced output that didn’t resemble my idea. At this point, I knew there was an opportunity and began to ideate a practical solution.

My Voice as an Instrument 

After an exhaustive survey of the current marketplace, I started researching artificial intelligence in audio processing. To translate my voice into workable MIDI notes, it was clear that I needed to develop an algorithm with enough smarts to differentiate various types of drum sounds.

As a product designer, I could envision the look and feel of the experience but needed to find AI experts to develop the architecture for my idea. So I assembled a team of experienced digital signal processing engineers to build a complete machine learning system, based on a neural network, for this purpose. We call this technology Bace.

Bace prototype showing an accurate detection of a drum kick sound.

For several months we trained Bace’s neural network. We collected sample data from men, women, children, and adults of different races and nationalities to offer the most comprehensive drum sound classifications. To account for variations of the same drum sounds between individuals, Bace has built-in training mechanics so any user can improve or retrain the model with their own sample data. These mechanics provide better results while adding a personal dimension to the technology by empowering users to define their experience.
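To make the idea of user retraining concrete, here is a minimal sketch in Python. It is not Bace's neural network: it swaps in a much simpler nearest-centroid classifier, and the class name, the two-number feature vectors, and the sample values are all made up for illustration. The point is only to show how adding a user's own labelled samples can shift what the model predicts.

```python
from collections import defaultdict
from math import dist


class CentroidDrumClassifier:
    """Toy nearest-centroid classifier (a stand-in, not Bace's actual model)."""

    def __init__(self):
        # label -> list of feature vectors seen so far
        self.samples = defaultdict(list)

    def add_samples(self, label, vectors):
        """Add labelled feature vectors, e.g. a user's own recordings."""
        self.samples[label].extend(tuple(v) for v in vectors)

    def centroids(self):
        """Return the mean feature vector for each label."""
        return {
            label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
            for label, vecs in self.samples.items()
        }

    def predict(self, vector):
        """Return the label whose centroid is closest to `vector`."""
        cents = self.centroids()
        return min(cents, key=lambda label: dist(cents[label], vector))


# Hypothetical 2-D features (say, brightness and noisiness), invented values.
clf = CentroidDrumClassifier()
clf.add_samples("kick", [(0.1, 0.2), (0.2, 0.1)])
clf.add_samples("snare", [(0.6, 0.7), (0.7, 0.8)])

my_snare = (0.4, 0.4)           # this user's snare sounds darker than average
before = clf.predict(my_snare)  # the generic model mistakes it for a kick

# The user records a few of their own snares and adds them to the model.
clf.add_samples("snare", [(0.45, 0.45), (0.5, 0.4), (0.4, 0.5)])
after = clf.predict(my_snare)   # now classified as a snare
```

A real system would extract features from audio and retrain a far richer model, but the loop is the same: collect the user's samples, fold them in, and predict again.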

During the training phase, we also wanted to leverage vocal frequencies beyond drum classification for more involved melodic ideas. To achieve this, we integrated pitch detection into Bace’s algorithm. Pitch detection allows users to process and map their voices to a piano roll.
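The last step of that mapping, from a detected frequency to a piano-roll note, can be sketched with the standard equal-temperament formula. This is a minimal illustration, not Bace's code: the function names are hypothetical, the tuning reference of A4 = 440 Hz is an assumption, and so is the octave naming in which MIDI note 60 is C4 (some DAWs label it C3). Detecting the frequency from audio is a separate problem and is not shown here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Round a frequency in Hz to the nearest MIDI note number (A4 = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave; each octave doubles the frequency.
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def midi_to_name(note):
    """Name a MIDI note, assuming the convention where note 60 is C4."""
    octave = note // 12 - 1
    return f"{NOTE_NAMES[note % 12]}{octave}"


# A sung pitch near 220 Hz lands on MIDI note 57, which is A3.
note = hz_to_midi(220.0)
name = midi_to_name(note)
```

Rounding to the nearest semitone is what snaps a slightly sharp or flat voice onto the grid of a piano roll.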

The Future of Music Technology 

In building Bace, I’ve realized that technology can complicate the most straightforward solutions. As product designers and developers, we’ve all seen applications with unnecessarily complex interfaces, special hardware, and elaborate design for design’s sake. While novel and sophisticated, these complicated systems can take a toll on the creative process. Controlling drum sounds with my voice was such a natural idea; I could intuitively recognize its abilities and limitations as an artist.

I see the future of music-making as a way of removing complexity and empowering intuition. Bace was born from my inherent desire to capture a drum beat with my voice and translate it into real music at the moment of inspiration. By making music technology more intuitive, we will not only unleash the creativity of experienced creators but also give anyone an opportunity to express themselves in new ways. That self-expression is what makes music unique, and it is our job as technologists to foster rather than complicate it.