Building a USB MIDI 2.0 Device – Part 1


By Andrew Mee in collaboration with the OS API Working Group

 

USB MIDI 2.0 was released by the USB-IF in June 2020. Apple added support in CoreMIDI in October 2021, and Google added support in Android in August 2022. At the time of writing, Microsoft has announced upcoming support for MIDI 2.0, with its work now available on a public GitHub repository, and patches have been submitted by the ALSA developers for inclusion in Linux Kernel 6.5. An update to the MIDI 2.0 UMP specification was approved in the first half of 2023.
For a more complete timeline see https://www.midi.org/midi-articles/detailed-timeline-of-midi-2-0-developments-since-january-2020

This technical guide to building a USB MIDI 2.0 device is the first in a series of articles targeted specifically to device developers.
For musicians, please see this article: https://www.midi.org/midi-articles/what-musicians-and-artists-need-to-know-about-midi-2-0

This series (based on the work of the OS API Working Group of the MIDI Association) focuses on configuring the USB descriptors to provide the best experience for your users. It shows you how to handle Group Terminal Blocks and Function Blocks, multiple UMP Endpoints, and compatibility with USB Hosts that can only handle USB MIDI 1.0, as well as with MIDI 1.0 Applications.

This guide assumes that the reader is familiar with the following specifications:

  • Universal MIDI Packet (UMP) Format and MIDI 2.0 Protocol v1.1
  • MIDI Capability Inquiry (MIDI-CI) v1.2
  • Universal Serial Bus Device Class Definition for MIDI Devices v2.0 (USB MIDI 2.0)

 


 

Planning your MIDI 2.0 Device

 

In MIDI 1.0, most Devices present an IN port and an OUT port, and that is all that is required.
In MIDI 2.0 there are several factors to consider:

  • What are the details of your Device?
    This includes the product name and other similar details.
  • How many functions does your Device have?
    Initially, think of functions as destinations and/or sources in your Device. For example, a simple single-channel mono synthesizer has a tone generator, which is one function. However, this hardware Device may also have external MIDI IN/OUT DIN ports, which could be classed as further functions. A Workstation may have many more functions.
    Note: Ultimately, these functions are represented by Function Blocks, which should drive your Group Terminal Block design. For the purposes of this article, however, we’ll cover only the USB descriptors and the Group Terminal Blocks.
  • How many channels are needed for each function?
    With the ability to utilize more than 16 Channels, a multitimbral tone generator may have 32 Channels (or indeed up to 256 Channels!).
  • Do you want/need the user to access all 256 channels or just a subset?
    For example, maybe the tone generator can be accessed on any of the 256 Channels.
  • How do you want these functions accessed when using MIDI 1.0?
    This is explained in greater detail below.
  • What MIDI 2.0 features are used for this function?
    MIDI-CI, MIDI 2.0 Protocol, JR Timestamps, etc.
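The channel questions above come down to simple arithmetic: a UMP Endpoint has 16 Groups of 16 Channels each, which is where the 256 figure comes from. As a rough planning aid, the sketch below works out how many Groups a function would span for a given Channel count. The function name `groups_needed` is our own and is only for illustration.

```python
CHANNELS_PER_GROUP = 16  # each UMP Group carries 16 Channels
MAX_GROUPS = 16          # a UMP Endpoint has 16 Groups, hence 256 Channels


def groups_needed(channels: int) -> int:
    """Return how many UMP Groups are needed to carry `channels` Channels."""
    max_channels = CHANNELS_PER_GROUP * MAX_GROUPS
    if not 1 <= channels <= max_channels:
        raise ValueError(f"channels must be between 1 and {max_channels}")
    # Integer ceiling division: a partly used Group still counts as one Group.
    return (channels + CHANNELS_PER_GROUP - 1) // CHANNELS_PER_GROUP


print(groups_needed(1))    # 1  (a single-channel monosynth)
print(groups_needed(32))   # 2  (a 32-Channel multitimbral tone generator)
print(groups_needed(256))  # 16 (every Channel on the UMP Endpoint)
```

The result is only a lower bound on the span of a Group Terminal Block; you may still choose to expose more Groups than a function strictly needs.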

 


 

Design of a simple Desktop Monosynth

 

Let’s imagine we have a simple single channel desktop monosynth. We want to use MIDI 2.0 Protocol where we can because the parameters (e.g. filter cutoff frequency) benefit from having more than 128 steps. While the MIDI 2.0 Protocol boasts a massive improvement in resolution and capabilities and allows us to future-proof the product, we also need to have MIDI 1.0 compatibility for both older OS’s and MIDI 1.0 Applications.

 

 

 

 

First, we start gathering the details of the synth. These values will be repeated into several different fields. They have been color-coded so you can easily refer to the source of information.

 

Detail | Value | String Rules
Manufacturer Name | “ACME Enterprises” | UTF-16, Max Length: 254 bytes
Product Name | “ACMESynth” | UTF-8, Max Length: 98 bytes
Product Instance Id | “ABCD12345” | ASCII, Max Length: 42 bytes. Don’t include characters that are less than or equal to 0x20, greater than 0x7F, or equal to 0x2C (‘,’)
Protocol | MIDI 2.0 Protocol (with a fallback to MIDI 1.0 Protocol) |
Function 1 | |
Function 1 Name | “Monosynth” | UTF-8, Max Length: 98 bytes
Function 1 Channels Needed | 1 Channel at a time, used bidirectionally |

The Product Instance Id of a Device is any unique identifier the Device has. This may be a Microcontroller Id or a built-in MAC address. Please read Pete Brown’s excellent article on why this is critical.

 


 

String Values

 

The string values in this table will be used in USB descriptors, UMP messages, and MIDI-CI messages. Each of these systems has limitations that should be adhered to. The table above provides a set of rules that best suits all strings used in a new device.

 

  • USB String Descriptors use Unicode UTF-16LE encoding, are not NULL-terminated, and hold up to 254 bytes.
  • UMP Endpoint Name Notification and Function Block Name Notification Messages are UTF-8, up to 98 bytes.
  • UMP Product Instance Id Notification Message is ASCII, up to 42 bytes.

It is recommended that the Product Instance Id is used as the USB iSerialNumber value. For compatibility with Windows, it is suggested that serial numbers don’t contain characters that are:

  • Less than or equal to 0x20 (‘ ‘)
  • Greater than 0x7F
  • Equal to 0x2C (‘,’)
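These rules are easy to check at build time. The following is a minimal sketch, assuming the limits are applied to the encoded byte length as listed in the rules table above. The function names are our own and are not part of any specification.

```python
MAX_USB_STRING_BYTES = 254          # Manufacturer Name rule from the table
MAX_UMP_NAME_BYTES = 98             # Product Name and Function name rule
MAX_PRODUCT_INSTANCE_ID_BYTES = 42  # Product Instance Id rule


def fits_usb_string(text: str) -> bool:
    """Check the UTF-16LE encoded length against the table's USB limit."""
    return len(text.encode("utf-16-le")) <= MAX_USB_STRING_BYTES


def fits_ump_name(text: str) -> bool:
    """Check the UTF-8 encoded length against the table's UMP name limit."""
    return len(text.encode("utf-8")) <= MAX_UMP_NAME_BYTES


def valid_product_instance_id(text: str) -> bool:
    """Apply the ASCII, length, and excluded-character rules."""
    try:
        raw = text.encode("ascii")  # rejects anything above 0x7F
    except UnicodeEncodeError:
        return False
    if len(raw) > MAX_PRODUCT_INSTANCE_ID_BYTES:
        return False
    # Reject control characters and space (<= 0x20) and the comma (0x2C).
    return all(byte > 0x20 and byte != 0x2C for byte in raw)


print(valid_product_instance_id("ABCD12345"))   # True
print(valid_product_instance_id("ABCD,12345"))  # False, contains ','
```

Running checks like these against the values in the table (for example `fits_ump_name("Monosynth")`) catches an over-long or badly formed string before it reaches a host.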

 


 

USB Descriptors

 

In USB MIDI 1.0, devices can present a USB IN Data Endpoint and/or a USB OUT Data Endpoint with up to 16 virtual MIDI cables each. Note that USB IN and USB OUT refer to data direction from the point of view of the USB Host. Each virtual MIDI cable can be considered equivalent to a MIDI DIN jack with streaming MIDI 1.0 for up to 16 channels each. In USB MIDI 2.0, the device can present a single USB IN Data Endpoint and/or a single USB OUT Data Endpoint that represents a single Universal MIDI Packet (UMP) data stream. It is strongly recommended that an IN/OUT Data Endpoint pair is presented to create a bi-directional UMP Endpoint to fully take advantage of MIDI 2.0.
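To make the difference concrete, here is a small sketch of the same Note On in both wire formats: as a 4-byte USB MIDI 1.0 event packet carrying a virtual cable number, and as a 32-bit UMP carrying a Group number. It handles only Channel Voice messages, and the helper names are our own. Treat it as an illustration of the packet layouts rather than a complete encoder.

```python
def usb_midi1_event_packet(cable: int, status: int, data1: int, data2: int = 0) -> bytes:
    """Build a 4-byte USB MIDI 1.0 event packet for a Channel Voice message."""
    if not 0 <= cable <= 15:
        raise ValueError("cable number must be 0..15")
    if not 0x80 <= status <= 0xEF:
        raise ValueError("only Channel Voice status bytes are handled here")
    # For Channel Voice messages the Code Index Number equals the
    # high nibble of the status byte.
    cin = status >> 4
    return bytes([(cable << 4) | cin, status, data1 & 0x7F, data2 & 0x7F])


def ump_midi1_channel_voice(group: int, status: int, data1: int, data2: int = 0) -> int:
    """Build a 32-bit UMP word with Message Type 0x2 (MIDI 1.0 Channel Voice)."""
    if not 0 <= group <= 15:
        raise ValueError("group must be 0..15")
    if not 0x80 <= status <= 0xEF:
        raise ValueError("only Channel Voice status bytes are handled here")
    return (0x2 << 28) | (group << 24) | (status << 16) | ((data1 & 0x7F) << 8) | (data2 & 0x7F)


# Note On, Channel 1, note 60, velocity 100.
print(usb_midi1_event_packet(0, 0x90, 60, 100).hex())      # 09903c64
print(f"{ump_midi1_channel_voice(0, 0x90, 60, 100):08X}")  # 20903C64
```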

 

At this point we start building the USB Descriptors. Developers should ensure that the following fields are filled out with the information above.

 

When defining the USB descriptors, we can see that this Device only needs to set up a single Interface.

 

Note: some USB fields in this document hold the id of a separate string descriptor. For brevity, the referenced strings are shown rather than the ids. If the value is omitted, the id should be set to 0, not to an entry with a blank string.

 

Detail | Product Detail | Value: Id of String Descriptor (Referenced Value shown)
iManufacturer | Manufacturer Name | “ACME Enterprises”
iProduct | Product Name | “ACMESynth”
iSerialNumber | Product Instance Id | “ABCD12345”
iInterface | Model Name* | “ACMESynth”

 

*More complicated setups with multiple interfaces (and multiple UMP Endpoints) will be discussed in a follow-up article. For a simple Device, the iInterface and the iProduct can be the same.
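One way to keep these ids consistent is to generate the string table from the field values. The sketch below is an assumption about how a firmware build script might do this, not a required approach. It gives omitted values id 0, lets identical strings share one descriptor, and encodes each string as a UTF-16LE string descriptor. The `iConfiguration` entry is included only to show an omitted value.

```python
STRING_DESCRIPTOR_TYPE = 0x03  # standard USB STRING descriptor type


def string_descriptor(text: str) -> bytes:
    """Encode one USB string descriptor: bLength, bDescriptorType, UTF-16LE text."""
    payload = text.encode("utf-16-le")  # no NULL terminator is added
    length = 2 + len(payload)
    if length > 255:
        raise ValueError("string does not fit in a single descriptor")
    return bytes([length, STRING_DESCRIPTOR_TYPE]) + payload


def assign_string_ids(fields):
    """Return (strings, ids): strings[0] has id 1; omitted values get id 0."""
    strings = []
    ids = {}
    for field, text in fields.items():
        if not text:
            ids[field] = 0  # omitted: use id 0, not a blank string entry
            continue
        if text not in strings:
            strings.append(text)
        # String descriptor 0 holds the language ids, so text starts at id 1.
        ids[field] = strings.index(text) + 1
    return strings, ids


strings, ids = assign_string_ids({
    "iManufacturer": "ACME Enterprises",
    "iProduct": "ACMESynth",
    "iSerialNumber": "ABCD12345",
    "iInterface": "ACMESynth",
    "iConfiguration": None,  # illustration of an omitted value
})
print(ids)
print(string_descriptor(strings[1]).hex())
```

Here `iProduct` and `iInterface` end up pointing at the same descriptor, which matches the note that the two can be the same for a simple Device.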

 

 

 

 


 

MIDI 1.0 Class Specific Descriptors (on Alternate Setting 0)

 

A USB MIDI 2.0 Device should include a set of USB MIDI 1.0 Class Descriptors so that when it is plugged into a Host which does not understand MIDI 2.0, it can operate as a MIDI 1.0 Device.

When declaring MIDI 1.0 Class Specific Descriptors, you should provide a text string name for all Embedded MIDI Jacks.

 

Detail | Product Detail | Value: Id of String Descriptor (Referenced Value shown)
iJack | Function 1 Name | “Monosynth”
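As a rough illustration of where iJack lives, the sketch below builds the class-specific Embedded MIDI IN Jack and MIDI OUT Jack descriptors as we understand them from the USB MIDI 1.0 class specification. The Jack ids, the source Jack, and the string id 4 are hypothetical values chosen for the example. Please verify the layout against the specification before using it.

```python
CS_INTERFACE = 0x24        # class-specific interface descriptor type
MIDI_IN_JACK = 0x02        # descriptor subtype
MIDI_OUT_JACK = 0x03       # descriptor subtype
JACK_TYPE_EMBEDDED = 0x01  # bJackType for an Embedded Jack


def embedded_in_jack(jack_id: int, i_jack: int) -> bytes:
    """MIDI IN Jack descriptor: 6 bytes, ending with the iJack string id."""
    return bytes([6, CS_INTERFACE, MIDI_IN_JACK, JACK_TYPE_EMBEDDED, jack_id, i_jack])


def embedded_out_jack(jack_id: int, source_id: int, source_pin: int, i_jack: int) -> bytes:
    """MIDI OUT Jack descriptor with one input pin: 9 bytes, ending with iJack."""
    return bytes([
        9, CS_INTERFACE, MIDI_OUT_JACK, JACK_TYPE_EMBEDDED, jack_id,
        1,                      # bNrInputPins
        source_id, source_pin,  # the entity and pin feeding this Jack
        i_jack,
    ])


MONOSYNTH_STRING_ID = 4  # hypothetical id of the "Monosynth" string descriptor

print(embedded_in_jack(1, MONOSYNTH_STRING_ID).hex())         # 062402010104
print(embedded_out_jack(3, 2, 1, MONOSYNTH_STRING_ID).hex())  # 092403010301020104
```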

 

Most Host MIDI 1.0 Class drivers do not collect information about the topology inside the MIDI Function, such as External MIDI Jacks or Elements.

 


 

MIDI 2.0 Descriptors (on Alternate Setting 1)

 

For the best compatibility across OS’s, each UMP Endpoint is represented by a single Interface. The Interface has an In and an Out USB Endpoint that together represent a bidirectional UMP Endpoint.

Devices expose their functions and topology using Function Blocks regardless of transport. USB MIDI 2.0 has the additional requirement of Group Terminal Blocks which are declared in the Device descriptors and designed in consideration of the device’s Function Blocks.

Each USB Endpoint declares the Group Terminal Block Id’s used. The USB Device has a list of Group Terminal Block descriptors that match these Id’s.

In our example, the Monosynth function only uses one channel, so we only need to declare the use of one UMP Group. While there are different ways of declaring Group Terminal Blocks, we will look at just one way first and revisit other configurations, with the pros and cons of each, at another time.

Option 1: Declare a Single Group Terminal Block on Group 1 with a length of 1 Group

For simple Devices like our Monosynth that only connect over USB, this may be the most straightforward way of connecting to a computer, and it provides the best backwards compatibility with MIDI 1.0 Applications.

The Group Terminal Block should have the following settings:

 

Field | Detail | Value
bGrpTrmBlkID | Block Id | 1
bGrpTrmBlkType | Block Type | 0x00 (bidirectional)
nGroupTrm | Starting Group | 0x00 (Group 1)
nNumGroupTrm | Number of Groups Spanned | 1
iBlockItem | Function 1 Name | Id of String Descriptor (Referenced Value: “Monosynth”)
bMIDIProtocol | Block Protocol* | 0x11 (MIDI 2.0 Protocol)
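The settings above can be packed into a 13-byte Group Terminal Block descriptor. The following is a minimal sketch based on our reading of the USB MIDI 2.0 class specification. The descriptor type and subtype constants, the two bandwidth fields set to 0, and the string id 4 are assumptions for the example and should be checked against the specification.

```python
import struct

CS_GR_TRM_BLOCK = 0x26  # descriptor type (assumed from the USB MIDI 2.0 spec)
GR_TRM_BLOCK = 0x02     # descriptor subtype (assumed from the USB MIDI 2.0 spec)


def group_terminal_block(block_id, block_type, first_group, num_groups,
                         i_block_item, midi_protocol,
                         max_input_bandwidth=0, max_output_bandwidth=0) -> bytes:
    """Pack one Group Terminal Block descriptor (little-endian, 13 bytes)."""
    return struct.pack(
        "<BBBBBBBBBHH",
        13,                    # bLength
        CS_GR_TRM_BLOCK,       # bDescriptorType
        GR_TRM_BLOCK,          # bDescriptorSubtype
        block_id,              # bGrpTrmBlkID
        block_type,            # bGrpTrmBlkType
        first_group,           # nGroupTrm (0x00 means Group 1)
        num_groups,            # nNumGroupTrm
        i_block_item,          # iBlockItem (string descriptor id)
        midi_protocol,         # bMIDIProtocol
        max_input_bandwidth,   # wMaxInputBandwidth, 0 used as a placeholder
        max_output_bandwidth,  # wMaxOutputBandwidth, 0 used as a placeholder
    )


MONOSYNTH_STRING_ID = 4  # hypothetical id of the "Monosynth" string descriptor

gtb = group_terminal_block(
    block_id=1, block_type=0x00, first_group=0x00, num_groups=1,
    i_block_item=MONOSYNTH_STRING_ID, midi_protocol=0x11,
)
print(gtb.hex())  # 0d260201000001041100000000
```

In a real Device this descriptor follows a Group Terminal Block header descriptor, which is not shown here.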

 

 USB Endpoints under the Interface should have the following values:

 

Field | Detail | Value
bNumGrpTrmBlock | Number of GTB’s | 1
baAssoGrpTrmBlkID | Block Id | 1
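These two values sit in the class-specific descriptor that follows each USB Endpoint descriptor. The sketch below shows how it might be packed, again based on our reading of the USB MIDI 2.0 class specification. The type and subtype constants are assumptions to verify, and the helper name is our own.

```python
CS_ENDPOINT = 0x25     # class-specific endpoint descriptor type
MS_GENERAL_2_0 = 0x02  # descriptor subtype (assumed from the USB MIDI 2.0 spec)


def ms20_endpoint_descriptor(block_ids) -> bytes:
    """Pack the class-specific endpoint descriptor listing associated GTB ids."""
    block_ids = list(block_ids)
    if not block_ids:
        raise ValueError("at least one Group Terminal Block id is expected")
    return bytes([
        4 + len(block_ids),  # bLength grows with the number of Block ids
        CS_ENDPOINT,         # bDescriptorType
        MS_GENERAL_2_0,      # bDescriptorSubtype
        len(block_ids),      # bNumGrpTrmBlock
        *block_ids,          # baAssoGrpTrmBlkID
    ])


# Both the IN and the OUT Endpoint of the Monosynth reference Block Id 1.
print(ms20_endpoint_descriptor([1]).hex())  # 0525020101
```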

 

With the descriptors now defined, let’s observe the interaction with an OS that supports MIDI 2.0.

While MIDI 2.0 Protocol is declared in the Monosynth Group Terminal Block, a host Application may send MIDI 1.0 Channel Voice Messages, either intentionally or accidentally. In UMP 1.1, Stream Configuration messages may also be used to switch Protocols. To ensure the best compatibility with incoming messages, a MIDI 2.0 Device supporting MIDI 2.0 Protocol should also handle and process MIDI 1.0 Channel Voice Messages. We will discuss handling this in a follow-up article.
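As a taste of what handling both Protocols can look like, the sketch below reads a Control Change from either a MIDI 1.0 Channel Voice UMP (Message Type 0x2, 7-bit value) or a MIDI 2.0 Channel Voice UMP (Message Type 0x4, 32-bit value) and normalizes it to the range 0.0 to 1.0 for something like a filter cutoff. The plain linear scaling is our own simplification for illustration and is not the translation method defined by the specification.

```python
def control_change_from_ump(words):
    """Return (group, channel, controller, value 0.0..1.0), or None if not a CC."""
    first = words[0]
    message_type = first >> 28
    group = (first >> 24) & 0xF
    opcode = (first >> 20) & 0xF
    channel = (first >> 16) & 0xF
    controller = (first >> 8) & 0x7F
    if opcode != 0xB:  # 0xB is Control Change in both Protocols
        return None
    if message_type == 0x2:
        # MIDI 1.0 Protocol: 7-bit value in the low byte of the single word.
        return group, channel, controller, (first & 0x7F) / 127
    if message_type == 0x4 and len(words) >= 2:
        # MIDI 2.0 Protocol: 32-bit value in the second word.
        return group, channel, controller, words[1] / 0xFFFFFFFF
    return None


# Controller 74 at full value, sent in each Protocol.
print(control_change_from_ump([0x20B04A7F]))              # (0, 0, 74, 1.0)
print(control_change_from_ump([0x40B04A00, 0xFFFFFFFF]))  # (0, 0, 74, 1.0)
```

Because both paths end in the same normalized value, the synth engine does not need to know which Protocol the host Application used.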

 


 

OS’s That Support MIDI 2.0

 

Accessing MIDI 2.0 Devices in Software

 

MIDI 2.0 applications should, where possible, connect to the connection labelled “MIDI 2.0” (as seen below). MIDI 1.0 applications are generally only able to connect to the “Monosynth” connection and talk using MIDI 1.0 Protocol. The OS converts these MIDI 1.0 byte stream messages to UMP format between the Device and the application.

 


 

macOS 14+

 

 

 

 

In macOS 14+ (developer release) this looks like the following:

 

 

 

 

 

 

 

Hint: More detailed information can be seen in the macOS MIDI Studio by selecting List View.

 

 

 

 

 

 

Note: macOS has supported USB MIDI 2.0 since macOS 11. Prior to macOS 14 (developer release), only the “Monosynth” entity is shown.

 


 

Linux (6.5+)

 

In Linux (upcoming Kernel 6.5, ALSA 1.2.x+) this shows up as:

 

 

 

 


 

Android 13+

 

Android 13+ currently connects to the USB MIDI Interface and can use either the USB MIDI 1.0 function on Alternate Setting #0 or the USB MIDI 2.0 function on Alternate Setting #1, on a per-application basis. When apps call midiManager.getDevicesForTransport(MidiManager.TRANSPORT_UNIVERSAL_MIDI_PACKETS), they see the USB MIDI 2.0 device as a MIDI 2.0 device. If they call midiManager.getDevicesForTransport(MidiManager.TRANSPORT_MIDI_BYTE_STREAM), they see the device as a MIDI 1.0 device. A device can be opened in only one of the two modes at a time.

 

See https://developer.android.com/reference/android/media/midi/package-summary for more information. https://github.com/android/midi-samples contains some sample applications developers can test with.

 


 

OS’s That Don’t (Currently) Support MIDI 2.0

 

In OS’s that don’t yet support MIDI 2.0, the USB MIDI 1.0 function may be loaded to expose MIDI 1.0 Ports as declared in the MIDI 1.0 Class Specific Descriptors (on Alternate Setting 0). Currently in Windows this looks like:

 

 

 

 

While in current versions of Linux this looks like:

 

 

 

 


 

Where to next…?

 

These USB settings form the beginnings of a USB MIDI 2.0 Device.
In the next part of this series we look at recommended UMP Endpoint messages and how Function Blocks interact with Group Terminal Blocks to extend the usability of MIDI 2.0. We will look at other options for having more functions in your Device and how best to support them.