One downside of requiring big-endian clip files is that every implementer that uses native endianness for its packets must byte-swap them as they are read whenever "native" is little-endian, which is the vast majority of contemporary computers. It also means you can't stream clip files with zero-copy semantics, which rules out consuming packets directly from a memory-mapped file (typically one of the lowest-overhead ways to get bytes off disk into a program on modern computers).
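To make the cost concrete, here is a minimal sketch of the two read paths. The function names are hypothetical, and it assumes packets are sequences of 32-bit words and that the platform's C `unsigned int` is 4 bytes wide. The big-endian path has to decode (and therefore copy) every word, while the native path can reinterpret the buffer in place.

```python
import struct
import sys

# True on the minority of hosts where the big-endian path would also be "native".
NATIVE_IS_BIG = sys.byteorder == "big"


def words_from_big_endian(buf: bytes) -> list[int]:
    """Decode a big-endian buffer into 32-bit words. Always allocates a new list."""
    count = len(buf) // 4
    return list(struct.unpack(f">{count}I", buf[: count * 4]))


def words_native_view(buf: bytes) -> memoryview:
    """Reinterpret a native-endian buffer as 32-bit words without copying.

    Assumes struct.calcsize("I") == 4, which holds on common platforms.
    """
    count = len(buf) // 4
    return memoryview(buf)[: count * 4].cast("I")
```

The same `memoryview` approach works on an `mmap.mmap` object, which is where the zero-copy benefit would actually show up; it only gives correct values when the file's byte order matches the host's.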
It's also inconsistent with the UMP specification, which allows implementers to choose their endianness. To be consistent, the file format should support both little- and big-endian encodings, perhaps by using different magic numbers (`SMFCLIPB` and `SMFCLIPL`, just as an idea).
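A reader for such a format could look roughly like the sketch below. The two magic values are only the placeholders suggested above, not anything from the specification, and the layout (an 8-byte magic followed directly by 32-bit words) is an assumption made for illustration.

```python
import struct

# Hypothetical magic numbers from the suggestion above; not part of any spec.
MAGIC_BIG = b"SMFCLIPB"
MAGIC_LITTLE = b"SMFCLIPL"


def detect_byte_order(header: bytes) -> str:
    """Return the struct byte-order prefix implied by the 8-byte magic."""
    magic = header[:8]
    if magic == MAGIC_BIG:
        return ">"
    if magic == MAGIC_LITTLE:
        return "<"
    raise ValueError(f"unrecognized magic: {magic!r}")


def read_words(data: bytes) -> tuple[int, ...]:
    """Decode the 32-bit words after the magic using the detected byte order."""
    order = detect_byte_order(data)
    payload = data[8:]
    count = len(payload) // 4
    return struct.unpack(f"{order}{count}I", payload[: count * 4])
```

A writer would pick whichever magic matches its native byte order, so only readers on the opposite endianness pay for the swap.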
There is prior art for this. For example, ELF files (the executable format on many systems, including Linux) record their byte order in the identification bytes at the start of the file (the `EI_DATA` byte of `e_ident`, which follows the magic number), so loaders can tell whether the file is encoded in big- or little-endian format. This is a critical feature: it lets the writer choose which side pays the conversion cost (for example, writing a big-endian file on a little-endian system for a target architecture that is big-endian).
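As an illustration of how little work this takes on the reading side, here is a small sketch that inspects the ELF identification bytes. The constant names and values follow the elf(5) man page; the function name is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"   # e_ident[0..3]
EI_DATA = 5              # index of the data-encoding byte within e_ident
ELFDATA2LSB = 1          # two's complement, little-endian
ELFDATA2MSB = 2          # two's complement, big-endian


def elf_byte_order(e_ident: bytes) -> str:
    """Return "little" or "big" based on the EI_DATA byte of an ELF header."""
    if e_ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    if len(e_ident) <= EI_DATA:
        raise ValueError("ELF identification is truncated")
    encoding = e_ident[EI_DATA]
    if encoding == ELFDATA2LSB:
        return "little"
    if encoding == ELFDATA2MSB:
        return "big"
    raise ValueError(f"unknown data encoding: {encoding}")
```

On a Linux system, passing the first 16 bytes of an executable (for example, read from `/bin/ls`) should report the byte order of the target it was built for.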
ELF file reference:
https://man7.org/linux/man-pages/man5/elf.5.html