I’ve got a fever, and the only prescription is more machine-learning-based audio analysis.
The bulk of Tuesday’s WWDC session on Apple’s new sound-classification technology is devoted to a pair of demos.
Presenter Jon Huang of Apple’s audio team demonstrates a built-in audio recognition model that allows his Mac to dynamically recognize different sounds as they occur around it: talking, music, elements of the music like vocals and guitars, Huang snapping along to the music, pouring tea, stirring the glass, and more. (The Mac is literally annotating every sound that’s happening in the room—it’s a very cool demo, and I was immediately struck by the potential accessibility power of this technology.)
Following up Huang’s demo is a pretty nice one by audio software engineer Kevin Durand: he uses Shortcuts on a Mac running macOS Monterey to process a folder full of movies, looking for any that contain the sound of a cowbell—and then clipping out that movie and saving it to a new location. It’s a great demo of the utility of Shortcuts and of the speed of Macs running Apple silicon, and sure enough, in moments Durand’s shortcut has found one video—of Huang roller-skating backward while striking a cowbell.
Shortcuts doesn’t have audio classification built into it (yet?), but Durand has built a simple app using Apple’s new SoundAnalysis APIs that identifies whether a particular sound is found inside a particular file. It’s a good way to demonstrate these APIs to developers, but I think it’s also a good example of what Shortcuts enables on the Mac—Durand’s app doesn’t really even need an interface to be useful, because it’s lending all its power to Shortcuts, where users can add it to whatever workflows they want to build.
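For the curious, here’s a minimal sketch of what a helper like Durand’s might look like. This isn’t his code—it’s an illustration built on the SoundAnalysis classes Apple documents (`SNAudioFileAnalyzer`, `SNClassifySoundRequest`, and the built-in `.version1` classifier that ships with macOS Monterey). The label string (`"cowbell"`) and the confidence threshold are assumptions for illustration.

```swift
import Foundation
import SoundAnalysis

/// Watches classification results for a single target label.
final class TargetSoundObserver: NSObject, SNResultsObserving {
    let targetLabel: String          // e.g. "cowbell" — illustrative label
    private(set) var matched = false

    init(targetLabel: String) { self.targetLabel = targetLabel }

    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult else { return }
        // Each result covers one window of audio; flag a match if the
        // target label appears with reasonably high confidence.
        if result.classifications.contains(where: {
            $0.identifier == targetLabel && $0.confidence > 0.8
        }) {
            matched = true
        }
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("Analysis failed: \(error)")
    }
}

/// Returns true if the built-in classifier hears the given sound in the file.
func fileContainsSound(at url: URL, labeled label: String) throws -> Bool {
    let analyzer = try SNAudioFileAnalyzer(url: url)
    let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
    let observer = TargetSoundObserver(targetLabel: label)
    try analyzer.add(request, withObserver: observer)
    analyzer.analyze()               // blocking variant; scans the whole file
    return observer.matched
}
```

Wrap something like `fileContainsSound(at:labeled:)` in a Shortcuts action, and you have the bones of the cowbell-hunting workflow from the demo.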
But in the meantime, maybe Apple should try to build actions like Durand’s right into Shortcuts. These SoundAnalysis APIs are really impressive, and Shortcuts provides a way to put that kind of power into the hands of users without anyone standing in the middle.