Sound, Explained · A series

Sound in three dimensions

Object-based audio, beds and objects, and the perception of height and envelopment, with the translation discipline that keeps it intact down to stereo and mono.

Image: the interior of a large auditorium, rows of seats facing a tall ornamented screen. Public domain (CC0).

A change of address

From speakers to intent

Surround sound, as the previous chapter described it, assigns audio to fixed channels: this sound goes to the left surround, that one to the centre. It works, but the mix is welded to one speaker layout. Play it on a different arrangement and you need a new mix or a matrix to fold it down. Channel-based audio is, in the end, speaker-dependent.

Object-based audio changes the question. Instead of saying which speaker plays a sound, you describe where the sound is: a position in the room, a size, a path of motion, all carried as metadata alongside the audio. A renderer in the playback system reads that description and works out how to produce it on whatever speakers happen to be present, two or twenty, floor level or overhead. One mix, many rooms. The engineer stops addressing hardware and starts describing intent.
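
As a rough illustration of that idea, the sketch below carries a sound's direction as metadata and lets a toy renderer work out speaker gains for whatever layout it is handed. The `AudioObject` class, the `render_gains` function, the azimuth convention and the pairwise constant-power panning law are all assumptions made for this example, not any product's API. Real renderers are far more elaborate and also handle height, size and motion.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical object: audio plus a direction carried as metadata."""
    name: str
    azimuth_deg: float  # assumed convention: 0 = front, positive = to the right


def render_gains(obj, speaker_azimuths):
    """Return one gain per speaker for a horizontal ring of speakers.

    The object is panned between the two speakers that bracket its
    direction, using a sine/cosine (constant-power) law.
    """
    n = len(speaker_azimuths)
    if n < 2:
        raise ValueError("this sketch needs at least two speakers")
    # Walk the ring in order of increasing angle.
    order = sorted(range(n), key=lambda i: speaker_azimuths[i] % 360.0)
    az = obj.azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        start = speaker_azimuths[a] % 360.0
        span = (speaker_azimuths[b] - speaker_azimuths[a]) % 360.0
        if span == 0.0:
            continue  # two speakers at the same angle: skip this pair
        offset = (az - start) % 360.0
        if offset <= span:
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            break
    return gains


voice = AudioObject("voice", azimuth_deg=0.0)
stereo_layout = [-30.0, 30.0]
five_layout = [-30.0, 0.0, 30.0, 110.0, -110.0]
print(render_gains(voice, stereo_layout))  # shared between left and right
print(render_gains(voice, five_layout))    # lands on the centre speaker
```

The same object description produces a phantom centre on two speakers and a hard centre on five, which is the "one mix, many rooms" idea in miniature.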

Two kinds of sound

What stays still, and what moves

Immersive mixes sort their material into two buckets. A bed is the stable, continuous foundation: ambience, music and the atmospheric layers that fill the space, laid down on a fixed channel configuration. An object is a discrete, locatable event, such as a line of dialogue, a specific effect or a sound that flies across the room, carried as positional data and placed dynamically by the renderer.

The choice is driven by what a sound needs to do, not by habit. Room tone and a music bed want to surround you evenly, so they are beds. Dialogue needs to stay precisely placed and intelligible in every possible render, so it is an object. A fly-by that has to travel through three-dimensional space is an object with motion. Get the sorting right and the renderer can do its job; get it wrong and the mix fights the system meant to translate it.
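
The sorting rule described above can be written down as a tiny decision function. This is only a sketch: the `Element` class, its two flags and the `bucket_for` function are hypothetical names invented for illustration, and a real session involves more judgement than two booleans.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical description of what a sound needs to do in the mix."""
    name: str
    needs_precise_placement: bool = False
    moves: bool = False


def bucket_for(element):
    """Sort an element by what it needs to do, not by habit."""
    if element.moves:
        return "object with motion"
    if element.needs_precise_placement:
        return "object"
    return "bed"


session = [
    Element("room tone"),
    Element("music bed"),
    Element("dialogue", needs_precise_placement=True),
    Element("fly-by", needs_precise_placement=True, moves=True),
]
for element in session:
    print(element.name, "->", bucket_for(element))
```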

Placing a sound

The four controls of space

Putting a sound somewhere in three dimensions comes down to four perceptual controls. Position is the obvious one: the direction, left to right and now up and down as well, ideally matched to what the eye expects rather than placed at random. Distance is carried not by volume alone but by the ratio of direct sound to reverberation, the same depth cue that places a source near or far in a stereo image, now working in every direction.
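
To make the distance cue concrete, here is a minimal sketch of how the direct-to-reverberant ratio might shift as a source moves away. It assumes the direct sound follows the inverse-distance law (about 6 dB quieter per doubling of distance) while the reverberant level stays roughly constant. The function name `distance_cue` and the default `reverb_gain` of 0.25 are illustrative assumptions, not values from any particular tool.

```python
import math


def distance_cue(distance_m, ref_distance_m=1.0, reverb_gain=0.25):
    """Return (direct_gain, reverb_gain, direct_to_reverb_db).

    Assumptions: the direct path falls as 1/distance beyond the reference
    distance, and the reverb send does not change with distance.
    """
    d = max(distance_m, ref_distance_m)  # no extra boost closer than the reference
    direct_gain = ref_distance_m / d
    ratio_db = 20.0 * math.log10(direct_gain / reverb_gain)
    return direct_gain, reverb_gain, ratio_db


for metres in (1.0, 2.0, 4.0, 8.0):
    direct, reverb, ratio = distance_cue(metres)
    print(f"{metres:4.1f} m  direct={direct:.3f}  D/R={ratio:+.1f} dB")
```

Under these assumptions the reverb never changes level; the source recedes because the direct sound sinks toward it and then below it.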

Width, or spread, decides whether a sound is a sharp point or a diffuse cloud: ambiences want to be wide, dialogue wants to be tight and focused. And motion is the trajectory of an object over time, the most dramatic control and the most overused. Movement earns its place only when it serves the story; constant motion for its own sake is fatiguing and reads as a gimmick. Space, like stereo width before it, is a decision, not a default.
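
One way to picture spread is as a blend between point-source gains and an even distribution across every speaker, with total power held constant. The sketch below does only that level distribution. `apply_spread` and its 0-to-1 `spread` parameter are assumptions for this example, and a real renderer would typically also decorrelate the speaker feeds.

```python
import math


def apply_spread(point_gains, spread):
    """Blend point-source gains toward an even spread, preserving total power.

    spread = 0.0 keeps the sharp point; spread = 1.0 gives every speaker
    the same level.
    """
    if not 0.0 <= spread <= 1.0:
        raise ValueError("spread must be between 0 and 1")
    total_power = sum(g * g for g in point_gains)
    even_power = total_power / len(point_gains)
    return [
        math.sqrt((1.0 - spread) * g * g + spread * even_power)
        for g in point_gains
    ]


tight_dialogue = apply_spread([1.0, 0.0, 0.0, 0.0], spread=0.0)
wide_ambience = apply_spread([1.0, 0.0, 0.0, 0.0], spread=1.0)
print(tight_dialogue)
print(wide_ambience)
```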

The reality check

It still has to fold down

However many channels a mix is built for, most people will hear it on two speakers or a pair of earbuds. So an immersive mix lives or dies by how it survives the fold-down, and that check cannot be left to the end. The working method is to prioritise: if elements must be sacrificed as the mix collapses to fewer channels, they go from the bottom of a ladder upward. Decorative ear candy goes first, then ambience, then music, then key effects, and the lead and dialogue are protected to the last. Whatever happens, the voice stays intelligible.
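
The priority ladder can be sketched as an ordered list that is cut from the bottom. The rung names come from the paragraph above. The function `elements_to_cut`, and the choice to make the top rung uncuttable, are assumptions made for illustration.

```python
# Bottom of the ladder first: the earliest entries are sacrificed first.
SACRIFICE_ORDER = [
    "ear candy",
    "ambience",
    "music",
    "key effects",
    "lead and dialogue",
]
PROTECTED = "lead and dialogue"


def elements_to_cut(present, cuts_needed):
    """Return the elements to drop, lowest priority first.

    The protected rung is never offered up, however many cuts are requested.
    """
    candidates = sorted(
        (e for e in present if e != PROTECTED),
        key=SACRIFICE_ORDER.index,
    )
    return candidates[:cuts_needed]


mix = ["lead and dialogue", "music", "ear candy", "ambience"]
print(elements_to_cut(mix, 2))
```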

Two traps wait in the fold-down. The low-frequency effects channel, the same misunderstood ".1" from surround, is for intentional effects only. It is conventionally reproduced with about ten decibels of extra in-band gain, so bass duplicated into it from the main channels comes out far too heavy, then collapses when the fold-down drops the channel. And wide, decorrelated ambiences that sound glorious in the full field can comb-filter and cancel when summed to stereo or mono. The discipline is to toggle between immersive, stereo, mono and headphones throughout the session, not once at the end, and to trust the collapse as the real test of the mix.
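
The cancellation trap is easy to demonstrate with numbers. The sketch below sums a left channel and a delayed copy of it to mono and reports the level change. The 0.5 ms delay, the 48 kHz sample rate and the function names are assumptions chosen so the arithmetic is clean; real ambiences are broadband, so they comb-filter across many frequencies rather than vanishing at one.

```python
import math

SAMPLE_RATE = 48000
DELAY_SAMPLES = 24  # 0.5 ms at 48 kHz, an assumed inter-channel delay


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mono_sum_change_db(left, right):
    """Level of (L + R) / 2 relative to the average channel level, in dB."""
    mono = [(a + b) / 2.0 for a, b in zip(left, right)]
    reference = math.sqrt((rms(left) ** 2 + rms(right) ** 2) / 2.0)
    level = rms(mono)
    if level < 1e-12:
        return float("-inf")  # complete cancellation
    return 20.0 * math.log10(level / reference)


def delayed_pair(freq_hz, n_samples=960):
    """A sine on the left and the same sine, delayed, on the right."""
    step = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    left = [math.sin(step * n) for n in range(n_samples)]
    right = [math.sin(step * (n - DELAY_SAMPLES)) for n in range(n_samples)]
    return left, right


for freq in (500, 1000):
    change = mono_sum_change_db(*delayed_pair(freq))
    print(f"{freq} Hz: mono sum changes level by {change:.1f} dB")
```

With these values the delay is half a period at 1 kHz, so that tone all but disappears in mono, while 500 Hz loses about 3 dB. That uneven loss across frequency is the comb filter.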


Describing a space, not driving speakers

The shift from channels to objects is the same shift that runs through this whole series: stop thinking about the equipment and start thinking about the perception you are building. Place a sound where it belongs, give it the right distance and spread, move it only with reason, and let the system render it for the room it lands in. Build the space well and it follows the listener home, all the way down to two ears and a phone.

Article by Cymasonic Labs · Updated May 2026