Still Field Studio

October 1, 2025 · 4 min read

When to cut to picture, when to cut to sound

Every cut is two cuts: where the picture changes and where the sound changes. A studio note on when to land them together and when to pull them apart.

Every cut is really two cuts. There is the frame where the picture changes, and there is the moment where the sound changes, and nothing in the medium requires them to be the same frame. Most of the time they are, because that is the path of least resistance — the editor moves a clip and the audio attached to it moves with it. But the join between picture and sound is a decision, whether or not anyone made it on purpose.

We spend a surprising amount of time on every cut arguing about where the sound should land relative to where the eye is being sent. It sounds like hair-splitting until you watch a scene play both ways. So this is a working note on the choice: when to let picture and sound land together, when to pull them apart, and why we have stopped treating the synchronised cut as a default.

The default that isn't a default

Landing every sound exactly on the picture cut is the obvious move and frequently the weaker one. Sound on Sound makes the point plainly in its writing on sound design for visual media: cutting sound in time with the visual edit seems logical, but introducing a sound early, or letting it run across the edit for a second or two, is often what makes a scene land.

The example that travels well is a door. We hear the handle turn and the door swing before we see the room, so that when the picture finally cuts, the door is already open. The sound gets a moment to itself before it has to compete with anything else, and the audience registers it. If the door noise arrived on the same frame as the picture, it would be one of several things asking for attention at once, and it would be the first to lose.

So the working question is not "does the sound match the picture?" but "what does the audience need to register here, and does the sound serve that by landing on the cut, ahead of it, or behind it?" That is a different question, and it produces different cuts.

The vocabulary of a deliberate offset

The craft already has names for pulling the two cuts apart, which is a good sign that editors have been doing it for a long time. IndieWire's glossary of postproduction terms lays out the common ones. An L-cut carries audio from one scene across the picture edit into the next. A J-cut does the reverse — the next scene's sound arrives before its picture does, so we hear where we are going before we see it.
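For readers who think in timelines, both definitions reduce to one number: where the sound cut falls relative to the picture cut. The sketch below is a hypothetical model, not any editing application's API; the `Transition` class, its fields, and the 24 fps default are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One edit point between an outgoing shot (A) and an incoming shot (B).

    Hypothetical model: both values are frame numbers on a shared timeline.
    picture_cut: first frame on which B's picture is shown.
    sound_cut: first frame on which B's sound is heard.
    """
    picture_cut: int
    sound_cut: int

    @property
    def offset(self) -> int:
        # Positive: the sound changes after the picture.
        # Negative: the sound changes before the picture.
        return self.sound_cut - self.picture_cut

    def kind(self) -> str:
        if self.offset > 0:
            # A's sound carries across the picture cut into B: an L-cut.
            return "L-cut"
        if self.offset < 0:
            # B's sound arrives before B's picture: a J-cut.
            return "J-cut"
        # Picture and sound change on the same frame.
        return "straight cut"

    def offset_seconds(self, fps: float = 24.0) -> float:
        # Convert the frame offset to seconds at the assumed frame rate.
        return self.offset / fps


if __name__ == "__main__":
    # The door: we hear it 36 frames (1.5 s at 24 fps) before we see the room.
    door = Transition(picture_cut=1200, sound_cut=1164)
    print(door.kind(), door.offset_seconds())  # J-cut -1.5
```

In this model the door example is simply a negative offset. The point of reducing it to a signed number is that "together", "ahead", and "behind" become one decision with a size, which is the thing worth arguing about in the room.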

At the scale of a whole scene, the same instinct becomes an audio bridge: a line of dialogue, an effect, or a music cue that runs across a visual transition and stitches two places together. None of these are tricks. They are the grammar of choosing the sound cut independently of the picture cut, and naming them is what lets a room full of people argue about the right one instead of nudging clips until nothing feels off.

When the beat leads

There is a case where the whole dependency reverses, and it catches people out. When a strongly rhythmic piece of music governs a sequence, the picture is cut to the beat rather than the music being written to a locked picture. Sound on Sound describes this directly in its piece on writing music for picture: trying to score to a finished edit can be a nightmare, so the picture editor cuts the images to the music once the track is in the timeline.

The general principle underneath it is that the editor decides, moment to moment, which cue takes priority — the visual one or the musical one — rather than assuming the two streams are locked. The counter-example keeps it honest. In a dialogue scene, forcing every cut onto a musical grid would flatten the performance; the timing the actors found has to lead, and the music yields to it. Knowing which one is in charge is most of the job.

When the cut is one decision, not two

All of this works because the cut is not two parallel streams that happen to share a timecode. Walter Murch has spent a career arguing the point. IndieWire's profile of him notes that one of the goals at American Zoetrope was to break down the barriers between picture editing, sound editing, and mixing, in effect treating them as one continuous decision. He goes further in his talks on what he calls nodal editing, where the cut is a node at which image and sound meet rather than two things that must coincide.

That framing is why we like to keep both streams in one place while we work, and it is part of why we make the case for cutting on the original timeline. When the picture and the sound live on the same surface, the offset between them stops being an accident of conform and becomes something you can hear and adjust in the same pass.

A closing note

The sync point is a choice. Our only firm rule: make it deliberately, rather than inherit it from wherever the clip happened to fall. That decision is easier to make well when the sound was considered early — it is why we scout the sound before the picture, so that the offset between what you see and what you hear is planned on location, not patched in the mix.