How audio/video sync issues in video files can happen
A friend has problems redoing the voice-over in a video. The video or the audio always ends up too long, so one of them gets cut off to match the other, even though both recordings are the same duration. So I wrote this little explanation and a possible solution:
Short Primer on Audio in a Computer
The purple line above is supposed to be a sound wave. The pictures at the top are video frames.
Now, the computer can only store individual numbers; it can't store a wave form directly. So it simply measures how high the sound wave is, many times per second.
So to save the start of the wave above to your video file, it writes a bunch of pixel images for the frames and the numbers 1.3, 2.5, 3.0, 3.1 and 2.8 to the file.
This is imprecise. When the computer uses these numbers to recreate the sound wave, the result looks more like this:
Luckily, the human ear isn’t perfect: if the computer measures often enough, you won’t hear the little “steps” in the curve, and it will sound close enough to the original.
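To make the “steps” idea concrete, here is a minimal Python sketch (my own illustration, not how any particular program does it) that samples a sine wave and rebuilds it as a staircase by holding each sample’s value until the next sample arrives. The function names, the 1000 Hz test tone and the 10 ms duration are arbitrary choices. The largest gap between the real wave and the staircase shrinks as the sample rate goes up.

```python
import math

def sample_wave(freq_hz, sample_rate, duration_s):
    """Measure the height of a sine wave sample_rate times per second."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def step_error(freq_hz, sample_rate, duration_s=0.01, fine_rate=1_000_000):
    """Largest gap between the real wave and the staircase rebuilt from samples.

    The staircase holds each sample's value until the next sample
    (a "zero-order hold"). We compare it to the real wave on a much
    finer time grid (fine_rate points per second).
    """
    samples = sample_wave(freq_hz, sample_rate, duration_s)
    worst = 0.0
    for j in range(int(fine_rate * duration_s)):
        t = j / fine_rate
        idx = min(int(t * sample_rate), len(samples) - 1)  # most recent sample
        real = math.sin(2 * math.pi * freq_hz * t)
        worst = max(worst, abs(real - samples[idx]))
    return worst

for rate in (8000, 48000):
    print(f"{rate} Hz: largest step error {step_error(1000, rate):.3f}")
```

With these settings the largest step at 48000 samples per second comes out several times smaller than at 8000.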
The Problem with Audio in Movies
But when it plays back your movie, the computer needs to play the right audio at the right time. It has to align these numbers with the video frames so that, for example, lip movements line up with speech.
Now, if you have 30 video frames per second and 44000 audio samples per second, that makes 1466 2/3 audio samples per video frame (44000 ÷ 30).
The problem is that you can’t have a fraction of a sample, so the computer has to decide what to do with the extra 2/3. Usually it just shifts that sample into the next frame, and the error quickly adds up, leading to the audio being longer than the video:
(As above: green is the real audio, purple is what the computer makes from it.)
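Here is a small sketch of that arithmetic using exact fractions. The drift model is a hypothetical simplification of “shifting the leftover into the next frame”: every frame is rounded up to a whole number of samples (1467 instead of 1466 2/3), and we add up the surplus. The function names are my own.

```python
from fractions import Fraction
import math

def samples_per_frame(sample_rate, fps):
    """Exact number of audio samples that fall within one video frame."""
    return Fraction(sample_rate) / Fraction(fps)

def naive_drift_seconds(sample_rate, fps, duration_s):
    """Extra audio length after duration_s if every frame is rounded up
    to a whole number of samples (hypothetical model, for illustration)."""
    exact = samples_per_frame(sample_rate, fps)
    per_frame = math.ceil(exact)            # whole samples actually used per frame
    frames = fps * duration_s
    extra_samples = (per_frame - exact) * frames
    return extra_samples / sample_rate      # convert surplus samples to seconds

print(samples_per_frame(44000, 30))                   # 4400/3
print(float(naive_drift_seconds(44000, 30, 3600)))    # extra seconds after one hour
```

Under this model, one hour at 30 fps gains 36000 surplus samples, roughly 0.82 seconds of extra audio, while 48000 samples per second would not drift at all.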
If you use a sample rate of 48000 samples per second instead, that divides evenly by 30: 1600 audio samples per video frame.
So either somewhere in your recording or export you had 44000 where you should have had 48000, or you had an odd frame rate. For example, NTSC video, the standard for TV in the US, uses 29.97 frames per second, which makes it really hard to find a sample rate that divides evenly by it.
Also, many programs round that up to 30 fps, which causes problems because the picture then runs slightly slow: about 0.1%, or roughly 3.6 seconds per hour.
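To put a number on “slightly slow”, here is a small sketch (the function name is my own) assuming a program lays out its timeline as if the video ran at exactly 30 frames per second while it really plays at 29.97 (30000/1001). It computes how far the picture lags behind the timeline.

```python
from fractions import Fraction

def drift_seconds(true_fps, assumed_fps, duration_s):
    """Lag after duration_s of timeline time when frames timed for
    assumed_fps are actually shown at the (slower) true_fps."""
    frames = Fraction(assumed_fps) * duration_s   # frames the program expects to show
    real_time = frames / Fraction(true_fps)       # how long they really take
    return real_time - duration_s

lag = drift_seconds(Fraction(30000, 1001), 30, 3600)
print(f"picture lags {float(lag):.1f} s after one hour")
```

The lag grows linearly, so a 10 minute video would be off by about 0.6 seconds.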
What this means in practice for video editing: wherever your recording and editing software lets you specify a sample rate or frame rate, make sure the settings are the same everywhere, and that the audio and video rates match (i.e. the sample rate divides evenly by the frame rate).
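As a rough aid for that checklist, this hypothetical helper takes the sample rate and frame rate you found in each program’s settings and flags mismatches, plus any frame rate that does not divide the sample rate evenly. The stage names and the settings dictionary are placeholders; frame rates can be given as integers or as strings like "29.97".

```python
from fractions import Fraction

def check_settings(settings):
    """settings: dict of stage name -> (sample_rate, fps).

    Returns a list of human-readable warnings (empty if all is well).
    """
    warnings = []
    rates = {sr for sr, _ in settings.values()}
    frame_rates = {Fraction(fps) for _, fps in settings.values()}
    if len(rates) > 1:
        warnings.append(f"sample rates differ: {sorted(rates)}")
    if len(frame_rates) > 1:
        warnings.append(f"frame rates differ: {sorted(float(f) for f in frame_rates)}")
    for stage, (sr, fps) in settings.items():
        per_frame = Fraction(sr) / Fraction(fps)
        if per_frame.denominator != 1:   # not a whole number of samples per frame
            warnings.append(
                f"{stage}: {sr} Hz / {float(Fraction(fps)):g} fps = "
                f"{float(per_frame):.2f} samples per frame")
    return warnings

# Example: a recorder left at 44000 Hz while the editor uses 48000 Hz.
settings = {
    "recorder": (44000, 30),
    "editor": (48000, 30),
    "export": (48000, 30),
}
for w in check_settings(settings):
    print(w)
```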