Why we need SMPTE timecodes to do frame-accurate processing

From: Jack Jansen <Jack.Jansen@cwi.nl>
Date: Wed, 14 Jan 2009 14:25:48 +0100
Message-Id: <4B13E5A0-C8CE-4349-9732-D4C630BE4588@cwi.nl>
To: Media Fragment <public-media-fragment@w3.org>
As promised, here's a short explanation of why having only seconds (or
microseconds, etc.)-based time is not good enough for frame-accurate
selection of video content.

SMPTE timecodes come in a number of flavors, among them SMPTE-24 (from  
film, originally, 24fps) and SMPTE-25 (from PAL television). Most of  
these flavors are easy to convert from and to linear time, because
they are themselves linear and monotonic. For example, SMPTE-24
hh:mm:ss:ff can be converted to seconds by doing hh*3600+mm*60+ss+ff/24.
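
As a minimal sketch of that conversion (the function name
smpte_to_seconds and the use of exact fractions are my own choices for
illustration, not part of any SMPTE specification), assuming a
non-drop timecode with an integer frame rate such as 24 or 25:

```python
from fractions import Fraction


def smpte_to_seconds(timecode: str, fps: int) -> Fraction:
    """Convert a non-drop 'hh:mm:ss:ff' timecode to seconds.

    Only valid for linear flavors (e.g. SMPTE-24, SMPTE-25), where
    every second holds exactly `fps` consecutively numbered frames.
    """
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    if not 0 <= ff < fps:
        raise ValueError(f"frame field {ff} out of range for {fps} fps")
    # hh*3600 + mm*60 + ss + ff/fps, kept exact to avoid rounding drift
    return hh * 3600 + mm * 60 + ss + Fraction(ff, fps)
```

For instance, smpte_to_seconds("00:01:00:12", 24) gives 121/2, i.e.
60.5 seconds.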

The problem starts with NTSC timecodes. NTSC is commonly thought of as
30 frames per second, but it's actually 29.97 frames per second. There
are two common ways to deal with this difference:

1. smpte-30 ignores the problem: it just says "there's 30 frames in
every second". So, consecutive frames are numbered consecutively.
Conversion between timestamps and (milli)seconds is just as easy as
for smpte-24. However, there is a playback problem: 30 frames should
play back not in 1.000 second but in 1.001 second. Ignoring this is
not an option, especially not when there is an audio track too: if we
have a 44 kHz audio track (44000 samples per second) we should play
back 44044 audio samples in the same time as we play back 30 video
frames. If we don't do this then by the time we're 15 minutes into a
presentation, audio and video will be out of sync by almost 1 second
(900 * 0.001 = 0.9 s). Not everyone can spot off-by-one-frame sync
errors, but off-by-one-second is clearly too much:-)

2. smpte-30drop fixes the problem with a solution similar to leap  
years: at the beginning of every minute *except if the minute is  
divisible by 10* there are no frames 00 and 01. So, it's not frames  
that are dropped, but numbers. So, after frame 00:00:59:29 we get  
frame 00:01:00:02. But, after frame 00:09:59:29 we get 00:10:00:00.
Now we can blissfully ignore the audio/video sync problem: over the
course of 10 minutes audio and video will slowly drift apart, but by
at most 2 frames. Then they'll be yanked back into sync again. But, in
the process we've lost our ability to do an easy conversion between
timecodes and milliseconds. We could conceivably still do the
calculation by working in intervals of 10 minutes, but that is
complicated. Moreover, officially the only timecodes that are dropped  
are hh:mm:00:00 and hh:mm:00:01 for any "mm" that is not a multiple of
10, but practically this isn't always the case. There's lots of stuff  
out there that is blissfully unaware of the intricacies of drop codes,  
so when it temporally crops a 1-second piece of video from, say,
00:00:59:10 you'll find timecodes 00:00:00:20 and 00:00:00:21 missing
in the resulting video:-( All in all, in this case, timecodes are more  
like identifiers than numbers.
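
To make the two NTSC cases concrete, here is a rough sketch of both
calculations. It assumes the frame rate is exactly 30000/1001 fps
(consistent with the "30 frames in 1.001 second" figure above), and it
assumes a well-behaved smpte-30drop stream that follows the official
rule, which, as noted, real-world material often doesn't. All function
names are made up for illustration.

```python
from fractions import Fraction

# Assumption: NTSC runs at exactly 30000/1001 fps (about 29.97).
NTSC_FRAME_DURATION = Fraction(1001, 30000)  # seconds per frame


def _fields(timecode):
    """Split 'hh:mm:ss:ff' into four integers."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return hh, mm, ss, ff


def nondrop_frame_number(timecode):
    """smpte-30: every second is labelled with 30 consecutive frames."""
    hh, mm, ss, ff = _fields(timecode)
    return (hh * 3600 + mm * 60 + ss) * 30 + ff


def dropframe_frame_number(timecode):
    """smpte-30drop: numbers 00 and 01 are skipped at the start of
    every minute that is not a multiple of 10."""
    hh, mm, ss, ff = _fields(timecode)
    if ss == 0 and ff < 2 and mm % 10 != 0:
        raise ValueError(f"{timecode} does not exist in smpte-30drop")
    total_minutes = hh * 60 + mm
    # Two numbers skipped per elapsed minute, except every tenth minute.
    skipped = 2 * (total_minutes - total_minutes // 10)
    return (hh * 3600 + mm * 60 + ss) * 30 + ff - skipped


def ntsc_seconds(frame_number):
    """Real playback time of a frame, counted from frame 0."""
    return frame_number * NTSC_FRAME_DURATION


# smpte-30 label 00:15:00:00 is really reached after 900.9 seconds.
drift = ntsc_seconds(nondrop_frame_number("00:15:00:00")) - 900
```

With these definitions, 00:00:59:29 and 00:01:00:02 come out as
consecutive frame numbers, as do 00:09:59:29 and 00:10:00:00, and the
drop-frame label 00:10:00:00 lands within a millisecond of 600 real
seconds. Note that this only works if the stream's timecodes follow the
rule; for cropped or otherwise mangled material the timecodes really
are just identifiers.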

Here's some links on the subject:

Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma  
Received on Wednesday, 14 January 2009 13:26:31 UTC
