Re: Description of the 2-ways and the 4-ways handshake

At 22:08  +0200 2/04/09, Jack Jansen wrote:
>On 2 apr 2009, at 01:38, David Singer wrote:
>>I'm not sure this really captures the way it works for MP4/MOV family files.
>>
>>After the UA gets a time-range request (or a starting seek 
>>request), for a new file:
>>
>>set section-requested-start 'F' to 0:
>>loop {
>>   perform a byte-range request for F to F+ say 3K.
>>   do I have the start of the moov atom?  yes -> exit
>>   look at the size of the last atom header we have;  set F = offset 
>>of last atom + size of last atom
>>}
>>complete the moov atom, if we need to (one more get)
>>now use the moov atom to work out what section of the media data is 
>>needed for the requested seek
>>do I have it already?  if not, get some amount of media at that 
>>point (for each track)
>>
>>Now, in theory, the loop may go round several times; 90%+ of the 
>>time it exits immediately as the moov atom fits into the initial 
>>request. Occasionally, the mdat is first and we loop twice.
>>
>>Usually, the moov atom is smaller than the initial request, so we 
>>don't need to do any completion
>>
>>Usually, the tracks are interleaved and one get gets us both the 
>>(initial) video and audio, and we're playing.
>
>
>David,
>this pseudocode gets us to moov atom, but how do you then proceed to 
>get the right location for playback? In Cannes, you mentioned that 
>there's some sort of an index in there, could you elaborate on this 
>a bit? Or alternatively, point to a location where this is explained?

Sure.

In each track, we do the following (note that 'sample' means a 
'frame' or access unit of audio or video):

in each track, there may be an edit list mapping movie time (from 0) 
to track time;  find the segment that corresponds to the time you 
want, and compute its track time.  (the default is 1:1)
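
As a rough illustration of the edit-list step, here is a minimal Python 
sketch.  The tuple layout, the function name, and the use of -1 to mark 
an empty edit are assumptions made for the example; it also assumes 
movie time and track time share one timescale and that the playback 
rate is 1, which real files need not do.

```python
from typing import List, Optional, Tuple

# Each edit is (segment_duration, media_time).  media_time == -1 marks an
# empty edit, i.e. nothing from this track plays during that segment.
# Hypothetical simplification: one shared timescale, playback rate of 1.
Edit = Tuple[int, int]


def movie_to_track_time(edits: List[Edit], movie_time: int) -> Optional[int]:
    """Map a movie time to a track time, or None if the track is silent."""
    if not edits:
        return movie_time  # no edit list: the default mapping is 1:1
    segment_start = 0
    for duration, media_time in edits:
        if movie_time < segment_start + duration:
            if media_time < 0:
                return None  # empty edit
            return media_time + (movie_time - segment_start)
        segment_start += duration
    return None  # past the end of the last edit
```

For example, with edits [(100, -1), (500, 200)] the first 100 units are 
empty and movie time 150 lands 50 units into the second segment.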

the time-to-sample table contains a compacted (run-length encoded) 
sample-duration for each sample, in track time.   Sum them until you 
arrive at the sample that intersects the time you want.
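
The summing over the run-length encoded durations might look like the 
following Python sketch.  The (sample_count, sample_delta) pair layout 
and the function name are assumptions for illustration, and sample 
numbers are 0-based here for convenience.

```python
from typing import List, Tuple


def sample_at_time(runs: List[Tuple[int, int]], track_time: int) -> Tuple[int, int]:
    """Return (sample_index, sample_start_time) for a non-negative track time.

    runs holds (sample_count, sample_delta) pairs: sample_count consecutive
    samples that each last sample_delta units of track time.
    """
    sample_index = 0  # index of the first sample in the current run
    run_start = 0     # track time at which the current run begins
    for count, delta in runs:
        run_duration = count * delta
        if track_time < run_start + run_duration:
            # The wanted time falls inside this run; step by whole samples.
            steps = (track_time - run_start) // delta
            return sample_index + steps, run_start + steps * delta
        sample_index += count
        run_start += run_duration
    raise ValueError("track_time is past the end of the track")
```

Because whole runs are skipped in one step, the cost is proportional to 
the number of runs rather than the number of samples.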

samples are stored in chunks;  the sample-to-chunk and chunk offset 
tables between them give the starting sample number, the number of 
samples, and the absolute file byte-offset of the first byte of the 
chunk, for each chunk.  find the chunk that the sample we want is in, 
and its byte offset.
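
A sketch of the chunk lookup in Python, assuming the chunk information 
has already been expanded into one (first_sample, sample_count, 
byte_offset) entry per chunk, sorted by first_sample.  That expanded 
layout and the function name are hypothetical; the tables in a real 
file are stored more compactly.

```python
from bisect import bisect_right
from typing import List, Tuple

# One entry per chunk: (first_sample, sample_count, byte_offset),
# with 0-based sample numbers.  Hypothetical pre-expanded layout.
Chunk = Tuple[int, int, int]


def find_chunk(chunks: List[Chunk], sample_index: int) -> Tuple[int, int, int]:
    """Return (chunk_index, first_sample, byte_offset) for a sample."""
    first_samples = [chunk[0] for chunk in chunks]
    # Last chunk whose first sample is <= the sample we want.
    i = bisect_right(first_samples, sample_index) - 1
    if i < 0:
        raise ValueError("sample precedes the first chunk")
    first, count, offset = chunks[i]
    if sample_index >= first + count:
        raise ValueError("sample is past the last chunk")
    return i, first, offset
```

The byte offset returned here is already enough to decide where to 
start loading data for this track.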

(typically, for data loading, we stop here and just add that chunk 
start point to the data we need, find the chunk starts for the other 
tracks, and if they are close together, load a whole bunch-o-bytes 
from the earliest offset, such that we get all of them)

there is also a table which gives (compacted again) the size of each 
sample;  for the samples preceding the one we want, but in the same 
chunk, add their sizes to the chunk offset.  this gives the absolute 
byte offset of the access unit we want.

we also have its size;  we can now read exactly those bytes
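
In Python, the offset and size arithmetic could be sketched like this. 
It assumes the sample sizes have been expanded into a plain list with 
one entry per sample (0-based); the function and parameter names are 
made up for the example.

```python
from typing import List, Tuple


def sample_byte_range(
    sizes: List[int],
    chunk_first_sample: int,
    chunk_offset: int,
    sample_index: int,
) -> Tuple[int, int]:
    """Return (absolute_byte_offset, size_in_bytes) of one sample.

    sizes[i] is the size of sample i.  chunk_first_sample and chunk_offset
    describe the chunk that contains sample_index.
    """
    # Add up the samples that sit before ours inside the same chunk.
    preceding = sum(sizes[chunk_first_sample:sample_index])
    return chunk_offset + preceding, sizes[sample_index]
```

For a chunk starting at sample 4 and byte 9000, with samples 4 and 5 
being 300 and 120 bytes long, sample 6 starts at byte 9420.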

(we could have summed all the sample sizes in each chunk we are going 
to load, to get the chunk sizes, to further guide the read we are 
going to do, of course)

All these structures are in the moov atom.  For a well-interleaved 
file with a starting moov atom, an initial file read gets the moov 
atom, and this procedure gets us a section of the file containing the 
AUs we need.  The moov atom gives us their timing, and we're off.
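
To make the initial read concrete, here is a hedged Python sketch of 
walking the top-level atoms with small reads until the moov atom 
header turns up.  The fetch callable stands in for a byte-range 
request and is hypothetical, as is the 3072-byte probe size; the 
special size values 0 and 1 are deliberately not handled.

```python
import struct
from typing import Callable, Tuple


def find_moov(fetch: Callable[[int, int], bytes], probe_size: int = 3072) -> Tuple[int, int]:
    """Return (offset, size) of the moov atom.

    fetch(offset, length) is a hypothetical stand-in for a byte-range
    request; it returns up to length bytes starting at offset.
    """
    offset = 0
    while True:
        data = fetch(offset, probe_size)
        if len(data) < 8:
            raise ValueError("no moov atom found")
        pos = 0
        # Walk every complete atom header inside this probe.
        while pos + 8 <= len(data):
            size, kind = struct.unpack_from(">I4s", data, pos)
            if kind == b"moov":
                return offset + pos, size
            if size < 8:
                raise ValueError("special atom sizes are not handled here")
            pos += size  # skip to the start of the next atom
        offset += pos  # next probe starts at the next atom header
```

With a large mdat atom ahead of the moov atom, this takes two reads: 
one that sees the mdat header, and one at the offset just past it.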

If the media is video and has sync-points, at the beginning we have 
to back up to the preceding sync point (yes, there is a table of the 
sample numbers which are sync points), and then pre-roll decoding 
from there.
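
Backing up to the preceding sync point could be sketched in Python as 
below.  It assumes the sync table is a sorted list of 0-based sample 
numbers, and that a missing table (None) means every sample is a sync 
point; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional


def preceding_sync_sample(sync_samples: Optional[List[int]], sample_index: int) -> int:
    """Return the sample at which decoding has to start.

    sync_samples is a sorted list of sync-point sample numbers, or None
    when the track has no sync table (every sample is then treated as a
    sync point).
    """
    if sync_samples is None:
        return sample_index
    # Last sync point at or before the wanted sample.
    i = bisect_right(sync_samples, sample_index) - 1
    if i < 0:
        raise ValueError("no sync point at or before this sample")
    return sync_samples[i]
```

Decoding then starts at the returned sample and runs forward, with 
output discarded until the wanted sample is reached.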



Is this clearer now?

This was designed for 68000 processors and 1x CD-ROM drives, where 
both computation and bandwidth were scarce (reading the CD, and 
particularly seeking, was slow), but that converts quite well to a 
world of cell-phones and wireless carriers and long round-trip 
times... :-)

-- 
David Singer
Multimedia Standards, Apple Inc.

Received on Thursday, 2 April 2009 23:54:08 UTC