- From: David Singer <singer@apple.com>
- Date: Thu, 2 Apr 2009 16:37:36 -0700
- To: Jack Jansen <Jack.Jansen@cwi.nl>
- Cc: Media Fragment <public-media-fragment@w3.org>, Eric Carlson <eric.carlson@apple.com>
At 22:08 +0200 2/04/09, Jack Jansen wrote:
>On 2 apr 2009, at 01:38, David Singer wrote:
>>I'm not sure this really captures the way it works for MP4/MOV family files.
>>
>>After the UA gets a time-range request (or a starting seek
>>request), for a new file:
>>
>>set section-requested-start 'F' to 0:
>>loop {
>>  perform a byte-range request for F to F+ say 3K.
>>  do I have the start of the moov atom? yes -> exit
>>  look at the size of the last atom header we have; set F = offset
>>of last atom + size of last atom
>>}
>>complete the moov atom, if we need to (one more get)
>>now use the moov atom to work out what section of the media data is
>>needed for the requested seek
>>do I have it already? if not, get some amount of media at that
>>point (for each track)
>>
>>Now, in theory, the loop may go round several times; 90%+ of the
>>time it exits immediately as the moov atom fits into the initial
>>request. Occasionally, the mdat is first and we loop twice.
>>
>>Usually, the moov atom is smaller than the initial request, so we
>>don't need to do any completion
>>
>>Usually, the tracks are interleaved and one get gets us both the
>>(initial) video and audio, and we're playing.
>
>
>David,
>this pseudocode gets us to moov atom, but how do you then proceed to
>get the right location for playback? In Cannes, you mentioned that
>there's some sort of an index in there, could you elaborate on this
>a bit? Or alternatively, point to a location where this is explained?

Sure. In each track, we now do the following (note that sample means a 'frame' or access unit of audio or video):

in each track, there may be an edit list mapping movie time (from 0) to track time; find the segment that corresponds to the time you want, and compute its track time. (the default is 1:1)

the time-to-sample table contains a compacted (run-length encoded) sample-duration for each sample, in track time. Sum them until you arrive at the sample that intersects the time you want.
samples are stored in chunks; the chunk offset table gives, for each chunk, the starting sample number, the number of samples, and the absolute file byte-offset of the first byte of its first sample. find the chunk the sample we want is in, and its byte offset.

(typically, for data loading, we stop here and just add that chunk start point to the data we need, find the chunk starts for the other tracks, and if they are close together, load a whole bunch-o-bytes from the earliest offset, such that we get all of them)

there is also a table which gives (compacted again) the size of each sample; for the samples preceding the one we want, but in the same chunk, add their sizes to the chunk offset. this gives the absolute byte offset of the access unit we want. we also have its size; we can now read exactly those bytes.

(we could have summed all the sample sizes in each chunk we are going to load, to get the chunk sizes, to further guide the read we are going to do, of course)

All these structures are in the moov atom. For a well-interleaved file with a starting moov atom, an initial file read gets the moov atom, and this procedure gets us a section of the file containing the AUs we need. The moov atom gives us their timing, and we're off.

If the media is video and has sync-points, at the beginning we have to back up to the preceding sync point (yes, there is a table of the sample numbers which are sync points), and then pre-roll decoding from there.

Is this clearer now?

This was designed for 68000 processors and 1x CD-ROM drives, where both computation and bandwidth were limited (reading the CD, and particularly seeking, being slow), but that converts quite well to a world of cell-phones and wireless carriers and long round-trip times... :-)

--
David Singer
Multimedia Standards, Apple Inc.
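
The per-track lookup described above can be sketched roughly in Python. This is only an illustration under several assumptions: the tables are taken to be already parsed out of the moov atom into plain lists, the names (`stts`, `chunks`, `sizes`, `sync_samples`, and the three functions) are hypothetical, sample numbers are 0-based, the per-sample size table is already expanded, and the edit list is the default 1:1 mapping so movie time equals track time.

```python
from bisect import bisect_right


def sample_at_time(stts, t):
    """Walk the run-length time-to-sample table.

    stts: list of (sample_count, sample_duration) runs, in track time.
    Returns the 0-based index of the sample whose interval contains t.
    """
    index, start = 0, 0
    for count, duration in stts:
        run_length = count * duration
        if t < start + run_length:
            return index + int((t - start) // duration)
        index += count
        start += run_length
    raise ValueError("time is past the end of the track")


def sample_byte_range(chunks, sizes, sample):
    """Locate a sample in the file.

    chunks: list of (first_sample, sample_count, byte_offset) per chunk
            (a simplified model of the chunk table, as described above).
    sizes:  list with the size in bytes of every sample.
    Returns (absolute_byte_offset, size) of the requested sample.
    """
    for first, count, offset in chunks:
        if first <= sample < first + count:
            # Add the sizes of the preceding samples in the same chunk.
            return offset + sum(sizes[first:sample]), sizes[sample]
    raise ValueError("sample not found in any chunk")


def preceding_sync(sync_samples, sample):
    """Return the last sync sample at or before `sample`.

    sync_samples: sorted list of sample indices that are sync points.
    """
    pos = bisect_right(sync_samples, sample)
    if pos == 0:
        raise ValueError("no sync point at or before this sample")
    return sync_samples[pos - 1]


# Made-up example data: 10 samples of 100 ticks, then 5 of 200 ticks.
stts = [(10, 100), (5, 200)]
chunks = [(0, 5, 1000), (5, 5, 5000), (10, 5, 9000)]
sizes = list(range(100, 115))
sync_samples = [0, 8]

wanted = sample_at_time(stts, 1250)                    # 11
offset, size = sample_byte_range(chunks, sizes, wanted)  # (9110, 111)
decode_from = preceding_sync(sync_samples, wanted)     # 8
```

In this made-up example the decoder would start at sample 8 and pre-roll up to sample 11; a real reader would also apply the edit list and coalesce nearby chunk reads across tracks before issuing byte-range requests.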
Received on Thursday, 2 April 2009 23:54:08 UTC