- From: David Singer <singer@apple.com>
- Date: Wed, 21 Oct 2015 14:17:30 +0200
- To: Philip Jägenstedt <philipj@opera.com>
- Cc: Cyril Concolato <cyril.concolato@telecom-paristech.fr>, "public-texttracks@w3.org" <public-texttracks@w3.org>
Hi

> On Oct 21, 2015, at 13:35, Philip Jägenstedt <philipj@opera.com> wrote:
>
> On Wed, Oct 21, 2015 at 12:50 PM, David Singer <singer@apple.com> wrote:
>>
>> People dynamically generate the files (both VTT and MP4) on the fly, so
>> the ‘just’ in this sentence then becomes hard.
>
> If one is generating both a standalone WebVTT file and an MP4 file at
> the same time, then the input could presumably be any format at all.
> If it is another standalone WebVTT file, is it actually hard to
> collect the style blocks and put them together in the MP4 header? It
> just seems to be a matter of parsing input up front, which is
> generally speaking easier than creating a streaming parser and
> handling the output as it comes.

I wasn’t very clear. If one is, for example, live captioning, and then making chunks of a VTT file, or of an MP4 file encapsulating that VTT data, available as they are ready, then it’s obviously not possible to ‘go back’ and adjust the file header if new styles come along.

People tune in to such live streams, so we try very hard to make it true that one can get the needed information from (a) the stream setup information and (b) the stream itself, possibly starting at a random access point (e.g. a video I-frame). Style blocks interleaved in the stream force one to roll through a possibly long presentation, including the requirement to load it all, just to get the styling right.

Obviously one could mark places that (re-)establish all the styles as sync points, but even that is hard: do I keep re-asserting all the styles I have seen, in case they are used again?

Yes, the static transcoding case is easier. It is, alas, not the only one.

David Singer
Manager, Software Standards, Apple Inc.
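
For the static transcoding case Philip describes, a minimal Python sketch of collecting the style blocks up front before writing the MP4 header; the function name and the simplified block handling are illustrative assumptions, not from the WebVTT spec:

    # Sketch: gather all STYLE blocks from a standalone WebVTT file up
    # front, so they could be placed together in the MP4 header before
    # any cue samples are written. Simplified parsing for illustration.
    def collect_style_blocks(vtt_text: str) -> list[str]:
        """Return the bodies of all STYLE blocks in a WebVTT document."""
        styles = []
        # WebVTT blocks are separated by blank lines.
        for block in vtt_text.replace("\r\n", "\n").split("\n\n"):
            lines = block.strip("\n").split("\n")
            if lines and lines[0].strip() == "STYLE":
                styles.append("\n".join(lines[1:]))
        return styles

    if __name__ == "__main__":
        sample = """WEBVTT

    STYLE
    ::cue { color: yellow }

    00:00.000 --> 00:02.000
    Hello
    """
        # The style block is known before any cue is emitted, which is
        # what makes the static case the easy one.
        print(collect_style_blocks(sample))
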
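And a minimal sketch of the sync-point idea raised above, where each random-access chunk re-asserts every style seen so far so a client tuning in mid-stream gets the styling right; the class and method names are hypothetical, and the unbounded growth of the style list is exactly the cost the mail points out:

    # Sketch: a chunk emitter for live captioning that re-establishes
    # all accumulated STYLE blocks at each sync point. A late joiner can
    # start at any chunk, but every chunk grows with the style history.
    class ChunkEmitter:
        def __init__(self) -> None:
            self.styles_so_far: list[str] = []

        def emit_chunk(self, new_styles: list[str], cues: list[str]) -> str:
            self.styles_so_far.extend(new_styles)
            parts = ["WEBVTT"]
            # Re-assert every style seen so far, in case it is used again.
            for style in self.styles_so_far:
                parts.append("STYLE\n" + style)
            parts.extend(cues)
            return "\n\n".join(parts) + "\n"
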
Received on Wednesday, 21 October 2015 12:18:01 UTC