- From: Glenn Adams <glenn@skynav.com>
- Date: Fri, 9 Oct 2015 10:35:28 -0600
- To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
- Cc: "public-texttracks@w3.org" <public-texttracks@w3.org>
- Message-ID: <CACQ=j+dUqsmwCT_sGE=pARpTGjczQFt=m_AEJ0E9vpvQGYwi5w@mail.gmail.com>
We discuss these matters here in the context of TTML [1].

[1] http://www.w3.org/TR/ttml1/#streaming

On Fri, Oct 9, 2015 at 10:21 AM, Cyril Concolato <cyril.concolato@telecom-paristech.fr> wrote:

> On 09/10/2015 16:09, Nigel Megitt wrote:
>
>> Hi Cyril,
>>
>> On 09/10/2015 14:55, "Cyril Concolato"
>> <cyril.concolato@telecom-paristech.fr> wrote:
>>
>>> Hi Philip, all,
>>>
>>> On 26/02/2015 03:18, Philip Jägenstedt wrote:
>>>
>>>> On Thu, Feb 26, 2015 at 12:13 AM, David Singer <singer@apple.com>
>>>> wrote:
>>>>
>>>>> On Feb 24, 2015, at 18:57, Philip Jägenstedt <philipj@opera.com>
>>>>> wrote:
>>>>>
>>>>>> I think I agree with Silvia here: a STYLE block seems more natural
>>>>>> than putting it in the header. Note that we could still, if there are
>>>>>> strong reasons, drop any such blocks that come after any cue. It gives
>>>>>> us some flexibility with the streaming case, even if we don't use it
>>>>>> now.
>>>>>
>>>>> I don’t mind if it’s a block or part of the header, as long as it has
>>>>> to occur before the first cue. The point is that at the moment one can
>>>>> random access into a VTT file (not load it all from the beginning),
>>>>> once one has the ‘header’. I don’t want to lose that. In text, one
>>>>> might lose cues that have an end time that overlaps where you random
>>>>> access to, but in MP4 packing we even deal with that.
>>>>
>>>> What does this mean? The parser consumes all data from beginning to
>>>> end as a stream. Perhaps it could be proven that if you seek to a
>>>> random point and put the tokenizer+parser in a particular state then
>>>> the cues that it will output will be a subset of the cues output for a
>>>> sequential parse, but this isn't a property of WebVTT files I've ever
>>>> even considered.
>>>>
>>>> I think it would be fine to require style blocks to precede any cues,
>>>> but I think I'm maybe missing the actual rationale...
>>>
>>> When storing a WebVTT file in an MP4 track, the WebVTT file is parsed,
>>> the header is stored in a place that is not timed and the cues are
>>> stored in timed places. This storage simplifies file editing (as timed
>>> cues may be removed, including the first one, or added even before the
>>> first one, without caring about the header). This also helps
>>> playback from a non-zero time because the MP4 demux will conceptually
>>> create a WebVTT file by concatenating the header followed by the cues
>>> starting from the requested time. When doing DASH streaming, the header
>>> is also provided upfront, in the initialization segment, which means that
>>> all WebVTT-in-MP4 media segments are random accessible, which is simple
>>> and easy to handle. Inserting non-timed styles between cues (i.e. styles
>>> that are valid even for cues located earlier in the file) would require
>>> changes in this storage and modification to associated implementations.
>>
>> I may be repeating your point here - I'm not sure
>
> Actually, you're not repeating, but making an interesting point. Let me
> clarify (see below).
>
>> - but if you have a
>> scheme that requires styles to be in a header and doesn't facilitate those
>> styles being augmented on the fly e.g. by adding new styles, then that
>> scheme doesn't work for live subtitles in the general case. It would work
>> in the specific case that the style set can somehow be constrained so it
>> is predefined and never changes during a presentation. From a broadcaster
>> perspective I wouldn't accept that as a constraint.
>
> You're right. Having a scheme that allows updating styles is interesting
> in live streaming/broadcasting, especially when you don't know in advance
> the styles you will use later on.
>
> It can be done with what I would call timed styles, i.e. styles that have
> a time range validity, like cues today.
> Actually, I had proposed some time
> ago to put cue styles in the cue settings directly, but it can also be done
> by defining a new type of cue: style only, with a time range overlapping
> the cues it applies to.
>
> It can also be done with untimed styles valid for the whole file when
> using segment files, like the current HLS approach. In that approach, I
> imagine that each style carried in a separate WebVTT file would replace
> the existing style. Each WebVTT file would be considered as a random access
> segment. One issue though with that approach is that the concatenation of
> segment files may not produce the desired result if you don't take care of
> selector clashes.
>
> Regarding the MP4 storage, the current spec does not need any modification
> to store untimed styles, but indeed is not ideal for live streaming of
> styles. For the storage of timed styles, an update to the MP4 spec would
> probably be needed.
>
> Hope I'm clearer ...
>
> Cyril
>
> --
> Cyril Concolato
> Multimedia Group / Telecom ParisTech
> http://concolato.wp.mines-telecom.fr/
> @cconcolato
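
As a rough illustration of the storage model described in the quoted text above (an untimed header, timed cues, and a WebVTT file conceptually rebuilt by the demuxer from a requested time), here is a minimal Python sketch. The names Cue, format_timestamp and rebuild_webvtt are hypothetical and do not come from any MP4 or WebVTT library. Keeping cues whose end time falls after the seek point is an assumption about how overlapping cues would be handled, not something the thread specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """A timed cue (hypothetical model; times are in seconds)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp of the form hh:mm:ss.ttt."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def rebuild_webvtt(header: str, cues: List[Cue], from_time: float = 0.0) -> str:
    """Concatenate the untimed header with the cues still relevant at from_time.

    Assumption: a cue is kept if it ends after from_time, so a cue that is
    already active at the seek point is not lost.
    """
    blocks = [header.rstrip("\n")]
    for cue in cues:
        if cue.end > from_time:
            timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
            blocks.append(f"{timing}\n{cue.text}")
    # WebVTT blocks are separated by blank lines.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # The header (including any STYLE block) is stored once, untimed.
    header = "WEBVTT\n\nSTYLE\n::cue { color: yellow }"
    cues = [
        Cue(0.0, 2.0, "first"),
        Cue(5.0, 7.5, "second"),
        Cue(10.0, 12.0, "third"),
    ]
    # Seeking to t = 6 s keeps the header and every cue ending after 6 s.
    print(rebuild_webvtt(header, cues, from_time=6.0))
```

Because the header travels separately from the cues, every rebuilt file sees the same styles regardless of the seek point. A STYLE block placed between cues would not fit this model without extra bookkeeping, which is the concern raised in the thread.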
Received on Friday, 9 October 2015 16:36:17 UTC