Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT) from Nigel Megitt on 2015-10-09 (public-texttracks@w3.org from October 2015)

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Fri, 9 Oct 2015 14:09:35 +0000
To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>, "public-texttracks@w3.org" <public-texttracks@w3.org>
Message-ID: <D23D884E.295C8%nigel.megitt@bbc.co.uk>

Hi Cyril,

On 09/10/2015 14:55, "Cyril Concolato"
<cyril.concolato@telecom-paristech.fr> wrote:

>Hi Philip, all,
>
>Le 26/02/2015 03:18, Philip Jägenstedt a écrit :
>> On Thu, Feb 26, 2015 at 12:13 AM, David Singer <singer@apple.com> wrote:
>>>
>>>> On Feb 24, 2015, at 18:57 , Philip Jägenstedt <philipj@opera.com>
>>>>wrote:
>>>>
>>>> I think I agree with Silvia here, a STYLE block seems more natural
>>>> than putting it in the header. Note that we could still, if there are
>>>> strong reasons, drop any such blocks that come after any cue. It gives
>>>> us some flexibility with the streaming case, even if we don't use it
>>>> now.
>>> I don’t mind if it’s a block or part of the header, as long as it has
>>>to occur before the first cue. The point is that at the moment one can
>>>random access into a VTT file (not load it all from the beginning),
>>>once one has the ‘header’.  I don’t want to lose that.  In text, one
>>>might lose cues that have an end time that overlaps where you random
>>>access to, but in MP4 packing we even deal with that.
>> What does this mean? The parser consumes all data from beginning to
>> end as a stream. Perhaps it could be proven that if you seek to a
>> random point and put the tokenizer+parser in a particular state then
>> the cues that it will output will be a subset of the cues output for a
>> sequential parse, but this isn't a property of WebVTT files I've ever
>> even considered.
>>
>> I think it would be fine to require style blocks to precede any cues,
>> but I think I'm maybe missing the actual rationale...
>When storing a WebVTT file in an MP4 track, the WebVTT file is parsed,
>the header is stored in a place that is not timed and the cues are
>stored in timed places. This storage simplifies file editing (as timed
>cues may be removed, including the first one, or added even before the
>first one, and without caring about the header). This also helps
>playback from non-0 time because the MP4 demux will conceptually create
>a WebVTT file by concatenating the header followed by the cues starting
>from the requested time. When doing DASH streaming, the header is
>also provided upfront, in the initialization segment, which means that
>all WebVTT-in-MP4 media segments are random accessible, which is simple
>and easy to handle. Inserting non-timed styles between cues (i.e. styles
>that would also apply to cues located earlier in the file) would require
>changes in this storage and modifications to associated implementations.
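
[Illustration] The reconstruction described in the quoted text above (an untimed header followed by the cues from the requested time) might be sketched roughly as follows. This is only a minimal sketch, not how any particular MP4 demuxer works: the `Cue` class and the `fmt` and `rebuild_vtt` names are hypothetical, and keeping cues that are still active at the seek time is an assumption about what "starting from the requested time" means.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical in-memory form of one timed cue sample."""
    start: float  # seconds
    end: float    # seconds
    text: str


def fmt(t: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    ms = round(t * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def rebuild_vtt(header: str, cues: list[Cue], seek: float) -> str:
    """Concatenate the untimed header with every cue still active at or
    after `seek` (assumption: a cue qualifies if its end time is later
    than the seek time)."""
    blocks = [header.rstrip("\n")]
    for cue in cues:
        if cue.end > seek:
            blocks.append(f"{fmt(cue.start)} --> {fmt(cue.end)}\n{cue.text}")
    # WebVTT blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

Because the header is stored once and is not timed, any style information it carries reaches the parser regardless of the seek point. A style that only appeared between two cues would have no obvious place in this scheme.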

I may be repeating your point here - I'm not sure - but if you have a
scheme that requires styles to be in a header and doesn't facilitate those
styles being augmented on the fly e.g. by adding new styles, then that
scheme doesn't work for live subtitles in the general case. It would work
in the specific case that the style set can somehow be constrained so it
is predefined and never changes during a presentation. From a broadcaster
perspective I wouldn't accept that as a constraint.

Kind regards,

Nigel

>
>Regards,
>Cyril
>
>-- 
>Cyril Concolato
>Multimedia Group / Telecom ParisTech
>http://concolato.wp.mines-telecom.fr/

>@cconcolato
>
>

Received on Friday, 9 October 2015 14:10:09 UTC