Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT) from Philip Jägenstedt on 2015-10-20 (public-texttracks@w3.org from October 2015)

From: Philip Jägenstedt <philipj@opera.com>
Date: Tue, 20 Oct 2015 13:40:46 +0200
To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
Cc: "public-texttracks@w3.org" <public-texttracks@w3.org>
Message-ID: <CAMQvoCmZUib4VstTttSUppH8edL2ubW+tai1i0brPEHQfhiwTQ@mail.gmail.com>

On Fri, Oct 9, 2015 at 3:55 PM, Cyril Concolato <cyril.concolato@telecom-paristech.fr> wrote:

> Hi Philip, all,
>
> On 26/02/2015 03:18, Philip Jägenstedt wrote:
>
>> On Thu, Feb 26, 2015 at 12:13 AM, David Singer <singer@apple.com> wrote:
>>
>>>
>>> On Feb 24, 2015, at 18:57 , Philip Jägenstedt <philipj@opera.com> wrote:
>>>>
>>>> I think I agree with Silvia here, a STYLE block seems more natural
>>>> than putting it in the header. Note that we could still, if there are
>>>> strong reasons, drop any such blocks that come after any cue. It gives
>>>> us some flexibility with the streaming case, even if we don't use it
>>>> now.
>>>>
>>> I don’t mind if it’s a block or part of the header, as long as it has to
>>> occur before the first cue. The point is that at the moment one can random
>>> access into a VTT file (not load it all from the beginning), once one has
>>> the ‘header’.  I don’t want to lose that.  In text, one might lose cues
>>> that have an end time that overlaps where you random access to, but in MP4
>>> packing we even deal with that.
>>>
>> What does this mean? The parser consumes all data from beginning to
>> end as a stream. Perhaps it could be proven that if you seek to a
>> random point and put the tokenizer+parser in a particular state then
>> the cues that it will output will be a subset of the cues output for a
>> sequential parse, but this isn't a property of WebVTT files I've ever
>> even considered.
>>
>> I think it would be fine to require style blocks to precede any cues,
>> but I think I'm maybe missing the actual rationale...
>>
> When storing a WebVTT file in an MP4 track, the WebVTT file is parsed, the
> header is stored in a place that is not timed and the cues are stored in
> timed places. This storage simplifies file editing (as timed cues may be
> removed, including the first one, or added even before the first one, and
> without caring about the header). This also helps playback from non-0 time
> because the MP4 demux will conceptually create a WebVTT file by
> concatenating the header followed by the cues starting from the requested
> time. When doing DASH streaming, the header is provided upfront also, in
> the initialization segment, which means that all WebVTT-in-MP4 media segments
> are random accessible, which is simple and easy to handle. Inserting
> non-timed styles between cues (i.e. styles that also apply to cues located
> earlier in the file) would require changes in this storage and modification
> to associated implementations.
>

Isn't this just a matter of parsing the whole WebVTT file into memory
before trying to mux it into MP4? If you just collect all the style blocks
and put them in the header, is there still a problem?
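
To make the suggestion concrete, here is a rough Python sketch of that
pre-muxing step. It is only an illustration under simplifying assumptions:
blocks are split on empty lines rather than with the real WebVTT parser
algorithm, and the function names (split_blocks, is_style_block,
hoist_style_blocks) are made up for this example, not taken from any muxer.

```python
import re


def split_blocks(text):
    """Split WebVTT text into blocks separated by one or more empty lines.

    Simplification: this is not the parser algorithm from the WebVTT spec.
    """
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    normalized = normalized.lstrip("\ufeff").strip("\n")
    return [block for block in re.split(r"\n{2,}", normalized) if block]


def is_style_block(block):
    """Treat a block as a style block if its first line is STYLE and it
    contains no cue timing arrow (so a cue whose id is "STYLE" is skipped)."""
    first_line = block.split("\n", 1)[0]
    return first_line.strip() == "STYLE" and "-->" not in block


def hoist_style_blocks(text):
    """Return the file with every STYLE block moved right after the header.

    All other blocks (cues, notes, regions) keep their relative order.
    """
    blocks = split_blocks(text)
    if not blocks or not blocks[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file: missing WEBVTT header")
    header, rest = blocks[0], blocks[1:]
    styles = [block for block in rest if is_style_block(block)]
    others = [block for block in rest if not is_style_block(block)]
    return "\n\n".join([header] + styles + others) + "\n"
```

With something like this run first, the muxer would see a file whose
header and styles all precede the first cue, so the untimed part could
go wherever the header already goes.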

Philip

Received on Tuesday, 20 October 2015 11:41:15 UTC