Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT)

We discuss these matters in the context of TTML in [1].

[1] http://www.w3.org/TR/ttml1/#streaming

On Fri, Oct 9, 2015 at 10:21 AM, Cyril Concolato <
cyril.concolato@telecom-paristech.fr> wrote:

> Le 09/10/2015 16:09, Nigel Megitt a écrit :
>
>> Hi Cyril,
>>
>> On 09/10/2015 14:55, "Cyril Concolato"
>> <cyril.concolato@telecom-paristech.fr> wrote:
>>
>>> Hi Philip, all,
>>>
>>> Le 26/02/2015 03:18, Philip Jägenstedt a écrit :
>>>
>>>> On Thu, Feb 26, 2015 at 12:13 AM, David Singer <singer@apple.com>
>>>> wrote:
>>>>
>>>>> On Feb 24, 2015, at 18:57 , Philip Jägenstedt <philipj@opera.com>
>>>>>> wrote:
>>>>>>
>>>>>> I think I agree with Silvia here, a STYLE block seems more natural
>>>>>> than putting it in the header. Note that we could still, if there are
>>>>>> strong reasons, drop any such blocks that come after any cue. It gives
>>>>>> us some flexibility with the streaming case, even if we don't use it
>>>>>> now.
>>>>>>
>>>>> I don’t mind if it’s a block or part of the header, as long as it has
>>>>> to occur before the first cue. The point is that at the moment one can
>>>>> random access into a VTT file (not load it all from the beginning),
>>>>> once one has the ‘header’.  I don’t want to lose that.  In text, one
>>>>> might lose cues that have an end time that overlaps where you random
>>>>> access to, but in MP4 packing we even deal with that.
>>>>>
>>>> What does this mean? The parser consumes all data from beginning to
>>>> end as a stream. Perhaps it could be proven that if you seek to a
>>>> random point and put the tokenizer+parser in a particular state then
>>>> the cues that it will output will be a subset of the cues output for a
>>>> sequential parse, but this isn't a property of WebVTT files I've ever
>>>> even considered.
>>>>
>>>> I think it would be fine to require style blocks to precede any cues,
>>>> but I think I'm maybe missing the actual rationale...
>>>>
>>> When storing a WebVTT file in an MP4 track, the WebVTT file is parsed,
>>> the header is stored in a place that is not timed and the cues are
>>> stored in timed places. This storage simplifies file editing (as timed
>>> cues may be removed, including the first one, or added even before the
>>> first one, and without caring about the header). This also helps
>>> playback from non-0 time because the MP4 demux will conceptually create
>>> a WebVTT file by concatenating the header followed by the cues starting
>>> from the requested time. When doing DASH streaming, the header is
>>> provided upfront also, in the initialization segment, which means that
>>> all WebVTT-in-MP4 media segments are random accessible, which is simple
>>> and easy to handle. Inserting non-timed styles between cues (i.e. styles
>>> that also apply to cues located earlier in the file) would require changes
>>> in this storage and modifications to the associated implementations.
>>>
>> I may be repeating your point here - I'm not sure
>>
> Actually, you're not repeating, but making an interesting point. Let me
> clarify (see below).
>
>>   - but if you have a
>> scheme that requires styles to be in a header and doesn't facilitate those
>> styles being augmented on the fly e.g. by adding new styles, then that
>> scheme doesn't work for live subtitles in the general case. It would work
>> in the specific case that the style set can somehow be constrained so it
>> is predefined and never changes during a presentation. From a broadcaster
>> perspective I wouldn't accept that as a constraint.
>>
> You're right. Having a scheme that allows updating styles is interesting
> in live streaming/broadcasting, especially when you don't know in advance
> the styles you will use later on.
>
> It can be done with what I would call timed styles, i.e. styles that have
> a time range of validity, like cues today. Actually, I had proposed some
> time ago to put cue styles in the cue settings directly, but it can also be
> done by defining a new type of cue: style only, with a time range
> overlapping the cues it applies to.
>
> It can also be done with untimed styles valid for the whole file when
> using segment files, like the current HLS approach. In that approach, I
> imagine that each style carried in a separate WebVTT file would replace
> the existing styles. Each WebVTT file would be considered a random access
> segment. One issue with that approach, though, is that the concatenation
> of segment files may not produce the desired result if you don't take care
> of selector clashes.
>
> Regarding the MP4 storage, the current spec does not need any modification
> to store untimed styles, but indeed is not ideal for live streaming of
> styles. For the storage of timed styles, an update to the MP4 spec would
> probably be needed.
>
> Hope I'm clearer ...
>
>
> Cyril
>
> --
> Cyril Concolato
> Multimedia Group / Telecom ParisTech
> http://concolato.wp.mines-telecom.fr/
> @cconcolato
>
>
>
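
To make the random-access point above concrete, here is a minimal sketch of
the conceptual reconstruction Cyril describes: the untimed header is kept
apart from the timed cues, and a WebVTT document is rebuilt for playback from
a given time. The names Cue, format_timestamp and rebuild_webvtt are
hypothetical, chosen for illustration only; this is not taken from any MP4 or
WebVTT implementation, and it assumes any styles live in the header.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # cue start time in seconds
    end: float    # cue end time in seconds
    text: str     # cue payload


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def rebuild_webvtt(header: str, cues: list[Cue], seek_time: float) -> str:
    """Concatenate the untimed header with every cue that ends after seek_time.

    Cues that started before seek_time but are still active are kept, which
    mirrors the overlap case mentioned earlier in the thread.
    """
    blocks = [header.rstrip("\n")]
    for cue in sorted(cues, key=lambda c: c.start):
        if cue.end > seek_time:
            timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
            blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"
```

Because the header is always emitted first, any style it carries is available
whatever the seek time. Styles inserted between cues would not survive this
kind of reconstruction without changes to the storage, which is the concern
raised above.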

Received on Friday, 9 October 2015 16:36:17 UTC