Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT)

On 22/10/2015 15:51, Nigel Megitt wrote:
> It would also be possible to take the same approach with VTT as we have
> taken with TTML, which is that you have a sequence of independent
> documents each of which contains the styling etc needed to display itself,
> for whatever time period applies. Then you have something deliverable that
> will work, and you can separate out the problem of creating a single long
> document that contains "all the previous documents' content" into a
> different processing task.
Exactly. That's what I meant by the HLS approach with self-contained 
WebVTT segments, including with styles.
> If you go down the route of timed styles then
> you're almost at that point anyway.
Yes, the two approaches are very close. I haven't decided yet on one or 
the other. The Style header approach seems simple. The timed styles 
approach seems more flexible. In the Style header approach, changing a 
style does indeed require creating a new file. In DASH, this means 
creating a new segment. If the segment is plain VTT (à la HLS), that's OK. 
It's problematic in the MP4 case, because the header is carried in the 
initialization segment and not in the media segment, so a style change 
means creating a new period :(
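
To make the self-contained segment idea a bit more concrete, here is a 
minimal Python sketch. It assumes a STYLE block placed right after the 
WEBVTT header and before any cue, using ::cue selectors; the names Cue, 
fmt and make_segment are hypothetical and only for illustration. It also 
leaves out HLS-specific details such as the X-TIMESTAMP-MAP header.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # cue start time, in seconds
    end: float    # cue end time, in seconds
    text: str     # cue payload, e.g. "<v.myGreenClass>Some green text"


def fmt(t: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    ms = round(t * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def make_segment(css: str, cues: List[Cue],
                 seg_start: float, seg_end: float) -> str:
    """Build one standalone WebVTT document covering [seg_start, seg_end).

    Every segment repeats the full style sheet, so it can be rendered
    without any earlier segment. Cues overlapping the segment are copied
    with their original timings (an assumption; a packager could also
    clip them to the segment boundaries).
    """
    parts = ["WEBVTT", "", "STYLE", css.strip(), ""]
    for cue in cues:
        if cue.start < seg_end and cue.end > seg_start:
            parts += [f"{fmt(cue.start)} --> {fmt(cue.end)}", cue.text, ""]
    return "\n".join(parts)
```

Calling make_segment once per segment duration gives a sequence of 
independent documents. A style change then only affects the segments 
produced after it, which is the "new file" cost mentioned above.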

Cyril
>
> Nigel
>
>
> On 22/10/2015 14:40, "Cyril Concolato"
> <cyril.concolato@telecom-paristech.fr> wrote:
>
>> On 22/10/2015 13:36, Philip Jägenstedt wrote:
>>> On Thu, Oct 22, 2015 at 10:47 AM, Cyril Concolato
>>> <cyril.concolato@telecom-paristech.fr> wrote:
>>>> On 21/10/2015 15:39, Philip Jägenstedt wrote:
>>>>> In the DASH/MP4/VTT software stack, is WebVTT the input or the
>>>>> output, and is it a file or a stream? AFAICT, the only issue would be
>>>>> with a WebVTT input stream (using the syntax in the spec, not any
>>>>> other framing) with STYLE blocks at the end, but since streaming
>>>>> standalone WebVTT doesn't exist yet I'm uncertain if that's really
>>>>> what you mean.
>>>> These are good questions. It is currently possible to have a
>>>> never-ending WebVTT file being produced live and delivered over HTTP
>>>> (e.g. using chunked transfer encoding). Such a WebVTT 'stream' cannot
>>>> easily be consumed by a browser today because the Streams API is not
>>>> there yet, but it will be available in the future. Other (non-browser)
>>>> WebVTT implementations can already use that today. This might require
>>>> careful creation of cues to ensure that each point is a random access
>>>> point, but that's possible today. Several services can be built on
>>>> that: think of a WebRadio with subtitling. Regarding MP4 packaging, an
>>>> implementation could consume such a stream and produce MP4 segments on
>>>> the fly, if needed.
>>>>
>>>> For those implementations, if a new untimed style header arrived in
>>>> the input WebVTT stream, and if such a style were defined to affect the
>>>> whole 'file', i.e. including cues earlier in the 'file', then playing
>>>> the live stream versus recording the stream and then playing the file
>>>> would not give the same result. That would be problematic. That's why I
>>>> think that styles should either be in the header (with semantics that
>>>> they are valid for the whole file, and without the ability to appear
>>>> between cues), or be a timed block with a well-defined time validity
>>>> (like cues), or be settings of a cue. For the last two options, it
>>>> really looks like WebVTT would become a multiplex of two types of timed
>>>> data (cues and styles). I'm not sure we should go in this direction, or
>>>> whether a separate style file/stream wouldn't be better.
>>> Do you have a pointer to such a never-ending WebVTT file deployed on
>>> the public web?
>> No, I don't, but that does not mean that it does not exist, or that we
>> should break such a scenario.
>>> I honestly didn't think they would exist yet.
>>>
>>> To be pedantic, the reason that never-ending WebVTT files don't work
>>> in browsers isn't because of the Streams API, but because the media
>>> element's readyState cannot reach HAVE_FUTURE_DATA until the text
>>> tracks are ready:
>>>
>>> https://html.spec.whatwg.org/multipage/embedded-content.html#the-text-tracks-are-ready
>>>
>>> This is what the spec bug is about, some mechanism to unblock
>>> readyState before text track parsing has finished:
>>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18029
>> Sorry, I wasn't clear. I know about that bug. I was already assuming
>> that a web app would fetch the WebVTT (using XHR or fetch, retrieving
>> the text content as a stream), parse it and produce the cues in JS, not
>> at all using the native browser support, because of that exact bug.
>>> Anyway, letting the parser discard style blocks after any cues until
>>> we've figured out the live streaming issues is OK with me. However,
>>> let's spell out the implications of keeping this restriction for live
>>> streams:
>> I agree that it's the right approach. We should be aware of the
>> limitations of such an approach.
>>> If you don't know all of the style up front,
>> I agree that "if you don't know all of the style up front" you have a
>> problem to solve. Nigel already pointed that out, as being useful in
>> broadcast where you don't necessarily know in advance all your styles.
>> To me, there are 2 main approaches: using timed styles or refreshing
>> untimed styles.
>>
>> By timed styles, I imagine something like:
>>
>> 00:01:00.000 --> 00:02:00.000 type:style
>> .myRedClass {
>>     color: red;
>> }
>> .myGreenClass {
>>     color: green;
>> }
>>
>> 00:01:00.000 --> 00:01:30.000
>> <v.myGreenClass>Some green text
>>
>> 00:01:20.000 --> 00:02:00.000
>> <v.myRedClass>Some red text
>>
>> A cue with a 'type' setting whose value is 'style' carries style
>> content, not text content. This has the advantage of giving precise
>> timing for the styles, and we can force styles to appear in start time
>> order (like cues) and before a cue that has a similar start time. There
>> are probably problems with the syntax (blank lines in CSS; I did not
>> follow that part of the discussion). Also, if you want to have seekable
>> streams you would probably have to split cues to remove overlap (nothing
>> different from normal cues).
>>
>> Alternatively, I could also imagine something simpler like:
>> 00:01:00.000 --> 00:01:30.000 style:color:green;
>> Some green text
>>
>> 00:01:20.000 --> 00:02:00.000 style:color:red;
>> Some red text
>>
>> Maybe this could be modified to import styles instead of inlining them;
>> I didn't think about that. Also, as I pointed out in my previous email,
>> such a VTT file starts to become a multiplex of styles and content. It
>> may be more appropriate to define a Style stream (maybe using the WebVTT
>> syntax) and to link the style stream with the content stream, either
>> from the WebVTT content file or from an additional <track> element.
>>> your only
>>> recourse is to add a new text track at the point where new style is
>>> needed.
>> Without defining timed styles (as above), adding a new text track is an
>> option, but not the only one: you can use one text track and fill it
>> with cues coming from different WebVTT files. In the HLS approach, every
>> WebVTT segment would (re-)define its styles. That does not mean you have
>> to maintain multiple tracks.
>>> This will involve scripts, at which point handling multiple
>>> WebVTT tracks will compare unfavorably with just using a WebSocket
>>> connection to deliver cues and style using a custom syntax.
>> Maybe in some cases the WebSocket approach can be useful, but there are
>> other issues as well, such as caching.
>>
>> -- 
>> Cyril Concolato
>> Multimedia Group / Telecom ParisTech
>> http://concolato.wp.mines-telecom.fr/
>> @cconcolato
>>
>>


-- 
Cyril Concolato
Multimedia Group / Telecom ParisTech
http://concolato.wp.mines-telecom.fr/
@cconcolato

Received on Thursday, 22 October 2015 14:10:36 UTC