Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT) from Cyril Concolato on 2015-10-22 (public-texttracks@w3.org from October 2015)

From: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
Date: Thu, 22 Oct 2015 15:40:20 +0200
To: public-texttracks@w3.org
Message-ID: <5628E744.8080505@telecom-paristech.fr>
On 22/10/2015 13:36, Philip Jägenstedt wrote:
> On Thu, Oct 22, 2015 at 10:47 AM, Cyril Concolato
> <cyril.concolato@telecom-paristech.fr> wrote:
>> On 21/10/2015 15:39, Philip Jägenstedt wrote:
>>> In the DASH/MP4/VTT software stack, is WebVTT the input or the output, and
>>> is it a file or a stream? AFAICT, the only issue would be with a WebVTT
>>> input stream (using the syntax in the spec, not any other framing) with
>>> STYLE blocks at the end, but since streaming standalone WebVTT doesn't exist
>>> yet I'm uncertain if that's really what you mean.
>> These are good questions. It is currently possible to have a
>> never-ending WebVTT file being produced live, delivered over HTTP (e.g.
>> using chunked transfer encoding). Such a WebVTT 'stream' cannot easily be
>> consumed by a browser today because the Streams API is not there yet, but it
>> will be available in the future. Other (non-browser) WebVTT implementations
>> can already use that today. This might require careful creation of cues to
>> ensure that each point is a random access point, but that's possible today.
>> Several services can be built on that: think of a WebRadio with
>> subtitling. Regarding MP4 packaging, an implementation could consume such
>> a stream and produce MP4 segments on the fly, if needed.
>>
>> For those implementations, if a new untimed style header arrived in the
>> input WebVTT stream, and if such a style were defined to affect the
>> whole 'file', i.e. including earlier cues in the 'file', then playing
>> the live stream versus recording the stream and then playing a file would
>> not have the same result. That would be problematic. That's why I think that
>> styles should either be in the header (with semantics that they are valid
>> for the whole file and without the ability to be in between cues) or as a
>> timed block with a well-defined time validity (like cues), or as settings of
>> a cue. For the last two options, it really looks like WebVTT would become a
>> multiplex of two types of timed data (cues and styles). I'm not sure we
>> should go in this direction, or whether a separate style file/stream
>> wouldn't be better.
> Do you have a pointer to such a never-ending WebVTT file deployed on
> the public web?
No, I don't, but that does not mean that none exists, nor that we
should break such a scenario.
> I honestly didn't think they would exist yet.
>
> To be pedantic, the reason that never-ending WebVTT files don't work
> in browsers isn't because of the Streams API, but because the media
> element's readyState cannot reach HAVE_FUTURE_DATA until the text
> tracks are ready:
> https://html.spec.whatwg.org/multipage/embedded-content.html#the-text-tracks-are-ready
>
> This is what the spec bug is about, some mechanism to unblock
> readyState before text track parsing has finished:
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18029
Sorry, I wasn't clear. I know about that bug. I was already assuming 
that a web app would fetch the WebVTT (using XHR or fetch, retrieving 
the text content as a stream), parse it and produce the cues in JS, not 
at all using the native browser support, because of that exact bug.
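
A minimal sketch of that script-side approach, written in Python rather
than JS purely for illustration. It assumes "\n" line endings and treats
every blank-line-terminated block containing "-->" as a cue; the names
Cue, iter_blocks, parse_cue and iter_cues are hypothetical and not taken
from any spec or library. The point is only that cues can be emitted as
soon as their block is complete, without waiting for the end of a
never-ending stream.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Cue:
    start: str           # start timestamp, kept as text
    end: str             # end timestamp, kept as text
    settings: List[str]  # raw cue settings, e.g. ["align:start"]
    text: str            # cue payload


def iter_blocks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each block as soon as its terminating blank line arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line ("\n\n") closes a block; anything after it stays buffered.
        while "\n\n" in buffer:
            block, buffer = buffer.split("\n\n", 1)
            block = block.strip("\n")
            if block:
                yield block


def parse_cue(block: str) -> Optional[Cue]:
    """Return a Cue if the block has a timing line, otherwise None."""
    lines = block.split("\n")
    # The timing line is the first line, or the second if an identifier precedes it.
    for i, line in enumerate(lines[:2]):
        if "-->" in line:
            start, _, rest = line.partition("-->")
            fields = rest.split()
            if not fields:
                return None
            return Cue(start.strip(), fields[0], fields[1:], "\n".join(lines[i + 1:]))
    return None


def iter_cues(chunks: Iterable[str]) -> Iterator[Cue]:
    """Incrementally turn text chunks into cues, skipping non-cue blocks."""
    for block in iter_blocks(chunks):
        cue = parse_cue(block)
        if cue is not None:
            yield cue
```

Chunk boundaries may fall anywhere, including in the middle of a
timestamp; the buffer only releases a block once its blank line has been
seen.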
>
> Anyway, letting the parser discard style blocks after any cues until
> we've figured out the live streaming issues is OK with me. However,
> let's spell out the implications of keeping this restriction for live
> streams:
I agree that it's the right approach. We should be aware of the
limitations of such an approach.
> If you don't know all of the style up front,
I agree that "if you don't know all of the style up front" you have a
problem to solve. Nigel already pointed that out as a need in
broadcast, where you don't necessarily know all your styles in advance.
To me, there are two main approaches: using timed styles or refreshing
untimed styles.

By timed styles, I imagine something like:

00:01:00.000 --> 00:02:00.000 type:style
.myRedClass {
    color: red;
}
.myGreenClass {
    color: green;
}

00:01:00.000 --> 00:01:30.000
<v.myGreenClass>Some green text

00:01:20.000 --> 00:02:00.000
<v.myRedClass>Some red text

A cue with a 'type' setting whose value is 'style' carries style
content, not text content. This has the advantage of giving precise
timing for the styles, and we can force styles to appear in start time
order (like cues) and before any cue that has the same start time. There
are probably problems with the syntax (blank lines in CSS; I did not
follow that part of the discussion). Also, if you want to have seekable
streams, you would probably have to split cues to remove overlap (nothing
different from normal cues).
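
As a rough sketch of how a script could resolve such timed styles (the
'type:style' setting is only the hypothetical syntax proposed above, and
TimedBlock, to_seconds, parse_block and active_styles are illustrative
names), the style in effect depends only on the playback time, so a live
stream and a recording of it would resolve identically:

```python
from dataclasses import dataclass
from typing import List

EXAMPLE = """\
00:01:00.000 --> 00:02:00.000 type:style
.myRedClass {
    color: red;
}
.myGreenClass {
    color: green;
}

00:01:00.000 --> 00:01:30.000
<v.myGreenClass>Some green text

00:01:20.000 --> 00:02:00.000
<v.myRedClass>Some red text
"""


@dataclass
class TimedBlock:
    start: float    # seconds
    end: float      # seconds
    is_style: bool  # True when the (hypothetical) type:style setting is present
    payload: str    # CSS for style blocks, cue text otherwise


def to_seconds(timestamp: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = timestamp.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_block(block: str) -> TimedBlock:
    """Parse one block whose first line is the timing line (no identifier)."""
    timing, _, payload = block.partition("\n")
    start, _, rest = timing.partition("-->")
    fields = rest.split()
    return TimedBlock(
        start=to_seconds(start.strip()),
        end=to_seconds(fields[0]),
        is_style="type:style" in fields[1:],
        payload=payload,
    )


def active_styles(blocks: List[TimedBlock], t: float) -> List[str]:
    """Return the CSS of every style block whose interval contains time t."""
    return [b.payload for b in blocks if b.is_style and b.start <= t < b.end]


blocks = [parse_block(b) for b in EXAMPLE.strip().split("\n\n")]
```

Splitting on blank lines is exactly where CSS containing blank lines
would break this sketch, which is the syntax problem mentioned above.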

Alternatively, I could also imagine something simpler like:

00:01:00.000 --> 00:01:30.000 style:color:green;
Some green text

00:01:20.000 --> 00:02:00.000 style:color:red;
Some red text
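
A small sketch of reading such a setting, assuming the hypothetical
'style' setting above and the usual whitespace-separated name:value form
of cue settings (parse_settings and inline_declarations are illustrative
names). Because settings are separated by whitespace, the inlined CSS in
this form could not itself contain spaces:

```python
from typing import Dict


def parse_settings(timing_line: str) -> Dict[str, str]:
    """Map each cue setting name to its value, splitting at the first colon."""
    _, _, rest = timing_line.partition("-->")
    fields = rest.split()
    settings: Dict[str, str] = {}
    # fields[0] is the end timestamp; the remaining fields are settings.
    for field in fields[1:]:
        name, sep, value = field.partition(":")
        if sep and name and value:
            settings[name] = value
    return settings


def inline_declarations(settings: Dict[str, str]) -> Dict[str, str]:
    """Split the (hypothetical) style setting into CSS property/value pairs."""
    declarations: Dict[str, str] = {}
    for declaration in settings.get("style", "").split(";"):
        prop, sep, value = declaration.partition(":")
        if sep and prop and value:
            declarations[prop] = value
    return declarations
```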

Maybe this could be modified to import styles instead of inlining them; I
didn't think about that. Also, as I pointed out in my previous email,
such a VTT file starts to become a multiplex of styles and content. It
may be more appropriate to define a Style stream (maybe using the WebVTT
syntax) and to link the style stream with the content stream, either
from the WebVTT content file or from an additional <track> element.
> your only
> recourse is to add a new text track at the point where new style is
> needed.
Without defining timed styles (as above), adding a new text track is an
option, but not the only one: you can use one text track and fill it
with cues coming from different WebVTT files. In the HLS approach, every
WebVTT segment would (re-)define its styles. That does not mean you have
to maintain multiple tracks.
> This will involve scripts, at which point handling multiple
> WebVTT tracks will compare unfavorably with just using a WebSocket
> connection to deliver cues and style using a custom syntax.
Maybe in some cases the WebSocket approach can be useful, but it raises
other issues as well, such as caching.

-- 
Cyril Concolato
Multimedia Group / Telecom ParisTech
http://concolato.wp.mines-telecom.fr/
@cconcolato
Received on Thursday, 22 October 2015 13:40:51 UTC