Re: Inband styling (was Re: Evidence of 'Wide Review' needed for VTT) from Cyril Concolato on 2015-10-23 (public-texttracks@w3.org from October 2015)

From: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
Date: Fri, 23 Oct 2015 10:50:37 +0200
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: public-texttracks@w3.org
Message-ID: <5629F4DD.8080405@telecom-paristech.fr>
On 22/10/2015 23:33, Silvia Pfeiffer wrote:
>
> Just so we are clear: this already exists and works well for WebVTT.
>
I don't know what you mean by 'exists'. There is an ongoing pull request,
but it is not (to my knowledge) implemented in any browser (is it?) or
authoring tool, which means it can be removed if it is never implemented.
>
> It has limitations though such as having to repeat styles across the 
> segments. Which is why we are discussing alternative approaches. HTH.
>
And an alternative approach might, in the future, replace or complement
the current approach if the group feels that the limitations are too
severe, no?

Cyril
>
> Cheers,
> Silvia.
>
>
> On 23 Oct 2015 12:52 am, "Nigel Megitt" <nigel.megitt@bbc.co.uk> wrote:
>
>     It would also be possible to take the same approach with VTT as we
>     have taken with TTML: you have a sequence of independent documents,
>     each of which contains the styling etc. needed to display itself for
>     whatever time period applies. Then you have something deliverable
>     that will work, and you can separate out the problem of creating a
>     single long document that contains "all the previous documents'
>     content" into a different processing task. If you go down the route
>     of timed styles, then you're almost at that point anyway.
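
A minimal Python sketch of the "sequence of independent documents" idea, assuming the `STYLE` block syntax that was later adopted in the WebVTT spec (a `STYLE` line followed by CSS, placed before any cue). Every document repeats the full style sheet, so each one can be rendered on its own. The names `Cue`, `make_document` and `split_into_documents`, and the choice to clip cues at window boundaries, are illustrative assumptions, not part of any spec.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def fmt_time(t: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    ms = round(t * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def make_document(css: str, cues: list[Cue]) -> str:
    """One self-contained WebVTT document: header, STYLE block, then cues."""
    # The CSS must not contain blank lines: a blank line ends the block.
    lines = ["WEBVTT", "", "STYLE", css.strip(), ""]
    for cue in cues:
        lines += [f"{fmt_time(cue.start)} --> {fmt_time(cue.end)}", cue.text, ""]
    return "\n".join(lines)


def split_into_documents(css: str, cues: list[Cue], duration: float) -> list[str]:
    """Cut the timeline into fixed windows; each window becomes its own
    document repeating the full style sheet. Cues crossing a boundary
    are clipped to the window."""
    if not cues:
        return []
    last_end = max(c.end for c in cues)
    docs, i = [], 0
    while i * duration < last_end:
        w_start, w_end = i * duration, (i + 1) * duration
        window = [Cue(max(c.start, w_start), min(c.end, w_end), c.text)
                  for c in cues if c.start < w_end and c.end > w_start]
        docs.append(make_document(css, window))
        i += 1
    return docs
```

Building one long document out of these pieces then becomes the separate processing task Nigel mentions.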
>
>     Nigel
>
>
>     On 22/10/2015 14:40, "Cyril Concolato"
>     <cyril.concolato@telecom-paristech.fr> wrote:
>
>     >On 22/10/2015 13:36, Philip Jägenstedt wrote:
>     >> On Thu, Oct 22, 2015 at 10:47 AM, Cyril Concolato
>     >> <cyril.concolato@telecom-paristech.fr> wrote:
>     >>> On 21/10/2015 15:39, Philip Jägenstedt wrote:
>     >>>> In the DASH/MP4/VTT software stack, is WebVTT the input or the
>     >>>> output, and is it a file or a stream? AFAICT, the only issue
>     >>>> would be with a WebVTT input stream (using the syntax in the
>     >>>> spec, not any other framing) with STYLE blocks at the end, but
>     >>>> since streaming standalone WebVTT doesn't exist yet, I'm
>     >>>> uncertain if that's really what you mean.
>     >>> These are good questions. It is currently possible to have a
>     >>> never-ending WebVTT file produced live and delivered over HTTP
>     >>> (e.g. using chunked transfer encoding). Such a WebVTT 'stream'
>     >>> cannot easily be consumed by a browser today because the Streams
>     >>> API is not there yet, but it will be available in the future.
>     >>> Other (non-browser) WebVTT implementations can already use it
>     >>> today. This might require careful creation of cues to ensure
>     >>> that each point is a random access point, but that's possible
>     >>> today. Several services can be built on that: think of a web
>     >>> radio with subtitling. Regarding MP4 packaging, an implementation
>     >>> could consume such a stream and produce MP4 segments on the fly,
>     >>> if needed.
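
For a non-browser consumer of such a never-ending WebVTT 'stream', one essential step is to hand out a block only once the blank line that terminates it has arrived, because HTTP chunks can end anywhere (mid-line, mid-cue). A minimal Python sketch, assuming the bytes have already been decoded to `str`; `BlockSplitter` is a hypothetical helper, not an existing library API.

```python
class BlockSplitter:
    """Incrementally split a WebVTT text stream into blank-line-separated blocks.

    A block is only returned once the blank line that terminates it has
    been received, so a cue is never handed out half-received."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> list[str]:
        text = self._pending + chunk
        # A trailing CR may be the first half of a CRLF pair: keep it for later.
        held = ""
        if text.endswith("\r"):
            text, held = text[:-1], "\r"
        # WebVTT allows CRLF, LF or CR line terminators; normalise to LF.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        *complete, rest = text.split("\n\n")
        self._pending = rest + held
        # Runs of extra blank lines produce empty pieces; drop them.
        return [b.strip("\n") for b in complete if b.strip("\n")]
```

In practice `feed` would be called with each decoded chunk of the HTTP response body; the last, unterminated block simply stays pending until more data arrives.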
>     >>>
>     >>> For those implementations, if a new untimed style header were to
>     >>> arrive in the input WebVTT stream, and if such a style were
>     >>> defined to apply to the whole 'file', i.e. including cues earlier
>     >>> in the 'file', then playing the live stream and playing a
>     >>> recording of that stream as a file would not give the same
>     >>> result. That would be problematic. That's why I think that styles
>     >>> should be either in the header (with semantics that they are
>     >>> valid for the whole file and without the ability to appear
>     >>> between cues), or in a timed block with a well-defined time
>     >>> validity (like cues), or in the settings of a cue. With the last
>     >>> two options, it really looks like WebVTT would become a multiplex
>     >>> of two types of timed data (cues and styles). I'm not sure we
>     >>> should go in this direction, and I wonder whether a separate
>     >>> style file/stream wouldn't be better.
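
The live-versus-recorded divergence can be made concrete with a small Python sketch. The `Style`/`Cue` block model and both function names are hypothetical: a live player can only apply the styles received so far, whereas a player reading the recorded file could apply a later style block to earlier cues if untimed styles had file-wide scope.

```python
from typing import NamedTuple, Union


class Style(NamedTuple):
    css: str


class Cue(NamedTuple):
    text: str


Block = Union[Style, Cue]


def styles_seen_live(blocks: list[Block]) -> list[tuple[str, tuple[str, ...]]]:
    """What a live player can do: each cue gets only the styles received before it."""
    seen: list[str] = []
    result = []
    for block in blocks:
        if isinstance(block, Style):
            seen.append(block.css)
        else:
            result.append((block.text, tuple(seen)))
    return result


def styles_file_wide(blocks: list[Block]) -> list[tuple[str, tuple[str, ...]]]:
    """File-wide semantics: every cue gets every style, including later ones."""
    all_css = tuple(b.css for b in blocks if isinstance(b, Style))
    return [(b.text, all_css) for b in blocks if isinstance(b, Cue)]
```

With a style block arriving between two cues, the first cue is styled differently in the two functions; the second cue is styled the same way by both.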
>     >> Do you have a pointer to such a never-ending WebVTT file deployed
>     >> on the public web?
>     >No, I don't, but that does not mean that it does not exist, nor that
>     >we should break such a scenario.
>     >> I honestly didn't think they would exist yet.
>     >>
>     >> To be pedantic, the reason that never-ending WebVTT files don't
>     >> work in browsers isn't the Streams API, but the fact that the
>     >> media element's readyState cannot reach HAVE_FUTURE_DATA until
>     >> the text tracks are ready:
>     >>
>     >> https://html.spec.whatwg.org/multipage/embedded-content.html#the-text-tracks-are-ready
>     >>
>     >> This is what the spec bug is about (some mechanism to unblock
>     >> readyState before text track parsing has finished):
>     >> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18029
>     >Sorry, I wasn't clear. I know about that bug. I was already
>     >assuming that a web app would fetch the WebVTT (using XHR or fetch,
>     >retrieving the text content as a stream), parse it and produce the
>     >cues in JS, not using the native browser support at all, because of
>     >that exact bug.
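
The "parse it and produce the cues" step would run as script in the web app; the same logic is sketched here in Python for brevity. It handles only an optional identifier line, the timing line and whitespace-separated `name:value` settings, so it is a simplified subset of the WebVTT parsing algorithm rather than a conforming parser; `parse_cue_block` and `ParsedCue` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# hh:mm:ss.ttt with optional hours (mm:ss.ttt), as in WebVTT timestamps.
_TS = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
_TIMING = re.compile(rf"^{_TS}[ \t]+-->[ \t]+{_TS}(.*)$")


def _seconds(h: Optional[str], m: str, s: str, ms: str) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


@dataclass
class ParsedCue:
    identifier: str
    start: float
    end: float
    settings: dict
    text: str


def parse_cue_block(block: str) -> Optional[ParsedCue]:
    """Turn one blank-line-separated block into a cue, or None if it is not a cue."""
    lines = block.split("\n")
    identifier = ""
    if "-->" not in lines[0]:
        identifier, lines = lines[0], lines[1:]  # optional cue identifier line
    if not lines:
        return None  # e.g. the "WEBVTT" header block
    m = _TIMING.match(lines[0])
    if m is None:
        return None  # e.g. a NOTE or STYLE block
    g = m.groups()
    # Settings are whitespace-separated name:value pairs after the end time.
    settings = dict(tok.split(":", 1) for tok in g[8].split() if ":" in tok)
    return ParsedCue(identifier, _seconds(*g[:4]), _seconds(*g[4:8]),
                     settings, "\n".join(lines[1:]))
```

In a browser, the resulting start time, end time and text would be passed to `new VTTCue(...)` and added to a track with `TextTrack.addCue()`.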
>     >>
>     >> Anyway, letting the parser discard style blocks after any cues
>     >> until we've figured out the live streaming issues is OK with me.
>     >> However, let's spell out the implications of keeping this
>     >> restriction for live streams:
>     >I agree that it's the right approach. We should be aware of the
>     >limitations of such an approach.
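
The restriction agreed on here can be expressed as a small filter over blocks that have already been split at blank lines. A Python sketch with hypothetical function names; the block classification is a simplification of the parser rules (the WebVTT specification eventually adopted a rule of this kind, with STYLE blocks only allowed before the first cue). For a live stream it means any style that is not known up front simply has no effect.

```python
def is_cue(block: str) -> bool:
    # A cue's timing line is its first line, or its second if it has an identifier.
    return any("-->" in line for line in block.split("\n")[:2])


def is_style(block: str) -> bool:
    first_line = block.split("\n", 1)[0]
    return first_line.rstrip(" \t") == "STYLE" and not is_cue(block)


def drop_late_styles(blocks: list[str]) -> list[str]:
    """Keep STYLE blocks only while no cue has been seen; discard later ones."""
    kept: list[str] = []
    seen_cue = False
    for block in blocks:
        if is_style(block) and seen_cue:
            continue  # style block after a cue: ignored by this parser
        if is_cue(block):
            seen_cue = True
        kept.append(block)
    return kept
```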
>     >> If you don't know all of the style up front,
>     >I agree that "if you don't know all of the style up front" you have
>     >a problem to solve. Nigel already pointed that out as being relevant
>     >in broadcast, where you don't necessarily know all your styles in
>     >advance. To me, there are two main approaches: using timed styles or
>     >refreshing untimed styles.
>     >
>     >By timed styles, I imagine something like:
>     >
>     >00:01:00.000 --> 00:02:00.000 type:style
>     >.myRedClass {
>     >    color: red;
>     >}
>     >.myGreenClass {
>     >    color: green;
>     >}
>     >
>     >00:01:00.000 --> 00:01:30.000
>     ><v.myGreenClass>Some green text
>     >
>     >00:01:20.000 --> 00:02:00.000
>     ><v.myRedClass>Some red text
>     >
>     >A cue with a 'type' setting whose value is 'style' carries style
>     >content, not text content. This has the advantage of giving precise
>     >timing for the styles, and we can force styles to appear in start
>     >time order (like cues) and before any cue with the same start time.
>     >There are probably problems with the syntax (blank lines in CSS; I
>     >did not follow that part of the discussion). Also, if you want
>     >seekable streams, you would probably have to split cues to remove
>     >overlap (nothing different from normal cues).
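
One way to read the proposed `type:style` blocks: a style block is active over its time interval exactly like a cue, and a renderer applies whichever style blocks are active at the current time. The Python sketch below (hypothetical names; `type:style` is only the proposal above, not WebVTT syntax) shows the ordering rule (a style block before any cue with the same start time) and an active-at-time lookup using the half-open [start, end) interval that HTML uses for cue activity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedBlock:
    start: float
    end: float
    kind: str  # "style" for the proposed 'type:style' blocks, "cue" otherwise
    body: str  # CSS for style blocks, cue text for cues


def in_stream_order(blocks: list[TimedBlock]) -> list[TimedBlock]:
    """Start-time order; at equal start times a style block precedes cues."""
    return sorted(blocks, key=lambda b: (b.start, 0 if b.kind == "style" else 1))


def active(blocks: list[TimedBlock], t: float, kind: str) -> list[TimedBlock]:
    """Blocks of one kind whose half-open interval [start, end) contains t."""
    return [b for b in blocks if b.kind == kind and b.start <= t < b.end]
```

Using the example above (times in seconds), at t = 85 both cues are active and the single style block covering 60 to 120 supplies both classes.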
>     >
>     >Alternatively, I could also imagine something simpler like:
>     >00:01:00.000 --> 00:01:30.000 style:color:green;
>     >Some green text
>     >
>     >00:01:20.000 --> 00:02:00.000 style:color:red;
>     >Some red text
>     >
>     >Maybe this could be modified to import styles instead of inlining
>     >them; I didn't think about that. Also, as I pointed out in my
>     >previous email, such a VTT file starts to become a multiplex of
>     >styles and content. It may be more appropriate to define a style
>     >stream (maybe using the WebVTT syntax) and to link the style stream
>     >with the content stream, either from the WebVTT content file or
>     >from an additional <track> element.
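
The simpler inline form (`style:color:green;`) would fit the existing cue-settings model, where settings are whitespace-separated `name:value` tokens. A Python sketch of how a parser might read it, assuming the setting name is everything before the first colon and the value is a semicolon-separated list of CSS declarations; `inline_style` is a hypothetical helper for a syntax that was only proposed.

```python
def inline_style(settings_text: str) -> dict[str, str]:
    """Parse a hypothetical 'style:' cue setting into CSS property/value pairs."""
    decls: dict[str, str] = {}
    for token in settings_text.split():  # cue settings are whitespace-separated
        name, sep, value = token.partition(":")
        if name != "style" or not sep:
            continue  # some other setting (align:, line:, ...) or a stray word
        for decl in value.split(";"):
            prop, sep2, val = decl.partition(":")
            if sep2 and prop.strip():
                decls[prop.strip()] = val.strip()
    return decls
```

One practical limitation of the inline form shows up immediately: a declaration containing a space cannot survive whitespace splitting, so `style:font-family:Times New Roman;` yields only `{"font-family": "Times"}` here.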
>     >> your only recourse is to add a new text track at the point where
>     >> new style is needed.
>     >Without defining timed styles (as above), adding a new text track is
>     >an option, but not the only one: you can use one text track and fill
>     >it with cues coming from different WebVTT files. In the HLS
>     >approach, every WebVTT segment would (re-)define its styles. That
>     >does not mean you have to maintain multiple tracks.
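
The point that several self-styled segments can feed a single track can be sketched in Python. Assumptions, all hypothetical: each segment's STYLE block has already been reduced to a class-to-color map, each cue keeps the styles of the segment it was delivered in (so a later redefinition does not restyle earlier cues), and a cue repeated in two adjacent segments is kept only once, with its first segment's style. In a browser the equivalent would be one `TextTrack` receiving `addCue()` calls.

```python
from dataclasses import dataclass
from typing import Optional

SegmentCue = tuple[float, float, str, str]  # (start, end, text, class name)


@dataclass
class Segment:
    styles: dict[str, str]  # class name -> color, from this segment's STYLE block
    cues: list[SegmentCue]


def merge_into_one_track(segments: list[Segment]) -> list[tuple[float, float, str, Optional[str]]]:
    """Collect cues from many segments into one track, resolving each cue's
    class against the styles of the segment it arrived in."""
    track = []
    seen = set()
    for seg in segments:
        for start, end, text, cls in seg.cues:
            if (start, end, text) in seen:
                continue  # same cue repeated in an adjacent segment
            seen.add((start, end, text))
            track.append((start, end, text, seg.styles.get(cls)))
    track.sort(key=lambda c: (c[0], c[1]))
    return track
```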
>     >> This will involve scripts, at which point handling multiple
>     >> WebVTT tracks will compare unfavorably with just using a WebSocket
>     >> connection to deliver cues and style using a custom syntax.
>     >Maybe in some cases the WebSocket approach can be useful, but there
>     >are other issues as well, like caching.
>     >
>     >--
>     >Cyril Concolato
>     >Multimedia Group / Telecom ParisTech
>     >http://concolato.wp.mines-telecom.fr/
>     >@cconcolato
>     >
>     >
>


-- 
Cyril Concolato
Multimedia Group / Telecom ParisTech
http://concolato.wp.mines-telecom.fr/
@cconcolato
Received on Friday, 23 October 2015 08:51:11 UTC