Re: [whatwg] W3C Timed Text Working Group proposal for HTMLCue from Philip Jägenstedt on 2015-09-11 (public-whatwg-archive@w3.org from September 2015)

From: Philip Jägenstedt <philipj@opera.com>
Date: Fri, 11 Sep 2015 10:13:18 +0200
To: Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: "silviapfeiffer1@gmail.com" <silviapfeiffer1@gmail.com>, "whatwg@whatwg.org" <whatwg@whatwg.org>, "eric.carlson@apple.com" <eric.carlson@apple.com>, "jer.noble@apple.com" <jer.noble@apple.com>
Message-ID: <CAMQvoCnF1gktaRGxQB1hD8HjGLr2=jdyDdqUpgcpz4se3LGUEQ@mail.gmail.com>

Hi Nigel,

I'm trying to spend less time on WebVTT and related questions these
days, but here's my high-level feedback, inline:

On Thu, Sep 10, 2015 at 11:46 PM, Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
> Dear WhatWG, Jer, Eric, Silvia, Philip,
>
> W3C Timed Text Working Group kindly requests you to consider the following
> proposal:
>
> ---------------------
> What?
> ---------------------
> We propose to specify an implementation of the text track cue API where
> the format of the cue data is HTML. The "cue type" is called in the
> following "HTMLCue".
>
> The default onenter() handler in the HTMLCue interface would attach the
> cue data to the target element; conversely the default onexit() handler
> would clear the target element's inner HTML.
>
> ---------------------
> Why?
> ---------------------
> Different file formats are used for the distribution of subtitles and
> captions in the HTML ecosystem. Currently only WebVTT has a defined Cue
> concept that is implemented by Web Browsers. It would extend the reach
> of accessible content greatly if the text track API could be used by any
> subtitle format.
>
> Options for a solution:
>
> 1) Mapping of other formats to VTTCues
> Although this may be a short-term option a lossless mapping is often not
> feasible and requires considerable knowledge of the source and the
> destination format. It would also need continuous alignment of each of
> the subtitle formats with WebVTT and vice versa.

I agree, extending WebVTT until it is a superset of all other formats
will not end well. Reaching interoperability on WebVTT in its current
state looks like a formidable challenge already.

> 2) Define a cue type for every subtitle format
> Even if these different cue type specifications would exist it is
> unlikely that browsers will support all different cue "specializations".

Yeah, this is also unlikely to yield positive results, and support
for new formats would be blocked on all browsers implementing a new
FooCue every time.

> 3) Define a generic cue type
> This cue type should be easy to implement by browsers and it should be
> possible to map the semantic to the scope of different existing formats.
>
> We think an HTML cue as a variant of the third option is the best
> solution.
>
> The strength of a generic HTML cue type is that, assuming that the way
> to render these cues is clearly defined somewhere, basically any kind of
> subtitle format that can be translated into HTML could be supported, as
> long as the browser, a client side JS or a server based solution does
> the translation work somewhere. One way to make use of the doc fragment
> is to place the doc fragment as an overlay over the video.
>
> The HTMLCue could be defined as an HTML extension. The HTML5 Spec itself
> does not need to be changed.
>
> It may be worth noting that under the hood some browsers already
> translate WebVTT to HTML and some client side JS solutions translate
> TTML to HTML.

If you mean VTTCue.getCueAsHTML(), this code path is actually not used
for rendering, it's just a JavaScript API. WebVTT cannot be mapped to
plain HTML as it is because it requires special rendering code,
specifically to handle the line stepping and overlap avoidance. (It is
of course possible to use HTML as part of a WebVTT implementation, but
not only plain HTML and CSS.)
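
To illustrate the kind of rendering logic meant by "overlap avoidance",
here is a rough sketch. It is deliberately simplified and is not the
WebVTT rendering algorithm: the Box shape, the placeBox name and the
pixel-based model are all assumptions made for the example. The point is
only that placing a cue depends on where other cues already are, which
appending static HTML does not express by itself.

```typescript
// Hypothetical simplified model: a cue box is just a vertical extent in pixels.
interface Box {
  top: number;
  height: number;
}

// True when the two vertical extents intersect.
function overlaps(a: Box, b: Box): boolean {
  return a.top < b.top + b.height && b.top < a.top + a.height;
}

// Move `box` upward one line at a time until it overlaps none of the boxes
// already placed. Returns null if it would leave the top of the video area.
function placeBox(box: Box, placed: Box[], lineHeight: number): Box | null {
  let candidate: Box = { ...box };
  while (candidate.top >= 0) {
    const current = candidate;
    if (!placed.some((p) => overlaps(current, p))) {
      return current;
    }
    candidate = { top: current.top - lineHeight, height: current.height };
  }
  return null;
}

// A second cue that wants the same line as an existing one is stepped up.
const alreadyPlaced: Box[] = [{ top: 300, height: 20 }];
const stepped = placeBox({ top: 300, height: 20 }, alreadyPlaced, 20);
console.log(stepped); // { top: 280, height: 20 }
```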

That being said, I think there would be great value in working on
primitives that are more low-level than WebVTT on which both WebVTT
and (say) TTML could be implemented. A cue type that simply appends an
arbitrary DocumentFragment to a container isn't sufficient though,
because you can't implement WebVTT like that.

From the history of this discussion I know that "just use metadata
VTTCues with script-based rendering" isn't the answer you are looking
for, but it would be helpful to make the reasons explicit. I suspect
the important ones are:
 • Wanting to support unmodified in-band (and out-of-band?) TTML tracks.
 • Not wanting to depend on JavaScript.

Are these accurate, and is there anything else? FWIW, I think both of
these points are problematic, because the first necessarily implies
some TTML support in browsers, and the second is at odds with how new
web platform features are being designed today.
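
For reference, the "metadata cues with script-based rendering" approach
mentioned above could look roughly like the sketch below. It assumes the
cue payload carries an HTML string produced by some converter (from TTML,
say); CueLike, OverlayLike and renderAt are hypothetical names, and plain
objects stand in for the cue and the overlay element so that the logic
can run outside a browser.

```typescript
// Hypothetical minimal shapes. In a browser these would be cues on a
// TextTrack of kind "metadata" and an element positioned over the video.
interface CueLike {
  startTime: number; // seconds
  endTime: number; // seconds
  text: string; // cue payload, assumed here to be an HTML string
}

interface OverlayLike {
  innerHTML: string;
}

// Cues active at time t: start inclusive, end exclusive.
function activeCuesAt(cues: CueLike[], t: number): CueLike[] {
  return cues.filter((c) => c.startTime <= t && t < c.endTime);
}

// Script-based rendering: replace the overlay content with the payloads of
// all currently active cues. Leaves the overlay empty when none are active.
function renderAt(cues: CueLike[], t: number, overlay: OverlayLike): void {
  overlay.innerHTML = activeCuesAt(cues, t)
    .map((c) => c.text)
    .join("");
}

const demoCues: CueLike[] = [
  { startTime: 0, endTime: 2, text: "<p>Hello</p>" },
  { startTime: 1, endTime: 3, text: "<p>World</p>" },
  { startTime: 5, endTime: 6, text: "<p>Later</p>" },
];
const demoOverlay: OverlayLike = { innerHTML: "" };

renderAt(demoCues, 1.5, demoOverlay);
console.log(demoOverlay.innerHTML); // <p>Hello</p><p>World</p>
```

In a page, the same function would presumably be driven by the track's
cuechange event and its activeCues list, with the track mode set to
"hidden" so that the browser does not render anything itself. Payloads
from untrusted sources would need sanitizing before being assigned to
innerHTML.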

Philip

Received on Friday, 11 September 2015 08:13:51 UTC