Re: [blink-dev] WebVTT vs TTML Features from Glenn Adams on 2013-12-10 (public-texttracks@w3.org from December 2013)

From: Glenn Adams <glenn@chromium.org>
Date: Wed, 11 Dec 2013 07:26:36 +0800
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: "public-texttracks@w3.org" <public-texttracks@w3.org>, John Luther <jluther@google.com>, Victor Cărbune <vcarbune@chromium.org>, David Singer <singer@apple.com>, Nigel Megitt <nigel.megitt@bbc.co.uk>, Silvia Pfeiffer <silviapf@chromium.org>
Message-ID: <CAB=O+cqr3P3TYrC1Mt0gxYNacd0=ijoBp2B85BQ3KasZTAZ0HA@mail.gmail.com>
On Wed, Dec 11, 2013 at 6:59 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> Some corrections inline since there seem to be some misunderstandings.
>
>
> On Wed, Dec 11, 2013 at 8:20 AM, Glenn Adams <glenn@chromium.org> wrote:
> >
> > On Wed, Dec 11, 2013 at 5:08 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
> >>
> >>
> >> On 11 Dec 2013 07:56, "Glenn Adams" <glenn@chromium.org> wrote:
> >> >
> >> >
> >> > On Wed, Dec 11, 2013 at 3:34 AM, David Singer <singer@apple.com> wrote:
> >> >>
> >> >>
> >> >> On Dec 9, 2013, at 11:36 , Glenn Adams <glenn@chromium.org> wrote:
> >> >>
> >> >> > But not as well as you could, it would seem: on Chrome, WebVTT is
> >> >> > simply translated to cues referring to a CSS-styled HTML fragment.
> >> >> > Why not simply define an HTMLCue, and dispense entirely with VTTCue
> >> >> > and the WebVTT parser? WebVTT could be translated to a sequence of
> >> >> > HTML cues on the server or using client JS.
> >> >> >
> >> >>
> >> >> This is probably stating the obvious, but you asked.
> >> >>
> >> >> For at least two reasons:
> >> >>
> >> >> * we want this to be only one of many possible implementation choices,
> >> >> and
> >> >> * we want there to be a simple expression of the timed cues that is not
> >> >> dependent on an implementation choice.
> >> >
> >> >
> >> > Which would require the "simple expression" to be a semantic/stylistic
> >> > superset of formats, which HTML/CSS is, but WebVTT isn't.
> >>
> >> Allowing all of HTML and CSS in cues is madness.
> >
> >
> > I don't recall ever saying to allow "all" of html/css. The fact of the
> > matter is that VTT implementations translate VTT cues to some subset of
> > HTML/CSS. We are also defining a mapping from TTML to some subset of
> > HTML/CSS.
> >
> > This process raises the question of whether any translation from an
> > input format like TTML or VTT into HTML/CSS should be implemented in the
> > browser rather than in, say, JS client code.
>
> Yes, that's what we're doing with VTT when VTT is used in the browser:
> we map it to HTML and CSS for rendering. Non-browsers can decide to
> use a different approach for rendering.
>
>
> > Going one step further, it is natural
> > to ask if it makes sense to have servers deliver cues using HTML/CSS
> > directly, thus even avoiding the need for JS client translation.
>
> That makes browsers have to support all of HTML/CSS in cues, which, as
> I said above, makes no sense.
>

To repeat: I did not say "all of HTML/CSS", and what I did suggest does
not imply "all". If one were to define direct delivery of HTML/CSS-based
cues, there is no reason it could not be restricted to a subset of HTML/CSS.
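
As an illustration of what "a subset" could mean in practice, the following TypeScript sketch checks a directly delivered cue fragment against an explicit allowlist. The tree model (`CueNode`, `CueElement`), the function `conformsToSubset`, and the particular allowlist (loosely modeled on the elements WebVTT's cue text DOM construction rules produce) are hypothetical, chosen for illustration only; none of it comes from a specification.

```typescript
// Hypothetical model of a cue fragment delivered as HTML (not a DOM API).
interface CueElement {
  tag: string;
  attrs: Record<string, string>;
  children: CueNode[];
}
type CueNode = string | CueElement; // a plain string is a text node

// Assumed allowlist: element name -> attributes permitted on it.
// Loosely modeled on what WebVTT cue text maps to; illustrative only.
const ALLOWED: ReadonlyMap<string, ReadonlySet<string>> = new Map([
  ["span", new Set(["class", "title", "lang"])],
  ["i", new Set(["class"])],
  ["b", new Set(["class"])],
  ["u", new Set(["class"])],
  ["ruby", new Set(["class"])],
  ["rt", new Set(["class"])],
]);

// True only if every element and attribute in the tree is on the allowlist.
function conformsToSubset(node: CueNode): boolean {
  if (typeof node === "string") return true; // text is always allowed
  const allowedAttrs = ALLOWED.get(node.tag);
  if (allowedAttrs === undefined) return false; // element outside the subset
  for (const name of Object.keys(node.attrs)) {
    if (!allowedAttrs.has(name)) return false; // attribute outside the subset
  }
  return node.children.every(conformsToSubset); // check descendants too
}
```

A UA could run a check like this before rendering a directly delivered cue, so that the limit is an explicit part of the contract rather than a side effect of one format's mapping.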


>
>
> >> Why did ttml not do that either?
> >
> > The current cue system defined in HTML5 is a new concept and mechanism.
> > That it is defined in terms of getCueAsHTML() for rendering purposes
> > raises the question of whether to use HTML in the first place.
> >
> > It has recently been suggested (very strongly indeed) that clients need
> > not directly support TTML rendering since JS client code could perform
> > translation into HTML/CSS fragments. That is not an unreasonable
> > suggestion, but it is inconsistent to say that a client should directly
> > support VTT to HTML/CSS translation while saying a client shouldn't do
> > this for TTML.
>
> This is not the right place to discuss decisions that browsers made
> about which formats they want to implement.
>

The fact that anyone has suggested that a browser need not implement some
format, like TTML, because it could either be translated in JS client code
or be delivered as (potentially subsetted) HTML fragments naturally raises
the question of whether the same logic should apply to VTT. That is
relevant to this thread.
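
For a sense of how small the client-side piece of such a translation can be, here is a TypeScript sketch that turns WebVTT cue text into an HTML string. The element mapping roughly follows the WebVTT cue text DOM construction rules: c, v and lang become span (v adds a title attribute with the voice name, lang adds a lang attribute), i, b, u, ruby and rt keep their names, and any .class suffixes become a class attribute. The function name `vttCueTextToHtml` is hypothetical, and the sketch deliberately simplifies: it does not decode character references, drops timestamp and unknown tags, and ignores end tags that do not match the innermost open tag.

```typescript
// Hypothetical client-side translator: WebVTT cue text -> HTML string.
// Assumes the cue text already escapes "&" and "<" as WebVTT requires.
const SAME_NAME = new Set(["i", "b", "u", "ruby", "rt"]);

function escapeAttr(value: string): string {
  return value.replace(/"/g, "&quot;");
}

function vttCueTextToHtml(cueText: string): string {
  let out = "";
  const open: { vtt: string; html: string }[] = []; // stack of open tags
  let pos = 0;
  while (pos < cueText.length) {
    const lt = cueText.indexOf("<", pos);
    if (lt === -1) {
      out += cueText.slice(pos); // trailing text
      break;
    }
    out += cueText.slice(pos, lt); // text before the tag
    const gt = cueText.indexOf(">", lt);
    if (gt === -1) break; // unterminated tag: drop the rest
    const tag = cueText.slice(lt + 1, gt);
    pos = gt + 1;

    if (tag.startsWith("/")) {
      // End tag: close only if it matches the innermost open tag.
      const top = open[open.length - 1];
      if (top !== undefined && top.vtt === tag.slice(1).trim()) {
        out += `</${top.html}>`;
        open.pop();
      }
      continue;
    }
    if (/^\d/.test(tag)) continue; // timestamp tag such as <00:00:01.000>

    // Start tag: name, optional ".class" list, optional annotation.
    const m = /^([^\s.]+)((?:\.[^\s.]+)*)(?:\s+(.*))?$/.exec(tag);
    if (m === null) continue;
    const [, name, classPart, annotation] = m;
    const classes = classPart.split(".").filter((c) => c.length > 0);
    let html: string;
    let attrs = "";
    if (SAME_NAME.has(name)) {
      html = name;
    } else if (name === "c") {
      html = "span";
    } else if (name === "v") {
      html = "span";
      attrs += ` title="${escapeAttr((annotation ?? "").trim())}"`;
    } else if (name === "lang") {
      html = "span";
      attrs += ` lang="${escapeAttr((annotation ?? "").trim())}"`;
    } else {
      continue; // unknown tag: ignored
    }
    if (classes.length > 0) {
      attrs = ` class="${escapeAttr(classes.join(" "))}"` + attrs;
    }
    out += `<${html}${attrs}>`;
    open.push({ vtt: name, html });
  }
  // Close anything still open, innermost first.
  while (open.length > 0) {
    out += `</${open.pop()!.html}>`;
  }
  return out;
}
```

For example, `<v Bob>Hello <i>there</i></v>` comes out as `<span title="Bob">Hello <i>there</i></span>`. Note that the output can only ever contain the handful of elements hard-coded in the mapping, which is the implicit subset discussed further down the thread.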


>
>
> > My purpose in suggesting the potential utility of defining an HTMLCue as
> > such is to demonstrate that one *could* dispense with any direct client
> > support for VTT or TTML other than fetching or demultiplexing VTT/TTML
> > content and passing it to client JS code to be translated into HTMLCue
> > instances.
>
> DataCue does that already. You can always expose normal HTML in a
> DataCue and then simply render .text in a DocumentFragment.
>

Except that DataCue is explicitly defined as non-renderable metadata and
does not define a getCueAsHTML() member.


>
>
> >> Authors of captions need something that works for the use case, i.e.
> >> captioning, and not for publishing. If you want all of HTML+CSS, you
> >> don't need a new format - you just write a web page.
> >
> >
> > I never said "all of HTML/CSS". Note that at present,
> > VTTCue.getCueAsHTML() doesn't explicitly limit what HTML is contained in
> > the returned fragment.
>
> Yes, VTT limits what HTML is contained in getCueAsHTML(). For example,
> there will never be a <table> element in a DocumentFragment returned
> by VTTCue.getCueAsHTML().
>

If VTT limits it, then the limit is only implicit and not part of the
contract for getCueAsHTML(). It is implicit only because the mapping
defined by VTT chooses a particular subset. But what is to prevent that
subset changing over time, as VTT grows? So in practice it doesn't
substantially limit what the fragment may contain.


>
>
> > If we wanted, we could finish the process of formally defining the VTT to
> > HTML/CSS mapping, do the same for TTML, then constrain the fragments
> > returned from getCueAsHTML() to the subset of HTML/CSS that is sufficient
> > to render these formats.
>
> The point of VTT is that it also allows VTT cues to be authored in
> JavaScript and added to the list of cues. The content of such cues has
> to be limited to what VTT cues support. If you want all of HTML/CSS,
> you have to use DataCue.
>

Except for the non-renderability of DataCue as defined; i.e., I can't
create a TextTrack populated by DataCue instances and expect that track to
be rendered by the UA in the way that I might expect a TextTrack populated
by VTTCue instances to be rendered.
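
Because the UA does not render DataCue content, a page using DataCue for captions has to take over the rendering step itself. A minimal TypeScript sketch, assuming a hypothetical payload shape (`HtmlCuePayload`) that carries timing plus an HTML string; the active-cue condition (start time <= t < end time) is essentially the one HTML uses for a text track's active cues. In a browser, the page would more likely read `track.activeCues` in a `cuechange` handler than filter by hand; the pure functions here only make the rule explicit and testable.

```typescript
// Hypothetical payload a server might place in a DataCue:
// timing plus an HTML fragment string. DataCue itself has no rendering rules.
interface HtmlCuePayload {
  startTime: number; // seconds
  endTime: number; // seconds
  html: string;
}

// A cue is active when startTime <= t < endTime.
function activeCues(cues: readonly HtmlCuePayload[], t: number): HtmlCuePayload[] {
  return cues.filter((c) => c.startTime <= t && c.endTime > t);
}

// Markup the page would put into its own caption overlay at time t,
// in cue order: a stand-in for the rendering the UA performs for VTTCue.
function overlayMarkup(cues: readonly HtmlCuePayload[], t: number): string {
  return activeCues(cues, t)
    .map((c) => `<div class="cue">${c.html}</div>`)
    .join("");
}
```

Everything the UA does for a VTTCue track after cue selection (layout, positioning, styling hooks) would have to be reimplemented this way in page script, which is the gap being pointed out.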


>
> Regards,
> Silvia.
>
Received on Tuesday, 10 December 2013 23:27:04 UTC