Re: [blink-dev] WebVTT vs TTML Features from Glenn Adams on 2013-12-10 (public-texttracks@w3.org from December 2013)

From: Glenn Adams <glenn@chromium.org>
Date: Wed, 11 Dec 2013 07:37:31 +0800
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: "public-texttracks@w3.org" <public-texttracks@w3.org>, John Luther <jluther@google.com>, Victor Cărbune <vcarbune@chromium.org>, David Singer <singer@apple.com>, Nigel Megitt <nigel.megitt@bbc.co.uk>, Silvia Pfeiffer <silviapf@chromium.org>
Message-ID: <CAB=O+cq_qS90tQkiFdxXMqoQeC9CdJKNX-J1o1MEBXN=oMYp=Q@mail.gmail.com>
On Wed, Dec 11, 2013 at 7:32 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> On Wed, Dec 11, 2013 at 10:26 AM, Glenn Adams <glenn@chromium.org> wrote:
> >
> >
> >
> > On Wed, Dec 11, 2013 at 6:59 AM, Silvia Pfeiffer
> > <silviapfeiffer1@gmail.com> wrote:
> >>
> >> Some corrections inline since there seem to be some misunderstandings.
> >>
> >>
> >> On Wed, Dec 11, 2013 at 8:20 AM, Glenn Adams <glenn@chromium.org>
> >> wrote:
> >> >
> >> > On Wed, Dec 11, 2013 at 5:08 AM, Silvia Pfeiffer
> >> > <silviapfeiffer1@gmail.com>
> >> > wrote:
> >> >>
> >> >>
> >> >> On 11 Dec 2013 07:56, "Glenn Adams" <glenn@chromium.org> wrote:
> >> >> >
> >> >> >
> >> >> > On Wed, Dec 11, 2013 at 3:34 AM, David Singer <singer@apple.com>
> >> >> > wrote:
> >> >> >>
> >> >> >>
> >> >> >> On Dec 9, 2013, at 11:36 , Glenn Adams <glenn@chromium.org>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > But not as well as you could, it would seem: on Chrome, WebVTT is
> >> >> >> > simply translated to cues referring to a CSS-styled HTML
> >> >> >> > fragment. Why not simply define an HTMLCue, and dispense entirely
> >> >> >> > with VTTCue and the WebVTT parser? The WebVTT could be translated
> >> >> >> > to a sequence of HTML cues on the server or using client JS.
> >> >> >> >
> >> >> >>
> >> >> >> This is probably stating the obvious, but you asked.
> >> >> >>
> >> >> >> For at least two reasons:
> >> >> >>
> >> >> >> * we want this to be only one of many possible implementation
> >> >> >>   choices, and
> >> >> >> * we want there to be a simple expression of the timed cues that
> >> >> >>   is not dependent on an implementation choice
> >> >> >
> >> >> >
> >> >> > Which would require the "simple expression" to be a
> >> >> > semantic/stylistic superset of formats, which HTML/CSS is, but
> >> >> > WebVTT isn't.
> >> >>
> >> >> Allowing all of html and css in cues is madness.
> >> >
> >> >
> >> > I don't recall ever saying to allow "all" of HTML/CSS. The fact of the
> >> > matter is that VTT implementations translate VTT cues to some subset
> >> > of HTML/CSS. We are also defining a mapping from TTML to some subset of
> >> > HTML/CSS.
> >> >
> >> > This process raises the question of whether any translation from an
> >> > input format like TTML or VTT into HTML/CSS should be implemented in
> >> > the browser rather than in, say, JS client code.
> >>
> >> Yes, that's what we're doing with VTT when VTT is used in the browser
> >> - we map it to HTML and CSS for rendering. Non-browsers can decide to
> >> use a different approach for rendering.
> >>
> >>
> >> > Going one step further, it is natural
> >> > to ask if it makes sense to have servers deliver cues using HTML/CSS
> >> > directly, thus even avoiding the need for JS client translation.
> >>
> >> That makes browsers have to support all of HTML/CSS in cues, which, as
> >> I said above, makes no sense.
> >
> >
> > To repeat: I did not say "all of HTML/CSS", and what I did suggest does
> > not imply "all". If one were to define direct delivery of HTML/CSS-based
> > cues, there is no reason it could not be a subset of HTML/CSS.
>
>
> Do you want to define an HTMLCue that supports all of HTML or a
> PartialHTMLCue? Since cues can be created by any authoring system,
> including by JavaScript in the browser, you can't just say that
> HTMLCue supports all of HTML/CSS, but you're only ever sending it a
> subset of HTML/CSS. That's not how it works.
>

It is if that's how you define it. Right now you have a method
getCueAsHTML() not getCueAsPartialHTML(). Whether it is everything or
partial is rather beside the point of this discussion anyway. My point is
there might be utility in defining some Cue type (I don't care what to name
it) that allows JS client code to populate a track's cues with HTML/CSS (at
whatever subset we want to enforce). Going beyond that, there may be
utility in defining a way to deliver such cues directly (without requiring
client JS).
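
A minimal sketch of that idea in TypeScript, purely illustrative. HtmlCue,
SketchTrack, restrictToSubset and the ALLOWED_TAGS list are hypothetical
names, not part of HTML5, WebVTT, or any browser API, and the regex filter
is only a stand-in for whatever subset enforcement one would really define.

```typescript
// Hypothetical sketch: none of these names exist in HTML5, WebVTT, or any browser.

// The subset of HTML element names a cue may carry (an assumed policy).
const ALLOWED_TAGS: ReadonlySet<string> = new Set([
  "b", "i", "u", "span", "ruby", "rt", "br",
]);

// Drop any tag whose element name is outside the allowed subset, keeping the
// text between tags. A naive regex pass for illustration, not a real sanitizer.
function restrictToSubset(
  html: string,
  allowed: ReadonlySet<string> = ALLOWED_TAGS,
): string {
  return html.replace(
    /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g,
    (tag: string, elementName: string) =>
      allowed.has(elementName.toLowerCase()) ? tag : "",
  );
}

// A cue whose payload is an HTML fragment limited to the enforced subset.
class HtmlCue {
  readonly startTime: number;
  readonly endTime: number;
  readonly html: string;

  constructor(startTime: number, endTime: number, html: string) {
    if (endTime < startTime) {
      throw new RangeError("endTime must not precede startTime");
    }
    this.startTime = startTime;
    this.endTime = endTime;
    this.html = restrictToSubset(html);
  }
}

// A stand-in for a text track that client JS populates with cues.
class SketchTrack {
  private readonly cues: HtmlCue[] = [];

  addCue(cue: HtmlCue): void {
    this.cues.push(cue);
  }

  // Cues active at time t (seconds): start inclusive, end exclusive.
  activeCues(t: number): HtmlCue[] {
    return this.cues.filter((c) => c.startTime <= t && t < c.endTime);
  }
}

// Client JS populating a track; the <table> markup falls outside the subset.
const sketchTrack = new SketchTrack();
sketchTrack.addCue(
  new HtmlCue(0, 2.5, "<i>Hello</i> <table><tr><td>world</td></tr></table>"),
);
sketchTrack.addCue(
  new HtmlCue(2.5, 5, '<span class="speaker">Narrator:</span> Good evening.'),
);
```

Enforcing the subset at construction time means the restriction belongs to
the cue type itself rather than to whichever format the cue was translated
from.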


>
>
> >> >> Why did ttml not do that either?
> >> >
> >> > The current cue system defined in HTML5 is a new concept and
> >> > mechanism. That it is defined in terms of getCueAsHTML() for rendering
> >> > purposes raises the question of whether to use HTML in the first
> >> > place.
> >> >
> >> > It has recently been suggested (very strongly indeed) that clients
> >> > need not directly support TTML rendering since JS client code could
> >> > perform translation into HTML/CSS fragments.  That is not an
> >> > unreasonable suggestion, but it is inconsistent with saying that a
> >> > client should directly support VTT to HTML/CSS translation, while
> >> > saying a client shouldn't do this for TTML.
> >>
> >> This is not the right place to discuss decisions that browsers made
> >> about which formats they want to implement.
> >
> >
> > The fact that anyone has suggested that a browser need not implement some
> > format, like TTML, because it could either be translated in JS client
> > code or could be delivered as (potentially subsetted) HTML fragments,
> > naturally raises the question of whether the same logic should apply to
> > VTT. That is relevant to this thread.
> >
> >>
> >>
> >>
> >> > My purpose in suggesting the potential utility of defining an HTMLCue
> >> > as such is to demonstrate that one *could* dispense with any direct
> >> > client support for VTT or TTML other than fetching or demultiplexing
> >> > VTT/TTML content and passing it to client JS code to be translated
> >> > into HTMLCue instances.
> >>
> >> DataCue does that already. You can always expose normal HTML in a
> >> DataCue and then simply render .text in a DocumentFragment.
> >
> >
> > Except that DataCue is explicitly defined as non-renderable metadata and
> > does not define a getCueAsHTML() member.
>
> It doesn't need it. The content is what the Web page makes of it. If
> the Web page is told that it's HTML, then you don't need
> getCueAsHTML() because .text already is HTML.
>
>
> >> >> Authors of captions need something that works for the use case, i.e.
> >> >> captioning, and not for publishing. If you want all of HTML+CSS, you
> >> >> don't need a new format - you just write a web page.
> >> >
> >> >
> >> > I never said "all of HTML/CSS". Note that at present,
> >> > VTTCue.getCueAsHTML() doesn't explicitly limit what HTML is contained
> >> > in the returned fragment.
> >>
> >> Yes, VTT limits what HTML is contained in getCueAsHTML(). For example,
> >> there will never be a <table> element in a DocumentFragment returned
> >> by VTTCue.getCueAsHTML().
> >
> >
> > If VTT limits it, then it is only implicit and not part of the contract
> > for getCueAsHTML(). It is implicit only because the mapping defined by
> > VTT chooses a particular subset. But what is to prevent that subset
> > changing over time, as VTT grows? So really it doesn't limit it
> > substantially.
>
> It's exactly the same kind of limitation that you are asking for HTMLCue.
>
>
> >> > If we wanted, we could finish the process of formally defining the VTT
> >> > to HTML/CSS mapping, do the same for TTML, then constrain the fragments
> >> > returned from getCueAsHTML() to the subset of HTML/CSS that is
> >> > sufficient to render these formats.
> >>
> >> The point of VTT is that it also allows VTT cues to be authored in
> >> JavaScript and added to the list of cues. The content of such cues has
> >> to be limited to what VTT cues support. If you want all of HTML/CSS,
> >> you have to use DataCue.
> >
> >
> > Except for the non-renderability of DataCue as defined; i.e., I can't
> > create a TextTrack populated by DataCue instances and expect that track
> > to be rendered by the UA in the way that I might expect a TextTrack
> > populated by VTTCue instances to be rendered.
>
> Right, you have to use JavaScript to render it. But it would probably
> be a fairly simple rendering if you assume the video viewport as your
> rendering target. Try implementing it in JavaScript first, see what
> the limitations are. Then there's always still time to define an
> HTMLCue.
>
> Regards,
> Silvia.
>
Received on Tuesday, 10 December 2013 23:37:59 UTC