Re: TextTrackCue discussions

On Tue, Aug 27, 2013 at 7:36 AM, Brendan Long <self@brendanlong.com> wrote:

>  On 08/23/2013 03:21 PM, Silvia Pfeiffer wrote:
>
> You can't define a conversion to HTML for something for which you don't
> know the format. That's too much magic. For example, if somebody provides
> some JSON text in the text attribute of the TextTrackCue, how is the
> browser to know how to convert this to HTML?
>
> It just wouldn't. getTextAsHTML() would return null.
>


What's the point of adding a function to the TextTrackCue interface that
would always return null?



> Maybe a better option though, would be to have two different interfaces,
> since not all cues are going to be fully parsed:
>
> // For a cue that the browser doesn't understand
> interface TextTrackCue : EventTarget {
>     attribute DOMString text;
> };
>
> // For a cue that the browser does understand
> interface ParsedTextTrackCue : TextTrackCue {
>     attribute DOMString id;
>     attribute double startTime;
>     attribute double endTime;
>     attribute boolean pauseOnExit;
>     attribute DOMString text;
>
>     DocumentFragment getCueAsHTML();
>
>     attribute EventHandler onenter;
>     attribute EventHandler onexit; ...
> };
>
> // And then if you want this..
> interface WebVTTCue : ParsedTextTrackCue {
>     attribute DOMString regionId;
>     attribute DirectionSetting vertical;
>     attribute boolean snapToLines;
>     attribute (long or AutoKeyword) line;
>     attribute long position;
>     attribute long size;
>     attribute AlignSetting align;
> };
>
> interface TTMLCue : ParsedTextTrackCue {
>     ...
> };
>
> interface CEA708Cue : ParsedTextTrackCue {
>     ...
> };
>
> This way, all of the most important information is guaranteed to be
> present in the same form, and developers don't need to know about the
> various formats unless they want format-specific information.
>


Let's take a step back and look at what happens when an in-band text track
is "exposed" by the browser.

There are three steps involved, and we could expose an interface at any of
them:

1. unravel the in-band container encapsulation:
-> you end up with a sequence of {starttime, endtime, data} objects (which
we call "cues")

2. parse the content (data) in the cues:
-> you end up with a sequence of {starttime, endtime, data, getCueAsHTML()}
objects

3. render the parsed data in the cues:
-> you end up with a sequence of {starttime, endtime, data, getCueAsHTML(),
extra attributes} objects with rendering rules
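
To make the three stages concrete, here is a rough TypeScript sketch. It is
only an illustration: the names RawCue, ParsedCue, RenderedCue, parseCue and
withRendering are hypothetical and do not come from either spec, the payload
is assumed to be plain text, and a string stands in for the DocumentFragment
that getCueAsHTML() would return in a browser.

```typescript
// Hypothetical types modelling the three stages; none of these names come from a spec.

// Step 1: result of unwrapping the in-band container encapsulation.
interface RawCue {
  startTime: number;
  endTime: number;
  data: string;
}

// Step 2: the data has been parsed, so an HTML representation is available.
// A string stands in for the DocumentFragment a browser would return.
interface ParsedCue extends RawCue {
  getCueAsHTML(): string;
}

// Step 3: extra, format-specific attributes that rendering rules would use.
interface RenderedCue extends ParsedCue {
  extra: Record<string, string>;
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Step 2 for an assumed plain-text payload: escape it and wrap it in a span.
function parseCue(raw: RawCue): ParsedCue {
  return { ...raw, getCueAsHTML: () => `<span>${escapeHtml(raw.data)}</span>` };
}

// Step 3: attach rendering attributes supplied by the caller.
function withRendering(parsed: ParsedCue, extra: Record<string, string>): RenderedCue {
  return { ...parsed, extra };
}
```

Each stage only adds to the previous one, which is why an interface could in
principle be exposed after any of the three steps.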

The WHATWG spec has a strong position on this: if you decide to expose an
interface to in-band tracks, you implement all of these steps or none.
This is why the TextTrackCue API is completely virtual and the only
instantiation is the VTTCue API of the WebVTT spec. Thus, only WebVTT cues
can be decapsulated, parsed and potentially rendered when following the
current WHATWG spec.

The W3C spec's position is that browsers will regard TextTrackCues as
generic containers for delivering timed data for media resources. Browsers
may, however, choose not to implement parsing and rendering of such cues
(because there are far too many timed data formats out there to take on that
burden). They will, however, be happy to expose the data to the JS
developer and leave the parsing and rendering to the JS developer. This is
why the TextTrackCue API has a .text attribute and a constructor.
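
As a sketch of what "leave the parsing to the JS developer" could look like
for the JSON example mentioned earlier in this thread: the script reads the
cue's text and decides for itself whether it is valid. CueLike, ChapterInfo
and parseChapterCue are hypothetical names, and the "title" field is made up
for the example; only startTime, endTime and text mirror the cue attributes
discussed here.

```typescript
// Minimal shape of what the browser hands over; mirrors startTime, endTime and .text.
interface CueLike {
  startTime: number;
  endTime: number;
  text: string;
}

// Hypothetical application payload; the field name is invented for this example.
interface ChapterInfo {
  title: string;
}

// Parse cue.text as JSON. Returns null when the text is not in the expected
// format, because the script, not the browser, decides what counts as valid.
function parseChapterCue(cue: CueLike): ChapterInfo | null {
  try {
    const value: unknown = JSON.parse(cue.text);
    if (
      typeof value === "object" &&
      value !== null &&
      typeof (value as { title?: unknown }).title === "string"
    ) {
      return { title: (value as { title: string }).title };
    }
    return null;
  } catch {
    // JSON.parse throws a SyntaxError on malformed input.
    return null;
  }
}
```

In a page, the same function could be called from a cuechange handler with
the track's active cues; that wiring is left out so the sketch stays free of
DOM dependencies.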

Your proposal takes this one step further and suggests a generic interface
for parsed cues. Such a generic interface would abstract away knowledge of
the actual format of the cue that the browser parsed and just expose an HTML
representation of the cues to the JS developer. Am I correctly interpreting
your intention?

What would be the reason for hiding the actual format that the browser
parsed from the JS developer? Why wouldn't we specify another interface for
such a parsed format that is actually able to give the JS developer more
specific information/attributes about the Cue format just like we did with
VTTCue? What cue formats do you see being supported by browsers as generic
parsed cues without their own special interface?

Cheers,
Silvia.


PS: There has been recent discussion on the blink list [1] about the
TextTrackCue API and the fork between the WHATWG and W3C specs. I will start
another thread to address it.

[1]
https://groups.google.com/a/chromium.org/d/msg/blink-dev/-VHGnuNNUxM/Yibbv2TgDoYJ

Received on Saturday, 31 August 2013 01:37:22 UTC