Re: TextTrackCue discussions from Glenn Adams on 2013-09-09 (public-html@w3.org from September 2013)

From: Glenn Adams <glenn@skynav.com>
Date: Mon, 9 Sep 2013 00:26:25 -0600
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Brendan Long <self@brendanlong.com>, "HTML WG (public-html@w3.org)" <public-html@w3.org>
Message-ID: <CACQ=j+fwz98gELBDKqgzu505BT1cnr3NnX2pNMW7kQodUdhvaA@mail.gmail.com>
On Sun, Sep 8, 2013 at 6:41 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> On Mon, Sep 9, 2013 at 4:30 AM, Brendan Long <self@brendanlong.com> wrote:
> >
> > But for image formats the browser does understand, you would expect it to
> > present a consistent interface for the parts that are consistent (all images
> > should have height, width and pixel data).
>
> Right.
>
> >>> I find it hard to believe that there's any cue format
> >>> that can't be represented in HTML, and I think to make things reasonable for
> >>> JS developers, we should force that format to always be available (although,
> >>> for efficiency, an implementation could choose to create it lazily of
> >>> course).
> >>
> >> Binary cue formats don't have an HTML representation, e.g. DVD subtitles.
> >
> > <img src="data:image/bmp;base64,......" /> would work. It's not as nice as
> > getting text, but it's about as good as you can do for image-based subtitles.
>
> That would require converting DVD raster graphics to an img format
> that the browser understands. Once a browser implements that, there
> are likely other attributes that it will implement, so a specific
> DVDCue interface would be created. The abstract TextTrackCue interface
> supports creating this and I think we have already agreed that
> reverting that to what the WHATWG spec has is the way to go:
>
> interface TextTrackCue : EventTarget {
>   readonly attribute TextTrack? track;
>
>            attribute DOMString id;
>            attribute double startTime;
>            attribute double endTime;
>            attribute boolean pauseOnExit;
>
>            attribute EventHandler onenter;
>            attribute EventHandler onexit;
> };
>
>
> The question now is: should there be a thin GenericCue interface or a
> thick TextCue interface as the parent for text-based cues.
>
> Here's a brainstorm on how it could be:
>
>
> [Constructor(double startTime, double endTime, DOMString text)]
> interface GenericCue : TextTrackCue {
>            attribute DOMString text;
>   DocumentFragment getCueAsHTML();
> };
>

OK, but leave out getCueAsHTML(), since a generic cue has no (known)
rendering semantics.


> OR
>
> enum AutoKeyword { "auto" };
> enum DirectionSetting { "" /* horizontal */, "rl", "lr" };
> enum AlignSetting { "start", "middle", "end", "left", "right" };
> [Constructor(double startTime, double endTime, DOMString text)]
> interface TextCue : TextTrackCue {
>     attribute DirectionSetting vertical;
>     attribute boolean snapToLines;
>     attribute (long or AutoKeyword) line;
>     attribute long position;
>     attribute long size;
>     attribute AlignSetting align;
>     attribute DOMString text;
>   DocumentFragment getCueAsHTML();
> };
>

I'm not sure what the point is in pretending that this is a generic
renderable cue when it is based on VTT style semantics. Each style
attribute here is defined in terms of VTT semantics. Let's call a duck a
duck.


> I'm now thinking .text could be restricted to just plain text. Thus,
> I've moved getCueAsHTML() into these interfaces, too, seeing as it
> could simply return a DocumentFragment with plain text.


It could, but what's the point when the text attribute suffices?


> There could be
> rendering for the appropriate @kind values, which would only be
> influenced by the attributes.


Again, this would depend on the underlying track format and cue format, not
just @kind. So one mapping doesn't serve all formats.
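
As a rough illustration of that point, here is a TypeScript sketch in which the
rendering rules are chosen by cue format rather than by @kind alone. The type
names, the "format" tag, and describeRendering() are hypothetical and made up
for this example; none of them come from any spec.

```typescript
// Hypothetical, simplified cue model; not the WebIDL under discussion.
type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

interface BaseCue {
  startTime: number;
  endTime: number;
}

interface VttLikeCue extends BaseCue {
  format: "vtt";
  text: string;
  line: number | "auto";
  align: "start" | "middle" | "end" | "left" | "right";
}

interface TtmlLikeCue extends BaseCue {
  format: "ttml";
  // Stand-in for an XML document; TTML carries its own styling and regions.
  content: string;
}

type AnyCue = VttLikeCue | TtmlLikeCue;

// @kind only decides whether anything is rendered at all in this sketch;
// how it is rendered depends on the cue format.
function describeRendering(kind: TrackKind, cue: AnyCue): string {
  if (kind === "metadata") {
    return "not rendered";
  }
  if (cue.format === "vtt") {
    return `VTT rules: line=${cue.line}, align=${cue.align}`;
  }
  return "TTML rules: styling taken from the TTML content";
}
```

Two cues on tracks of the same @kind end up with different rendering rules,
which is why a single @kind-to-rendering mapping falls short.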


> With GenericCue, it would just be
> rendered bottom center by default, with TextCue, we'd also get
> positioning and directionality. This would neatly cover simple SRT,
> and restricted versions of other caption formats, too.
>

None of these style attributes accommodate TTML style semantics. So it
would be of no utility there.


>
> Then we get to move advanced markup and rendering functionality into
> more specific interfaces.
>
> For example, VTTCue will be adapted to one of these:
>
> enum AutoKeyword { "auto" };
> enum DirectionSetting { "" /* horizontal */, "rl", "lr" };
> enum AlignSetting { "start", "middle", "end", "left", "right" };
> [Constructor(double startTime, double endTime, DOMString text)]
> interface VTTCue : GenericCue {
>     attribute DOMString regionId;
>     attribute DirectionSetting vertical;
>     attribute boolean snapToLines;
>     attribute (long or AutoKeyword) line;
>     attribute long position;
>     attribute long size;
>     attribute AlignSetting align;
> };
>

The only thing you've done here is add regionId and remove getCueAsHTML(),
neither of which makes any sense.


>
> OR
>
> [Constructor(double startTime, double endTime, DOMString text)]
> interface VTTCue : TextCue {
>            attribute DOMString regionId;
> };
>
> with .text containing VTT specifics and getCueAsHTML() converting VTT
> cue markup to HTML.
>

If you are going to define a generic text cue, then all it needs is a text
attribute. It doesn't need VTT specific style attributes, and it doesn't
need getCueAsHTML().
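
To make the layering concrete, here is a minimal TypeScript sketch of the
split argued for above: the generic cue carries only text, and the VTT style
attributes and getCueAsHTML() sit on the format-specific cue. The class names
ending in "Sketch" and the default values are assumptions for illustration,
and getCueAsHTML() returns a string here only to keep the example free of DOM
types.

```typescript
// Hypothetical mirror of the proposed hierarchy; not a normative definition.
class TextTrackCueSketch {
  id = "";
  pauseOnExit = false;
  startTime: number;
  endTime: number;

  constructor(startTime: number, endTime: number) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// Generic text cue: a text attribute and nothing else.
class GenericCueSketch extends TextTrackCueSketch {
  text: string;

  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

// Format-specific cue: VTT style attributes and rendering live here.
class VTTCueSketch extends GenericCueSketch {
  regionId = "";
  vertical: "" | "rl" | "lr" = "";
  snapToLines = true;
  line: number | "auto" = "auto";
  position = 50; // illustrative default
  size = 100; // illustrative default
  align: "start" | "middle" | "end" | "left" | "right" = "middle";

  getCueAsHTML(): string {
    // Placeholder: escapes the text only; real VTT cue markup parsing is omitted.
    return this.text
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
  }
}
```

Code that only needs timing and text can work against GenericCueSketch, while
anything that positions or renders a cue has to know it is holding a VTT cue.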

You may wish to take a look at a new document I've drafted as a first
attempt to define an UnparsedTTMLCue interface at [1]. Take note also of
"Issue 1" therein.

[1] http://dvcs.w3.org/hg/ttml/raw-file/default/ttml1-api/Overview.html

I will publish a draft parsed/renderable TTML cue interface (TTMLCue) this
week, which I expect will roughly look like:

[Constructor(double startTime, double endTime, XMLDocument? content)]
interface TTMLCue : UnparsedTTMLCue {
  XMLDocument getCueAsTTML(); // return live document
  DocumentFragment getCueAsHTML(); // return non-live, renderable HTML fragment
};


>
> Thoughts?
>
> Cheers,
> Silvia.
>
>
Received on Monday, 9 September 2013 06:27:15 UTC