RE: TextTrackCue discussions

I understand that the existing attributes map to WebVTT semantics, and to me that isn’t desirable.  I’m not sure of the history behind adding them.

Are you saying that a WebVTT styled fragment cannot be entered as you’ve described for a proposed TTMLCue solution?  If either were possible, I believe the base class cue object could be made compatible.  That leaves open how inband captions get styled.  It would seem they need either style attributes (which we’ve acknowledged are format specific) or they would need to be translated into a styled cue fragment using either WebVTT or TTML.  Is providing format specific attributes considered a big advantage over translating plain text into styled fragments?


From: Glenn Adams
Sent: Monday, September 16, 2013 12:00 PM
To: Jerry Smith (WINDOWS)
Cc: Brendan Long; Silvia Pfeiffer; HTML WG
Subject: Re: TextTrackCue discussions

On Mon, Sep 16, 2013 at 12:36 PM, Jerry Smith (WINDOWS) wrote:
File based caption scenarios work fine with a format agnostic cue object, which suggests your concern about loss of essential semantics involves dynamically created cues.

At present, all of the content related attributes/operations on TextTrackCue/VTTCue are format specific in their semantics and possibly syntax:

text - both syntax and semantics differ, given returned text is format specific;

getCueAsHTML() - returned DocumentFragment is expected to have different structure and use different mapping for different formats; at present, no constraints are specified on the returned fragment, e.g., whether it is acceptable for parenting to HTMLDivElement, etc;

immediate style attributes - those that have moved from TextTrackCue to VTTCue are all WebVTT format specific;

Consequently, I don't see any difference between file based (out-of-band) and in-band scenarios as far as whether a cue interface is format specific or not. The old TextTrackCue is format specific (WebVTT), and the new VTTCue is format specific (WebVTT). However, the new TextTrackCue is format neutral.
A lot could be done for these by supporting formatted XML or some other format fragment on the text cue.  This would require format specific support in the browsers, like media files do with media elements now.  That’s true with format specific cues as well, like the proposed VTTCue.  Do you believe doing this would lose support for essential format specific semantics?
I expect that TTMLCue will expose a reference to an XMLDocument representing the parsed TTML XML DOM. While there are some renderable formats (e.g., smilText) and metadata formats (e.g., MPEG7) that also use XML, and that could be exposed in a similar fashion, most caption/subtitle/metadata formats use either format specific plain text formats or binary formats.
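
A minimal sketch, in TypeScript, of how this split might look to a script that handles both formats. Every name here (`XmlNodeSketch`, `BaseCueSketch`, `VttLikeCue`, `TtmlLikeCue`, `summarize`) is a hypothetical stand-in and not a specified interface, the `format` tag is an assumption added only to make the branching explicit, and the parsed TTML document is modeled as a plain object tree so the example runs without a DOM.

```typescript
// Hypothetical stand-in for a parsed XML node; a TTMLCue as described above
// would presumably expose an XMLDocument instead.
interface XmlNodeSketch {
  name: string;
  attributes: Record<string, string>;
  children: XmlNodeSketch[];
  text?: string;
}

// Format neutral part: timing only (assumed shape of the new TextTrackCue).
interface BaseCueSketch {
  startTime: number;
  endTime: number;
}

// WebVTT specific part; `text` and `align` follow the IDL quoted in this thread.
interface VttLikeCue extends BaseCueSketch {
  format: "webvtt"; // assumption: discriminator used only in this sketch
  text: string;
  align: "start" | "middle" | "end" | "left" | "right";
}

// TTML specific part: a reference to the parsed document (assumption).
interface TtmlLikeCue extends BaseCueSketch {
  format: "ttml";
  document: XmlNodeSketch;
}

type AnyCue = VttLikeCue | TtmlLikeCue;

// Collect all character data below a node, depth first.
function collectText(node: XmlNodeSketch): string {
  const own = node.text ?? "";
  return own + node.children.map(collectText).join("");
}

// A script that handles mixed formats has to branch on the cue type.
function summarize(cue: AnyCue): string {
  switch (cue.format) {
    case "webvtt":
      return `${cue.text} (align=${cue.align})`;
    case "ttml":
      return collectText(cue.document);
  }
}

const vtt: VttLikeCue = {
  format: "webvtt",
  startTime: 0,
  endTime: 2,
  text: "Hello",
  align: "middle",
};

const ttml: TtmlLikeCue = {
  format: "ttml",
  startTime: 2,
  endTime: 4,
  document: {
    name: "p",
    attributes: { region: "bottom" },
    children: [{ name: "span", attributes: {}, children: [], text: "World" }],
  },
};
```

Only the timing fields are shared in this sketch; everything content related sits behind the per-format branch in `summarize`.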


From: Glenn Adams
Sent: Thursday, September 12, 2013 7:17 PM
To: Jerry Smith (WINDOWS)
Cc: Brendan Long; Silvia Pfeiffer; HTML WG
Subject: Re: TextTrackCue discussions

On Fri, Sep 13, 2013 at 2:32 AM, Jerry Smith (WINDOWS) wrote:
Is there an assumption that thin or thick TextTrackCues would be for text only representations?  The existing cue definition (not the newer drafts) can adequately source styled cues and works with getCueAsHTML on separate WebVTT or TTML caption files.  For compatibility reasons, these should continue to work.

The use of format specific cue objects like VTTCue may allow tuned attributes for a specific format, but they also fragment the programming model and make it more difficult for websites to support content with mixed caption formats, do they not?

I have had some side discussions with Silvia and others about the overarching goals of this revision.  Some have replied that it is to focus format specific syntax and features on objects that clearly have a format specific intent.  That would seem predicated on an assumption that a format agnostic solution, usually the desired goal for web specifications, is not possible.  Do we agree that is the case?

IMO, a single format agnostic cue interface is impractical, and, if imposed from without, will result in necessary loss of essential format specific semantics. However, the TTWG hasn't specifically considered this point, so it would be useful to pose the question there to determine if my opinion is shared.


-----Original Message-----
From: Brendan Long
Sent: Monday, September 9, 2013 9:03 AM
To: Silvia Pfeiffer
Cc: HTML WG
Subject: Re: TextTrackCue discussions

On 09/08/2013 06:41 PM, Silvia Pfeiffer wrote:
> That would require converting DVD raster graphics to an img format
> that the browser understands. Once a browser implements that, there
> are likely other attributes that it will implement, so a specific
> DVDCue interface would be created.
I agree. My point is that if we create a new DVDCue class, it should be required to implement getCueAsHTML().

> [Constructor(double startTime, double endTime, DOMString text)]
> interface GenericCue : TextTrackCue {
>     attribute DOMString text;
>     DocumentFragment getCueAsHTML();
> };
This is basically what I had in mind, although personally I would put .text in TextTrackCue, since all cues should have it, whether the browser can parse them or not, right?
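
A rough TypeScript sketch of that arrangement, with .text on the base cue and only getCueAsHTML() on the subclass. The class names and the `FragmentSketch` type are hypothetical stand-ins (a real implementation would return a DocumentFragment); they are used here so the example runs without a DOM.

```typescript
// Hypothetical stand-in for a DocumentFragment.
type FragmentSketch = { textNodes: string[] };

// Base cue with .text moved up, as suggested above.
class TextTrackCueSketch {
  startTime: number;
  endTime: number;
  text: string; // raw payload, present whether or not the browser can parse it

  constructor(startTime: number, endTime: number, text: string) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;
  }
}

// GenericCue then only adds the rendering hook.
class GenericCueSketch extends TextTrackCueSketch {
  getCueAsHTML(): FragmentSketch {
    return { textNodes: [this.text] };
  }
}

// A cue in a format the browser cannot parse still exposes .text.
const cues: TextTrackCueSketch[] = [
  new GenericCueSketch(0, 2, "Hello"),
  new TextTrackCueSketch(2, 4, "opaque payload"),
];

// Code that only needs the payload works on every cue.
const payloads = cues.map((cue) => cue.text);

// Code that wants markup has to check for the rendering hook first.
const rendered = cues
  .filter((cue): cue is GenericCueSketch => cue instanceof GenericCueSketch)
  .map((cue) => cue.getCueAsHTML().textNodes.join(""));
```

With this shape, `payloads` covers both cues while `rendered` only covers the one that declares a rendering hook.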

> enum AutoKeyword { "auto" };
> enum DirectionSetting { "" /* horizontal */, "rl", "lr" };
> enum AlignSetting { "start", "middle", "end", "left", "right" };
> [Constructor(double startTime, double endTime, DOMString text)]
> interface TextCue : TextTrackCue {
>     attribute DirectionSetting vertical;
>     attribute boolean snapToLines;
>     attribute (long or AutoKeyword) line;
>     attribute long position;
>     attribute long size;
>     attribute AlignSetting align;
>     attribute DOMString text;
>     DocumentFragment getCueAsHTML();
> };
It would be nice if we could assume all cue formats support this interface, but from Glenn and Simon's responses, it doesn't sound like that's realistic. I think there's nothing wrong in principle with using WebVTT's semantics where other formats support them though, since consistency is extremely useful.

> I'm now thinking .text could be restricted to just plain text. Thus,
> I've moved getCueAsHTML() into these interfaces, too, seeing as it
> could simply return a DocumentFragment with plain text.
Isn't .text used in unparsed cues though? It seems dangerous to return unparsed cue content in getCueAsHTML(), since that would make it easier for JS developers to accidentally render unparsed cue content (which would be particularly annoying with something like CEA708, where the unparsed content is binary, probably base64 encoded in .text).
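
To make the concern concrete, here is a small TypeScript sketch of a rendering hook that refuses to hand back unparsed content. The `CueSketch` shape, its `parsed` flag, and `getCueAsTextFragment` are all hypothetical and appear in no draft; the string array stands in for a DocumentFragment.

```typescript
// Hypothetical cue shape used only for this sketch.
interface CueSketch {
  text: string;    // raw payload; may be base64 for binary formats
  parsed: boolean; // assumption: whether the browser understood the payload
}

// Stand-in for getCueAsHTML(): returns renderable text only for parsed cues,
// and null otherwise so callers cannot render raw payloads by accident.
function getCueAsTextFragment(cue: CueSketch): string[] | null {
  return cue.parsed ? [cue.text] : null;
}

const plain: CueSketch = { text: "Hello", parsed: true };

// The bytes 0, 1, 2 in base64, standing in for a binary caption packet.
const binary: CueSketch = { text: "AAEC", parsed: false };

const plainFragment = getCueAsTextFragment(plain);
const binaryFragment = getCueAsTextFragment(binary);
```

Here `binary.text` is still reachable for scripts that want to decode it themselves, but nothing that goes through the rendering hook can display it by mistake.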

Received on Monday, 16 September 2013 19:10:04 UTC