RE: Timed tracks

SRT is far from the closest-to-ideal format for anyone who is actually involved in real captioning.

The features one would need to add to SRT:
1) Make it amenable to CSS styling, which means either giving it a regular syntax that CSS selectors can work on (XML springs to mind) or embedding styling information in the format itself.
2) Enable it to support internationalised text, including bidi rules and horizontal and vertical line and block layout.
3) Enable it to be positioned precisely with respect to elements in the video, to avoid spilling over burned-in text.
4) Supporting Ruby display
5) Supporting multiple simultaneous captions for turn-taking dialogue, including positioning them differently in relation to the video.
6) Specifying text backgrounds with/without opacity.
7) Supporting common caption idioms like roll-up, word-at-a-time and line-at-a-time pop-on, in order to take advantage of TV caption data.
8) Supporting inline styling for emphasized words.
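
To make the gap concrete, here is a minimal Python sketch of what a basic SRT cue carries. SRT has no formal specification, so the layout assumed here (an index line, a timing line, one or more text lines, and a blank line between cues) is just the commonly seen one, and the names Cue and parse_srt are purely illustrative, not part of any library.

```python
import re
from dataclasses import dataclass
from typing import List

# Timing line of a basic SRT cue, e.g. "00:00:01,000 --> 00:00:04,000".
# Assumption: this is the commonly seen layout; SRT has no formal spec.
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    """Hypothetical container for everything a basic SRT cue provides."""
    index: int
    start_ms: int
    end_ms: int
    text: str  # opaque text: no region, style, or layout fields


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert hours, minutes, seconds, milliseconds to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(data: str) -> List[Cue]:
    """Parse blank-line-separated cues; skip blocks that do not fit."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue
        match = TIMING.fullmatch(lines[1].strip())
        if match is None:
            continue
        g = match.groups()
        cues.append(
            Cue(
                index=int(lines[0]),
                start_ms=_to_ms(*g[:4]),
                end_ms=_to_ms(*g[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues


SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Where are you going?

2
00:00:04,500 --> 00:00:06,000
Home.
"""

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(cue)
```

Each parsed cue has only an index, a start time, an end time and text. There is no field for a region, a style, ruby or writing direction beyond whatever a player chooses to read into the text itself, which is what the items above would have to add.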

These features need to be supported by the HTML embedding:
9) Closing the content off from the host HTML page when it comes from a different domain (while preserving item 1), to protect IP.
10) Enabling the video owner to supply the styling independently of the host page.

Once you've done all that, you are going to be looking at something very similar to TTML.

No browser needs to implement XSL:FO to support TTML; that is a complete red herring. Flash and Silverlight have been using TTML for two years or more, and neither uses XSL:FO. There is no need to write a spec defining how layout "primitives should be interpreted"; the TTML spec already does that.

In fact, there is simply no need to integrate the caption rendering into the HTML rendering at all; it is embedded content and should be handled as such.

Also, there are existing mechanisms for cue lists; I'm not clear on why we are (also) adding them to SRT. (For example, if we're going down a SMIL path - surely SMIL is a better starting point?)

-----Original Message-----
From: [] On Behalf Of Tab Atkins Jr.
Sent: Thursday, May 06, 2010 9:42 AM
To: Geoff Freed
Cc: Maciej Stachowiak; Philippe Le Hegaret; Edward O'Connor; Ian Hickson;
Subject: Re: Timed tracks

On Wed, May 5, 2010 at 5:49 PM, Geoff Freed <> wrote:
> So WebSRT will be different from SRT, which is different from TTML... speaking from a broadcaster/content producer point of view, I find this very discouraging.  We already have a plethora of formats to deal with, each with its own limitations.  WebSRT, too, will have its own limitations.  Is the goal now to extend SRT into WebSRT in order to cover basic features already available in TTML, simply in order to eliminate the need for TTML?  Correct me if I'm wrong, but this is what seems is happening.

In essence, yes, though you can replace "TTML" with most existing captioning formats, since the majority of them are substantially more complex to author and parse/display than is necessary for the vast majority of content.  SRT is the closest-to-ideal existing format, it's just missing a few relatively small things that turned out to be widely necessary.

The alternative to defining one format is to support all formats above some baseline usage number.  There are a lot of formats, though, without any substantially dominant ones, so this potentially means supporting a lot of different formats.  Further, these will all require substantial work to map them into the layout framework the web uses, so they can be interoperably implemented.  Even if we were to just say "All right, we're just doing TTML", it would require us to still produce a spec explaining how TTML's layout primitives should be interpreted.  Potentially, of course, browsers could just implement XSL:FO directly, but initial feedback indicates that that's not an option they're willing to support.  So we'd have to define how all of that maps into CSS, which would be as much or more work.


Received on Thursday, 6 May 2010 21:13:34 UTC