RE: TTML Agenda for 15/05/13 - Proposed updates to charter from Sean Hayes on 2013-06-06 (public-tt@w3.org from June 2013)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Thu, 6 Jun 2013 11:24:06 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: Michael Jordan <mijordan@adobe.com>, "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <E9A92BD0A4FC934EB7935470A46D15241F67C270@DB3EX14MBXC323.europe.corp.microsoft.c>
>I may not fully understand what you are trying to achieve, so bear with me. What I read (and I may be wrong) is that you want WebVTT to map to WebVTT objects ("WebVTT Node objects", see http://dev.w3.org/html5/webvtt/#webvtt-cue-text-parsing-rules), and TTML to map to TTML objects, then these objects to map to some abstract object model before mapping that abstract object model to HTML objects for rendering?

No. What I am suggesting is modifying the specifications so that WebVTT is defined to map to TBDO objects and TTML is defined to map to TBDO objects, where TBDO is the to-be-decided object model. As I pointed out, the internal object models of both formats are simple enough that designing TBDO is pretty trivial, although it does require a willingness on both sides to change their specs. If that basic spirit of cooperation is not present then we might as well forget the entire enterprise.
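
As a rough illustration only (nothing here is taken from either spec), a TBDO along these lines could be as small as an element item carrying attributes and children, which is essentially the Element Information Item plus Attribute Information Item pairing. The class name `Item`, the function `from_vtt_node`, and the element and attribute names chosen for each WebVTT object are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Item:
    """Hypothetical TBDO node: an element name, attributes, and children."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Item", str]] = field(default_factory=list)


# Assumed naming table from WebVTT node objects to element names.
# These names are illustrative; neither WebVTT nor TTML defines them.
VTT_TO_ELEMENT = {
    "WebVTT Class Object": "span",
    "WebVTT Italic Object": "i",
    "WebVTT Bold Object": "b",
    "WebVTT Underline Object": "u",
    "WebVTT Ruby Object": "ruby",
    "WebVTT Ruby Text Object": "rt",
    "WebVTT Voice Object": "v",
}


def from_vtt_node(kind, classes=(), annotation=None, children=()):
    """Map one WebVTT node object onto a hypothetical TBDO item."""
    attrs = {}
    if classes:
        attrs["class"] = " ".join(classes)
    if annotation is not None:
        attrs["title"] = annotation  # e.g. the voice name (assumed attribute)
    return Item(VTT_TO_ELEMENT[kind], attrs, list(children))


# A voice span containing plain text and an italic run.
cue = from_vtt_node(
    "WebVTT Voice Object",
    classes=["loud"],
    annotation="Sean",
    children=["Hello ", from_vtt_node("WebVTT Italic Object", children=["world"])],
)
```

The point of the sketch is only that each WebVTT object reduces to a name plus a handful of attributes, so the shared model is mostly a question of agreeing on the names.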

> I would keep this exercise separated from the WebVTT, the TTML, and the HTML spec and not require implementation. It's mostly interesting for conversions

I believe this is in fact a perfectly viable approach for implementation for reasons I can't discuss on a public mailing list.

>BTW: have you thought about that you could just define one of the two to be the abstract object model and map the other one and any other format to it?

Yes, I believe the XML Infoset would be the better, more established choice; however, I realize that this would set off the anti-XML knee-jerk reaction, so I'm not necessarily wedded to that idea.

>All browsers that implemented more than the basic text support for WebVTT implemented creation of WebVTT Node objects as specified in 
>the WebVTT spec, see http://dev.w3.org/html5/webvtt/#webvtt-cue-text-parsing-rules . Those node objects are being mapped to HTML 
>DOM nodes in http://dev.w3.org/html5/webvtt/#webvtt-cue-text-dom-construction-rules

And that is fine. TBDO is mostly just a naming exercise, since the objects in WebVTT are not really much more than names anyway, so implementations wouldn't necessarily have to change.

>Does TTML provide an explicit rendering algorithm? As I understand it, TTML relied on XSL-FO for rendering... yes, I just found this quote:
>"For each resulting document instance F, if processing requires presentation on a visual medium, then apply formatting and rendering semantics 
>consistent with that prescribed by [XSL 1.1]."

The term *consistent with* here means that you are free to implement as you will, provided you produce visible results that look like those produced by the reference implementation. And in point of fact CSS, for the requirements of TTML, is indeed consistent with XSL-FO in that sense (since XSL-FO references CSS pretty much for the parts we rely on, except for a few details caused by CSS3 not remaining stable which we are cleaning up). The HTML5/CSS mapping will therefore define the reference rendering for CSS.

>The rendering section of the WebVTT spec is quite complicated and uses many of the specifics of WebVTT cue settings and custom 
> algorithms to avoid cue overlap etc.

Yes, I believe this is the biggest impediment to progress. I think not only are these rules complicated, they are in fact ambiguous to the point of non-interoperability, and possibly contain circular dependencies. The proposed region additions also seem not to fit well with them at all.
Personally I think it would be much better if the non-overlap constraint were moved into document conformance, as the timing constraints are, and rendering simply relied on CSS with no alterations. CSS is at this point a sufficiently general rendering technology that cue settings should be capable of being mapped into un-transformed CSS. I do find the definition of :past and :future troubling however, given the implications of how often they could cause the CSS engine to run. I would like to see if these could be mapped to CSS animation.

> I'd leave it to the market to create lossless conversion tools and support them. I wouldn't expect authors to do this by hand.

Given the above, while a good approximation is feasible, I don't think truly lossless is actually possible. Certainly not without a better reference implementation of the WebVTT rendering algorithms.

>Well, I would not want to restrict the development of one format by the feature set available to other formats, or to the object model.
>You wouldn't want to stop adding features to TTML just because these features are not available in VTT yet and therefore not specified in the
>common object model.

Actually I would. The caption-using public has suffered for decades because of the continual need to translate from one format to another, which leads to increased costs, delays and errors, and ultimately adds up to a great deal of non-captioned content. We had a moment in time where it might have been possible to fix that; however, for reasons I'm not particularly interested in rehashing, we failed to do so. We may now have another opportunity to at least mitigate it.

I believe that what the caption and subtitle industry, and more importantly the users that are Deaf or hard of hearing, most urgently need is a single lingua franca; and we are not serving them well if we don't at least try to merge these efforts. To the extent that we have two formats at all, VTT and TTML should be effectively two syntaxes for the same thing, where inter-conversion is a trivial rewrite. If new features are desirable, then they should be desirable and usable for all formats.
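
To make "two syntaxes for the same thing" concrete, here is a minimal sketch, assuming a shared tree in which a node is either a string or a (name, children) tuple. The tree shape, the function names, and the mapping of inline tags onto TTML style attributes are illustrative assumptions; a real mapping would also have to cover classes, voices, timestamps and regions.

```python
from xml.sax.saxutils import escape

# Hypothetical shared tree: a node is either a string or (name, children).
tree = ("cue", ["Hello ", ("i", ["world"]), "!"])

# Assumed mapping of inline tags onto TTML style attributes.
TTML_STYLE = {
    "i": 'tts:fontStyle="italic"',
    "b": 'tts:fontWeight="bold"',
    "u": 'tts:textDecoration="underline"',
}


def to_vtt(node):
    """Serialise the shared tree as WebVTT cue text."""
    if isinstance(node, str):
        return escape(node)  # escapes &, < and >
    name, children = node
    inner = "".join(to_vtt(child) for child in children)
    return inner if name == "cue" else f"<{name}>{inner}</{name}>"


def to_ttml(node):
    """Serialise the same tree as a TTML <p> with styled <span>s."""
    if isinstance(node, str):
        return escape(node)
    name, children = node
    inner = "".join(to_ttml(child) for child in children)
    if name == "cue":
        return f"<p>{inner}</p>"
    return f"<span {TTML_STYLE[name]}>{inner}</span>"


print(to_vtt(tree))   # Hello <i>world</i>!
print(to_ttml(tree))  # <p>Hello <span tts:fontStyle="italic">world</span>!</p>
```

Both serialisers walk the same structure, so under these assumptions converting between the formats is a parse into the tree followed by the other serialiser.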

Sean.

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: 06 June 2013 07:22
To: Sean Hayes
Cc: Michael Jordan; public-tt@w3.org
Subject: Re: TTML Agenda for 15/05/13 - Proposed updates to charter

Hi Sean,

I'm finally finding time to properly consider what you suggested below. Thanks for your patience.

On Fri, May 17, 2013 at 8:03 PM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> Hi Silvia, thanks for your observation and I agree with what you say, however I think you are missing the intent here actually. This has nothing to do with the operational model you started, and is in fact an entirely practical mechanism for inter-conversion.
>
> Both formats already have their own internal object model. VTT has the
> following classes, which are the target of the parse algorithm,
> although their functionality is not fleshed out much beyond being a
> bridge to the HTML DocumentFragment:
>
> WebVTT Class Object
> WebVTT Italic Object
> WebVTT Bold Object
> WebVTT Underline Object
> WebVTT Ruby Object
> WebVTT Ruby Text Object
> WebVTT Voice Object
> WebVTT Text Object
> WebVTT Timestamp Object
>
> TTML has an infoset which contains the following:
>
> Document Information Item
> Element Information Item
> Attribute Information Item
> Character Information Item
>
> Which is a target for the parser and intermediate document form in TTML.
>
> It is my observation that these could easily be unified, since most of the VTT Objects are essentially an Element Information Item + an Attribute Information Item. The only part that wouldn't actually work today is that the reduced infoset doesn't include the processing instruction necessary to convert the timestamp object, but that's a relatively minor fix on the TTML side.


I may not fully understand what you are trying to achieve, so bear with me. What I read (and I may be wrong) is that you want WebVTT to map to WebVTT objects ("WebVTT Node objects", see http://dev.w3.org/html5/webvtt/#webvtt-cue-text-parsing-rules), and TTML to map to TTML objects, then these objects to map to some abstract object model before mapping that abstract object model to HTML objects for rendering?

If that is the case, then I don't think browsers that only want to support one format will want to implement the abstract object model mapping.

It does sound like an interesting theoretical exercise and would help us in conversions between the two formats, and could also help if browsers decided to implement support for even more formats (e.g. SCC for CE608). But I would keep this exercise separated from the WebVTT, the TTML, and the HTML spec and not require implementation. It's mostly interesting for conversions.

BTW: have you thought about that you could just define one of the two to be the abstract object model and map the other one and any other format to it?


> There is code available to do the TTML -> infoset translation for the 
> intermediate document (the algorithm is documented in 
> http://www.w3.org/wiki/TTML/changeProposal005), and I presume the same 
> is true for WebVTT

All browsers that implemented more than the basic text support for WebVTT implemented creation of WebVTT Node objects as specified in the WebVTT spec, see http://dev.w3.org/html5/webvtt/#webvtt-cue-text-parsing-rules . Those node objects are being mapped to HTML DOM nodes in http://dev.w3.org/html5/webvtt/#webvtt-cue-text-dom-construction-rules .


> It is relatively easy to write code to serialise out a WebVTT file from the intermediate TTML infoset (possibly with a side car CSS file); and while the file might not be optimal, it would at least give the same rendering. And, I expect likewise it would be relatively simple to write a TTML file from the WebVTT object set.

Good - that's the conversion that we should describe somewhere as an informative note. I'd hesitate to make it a spec, because as soon as the WebVTT or TTML specs change, these conversions will change, too, so a note seems more appropriate.


> But I don't think that pushing the burden of conversion on to the authors is the best way forward here.

I'd leave it to the market to create lossless conversion tools and support them. I wouldn't expect authors to do this by hand.


> I hope that once other UA implementers see, as we have, the amount of commonality here, that one can reuse much of the same implementation for both TTML and VTT, and further that it is in fact simpler to do the TTML rendering as it requires no change to the CSS model, they may indeed feel it less of a burden to add TTML support.


Do absolutely try to make it easier for UAs to implement several formats. But don't expect them to implement an abstract object model when they don't want to support more than one format. You should simply be open to both options and the spec should not be restrictive in this way.


> Both specifications are already working on a mapping to HTML5 DocumentFragments and CSS from their version of the object model. What I am proposing is really a unification of that effort, ideally in such a way that does not involve requiring any custom CSS implementation for cues. What I believe would be ideal would be for there to be one documented rendering model for one common object model, at which point implementation of both TTML and WebVTT is simply a matter of plugging in a different parser to the same back end. This would involve splitting the VTT spec in half, one part for parsing, the other for rendering,


WebVTT already distinguishes separate sections for parsing and rendering:
* it separates WebVTT file parsing: http://dev.w3.org/html5/webvtt/#parsing
* from cue parsing:
http://dev.w3.org/html5/webvtt/#webvtt-cue-text-parsing-rules
* from DOM construction:
http://dev.w3.org/html5/webvtt/#webvtt-cue-text-dom-construction-rules
* from rendering: http://dev.w3.org/html5/webvtt/#rendering


> and similarly for TTML; and then writing the rendering part up as its own document in a way that suits both purposes. It would also provide a target for other caption formats to use directly too. This may be a bit idealist at this point, but I believe it is certainly doable, and in the best interests of everyone in the community.


Does TTML provide an explicit rendering algorithm? As I understand it, TTML relied on XSL-FO for rendering... yes, I just found this quote:
"For each resulting document instance F, if processing requires presentation on a visual medium, then apply formatting and rendering semantics consistent with that prescribed by [XSL 1.1]."

As you point out, you've started specifying an alternative rendering algorithm for TTML that renders to HTML in http://www.w3.org/wiki/TTML/changeProposal005#TTML_cue_to_HTML_cue_construction_rules . I can see that it's quite different to the WebVTT one and uses many of the TTML-specific property names etc.

The rendering section of the WebVTT spec is quite complicated and uses many of the specifics of WebVTT cue settings and custom algorithms to avoid cue overlap etc.

Do you think a common rendering algorithm is even possible?


> As VTT continues to grow and adds the additional features of TTML, such as region support, named metadata and style sets, the model may need to grow slightly; I would prefer that it grow in such a way as to not break this inter-conversion.

Well, I would not want to restrict the development of one format by the feature set available to other formats, or to the object model.
You wouldn't want to stop adding features to TTML just because these features are not available in VTT yet and therefore not specified in the common object model.

Is there even an inter-conversion spec that could be broken at this time by the introduction of new features to formats?


Regards,
Silvia.
Received on Thursday, 6 June 2013 11:25:53 UTC