- From: Dick Bulterman <Dick.Bulterman@cwi.nl>
- Date: Tue, 04 May 2010 00:12:12 +0200
- To: HTML Accessibility Task Force <public-html-a11y@w3.org>
After last week's media subgroup phone call (and especially in light of Silvia's rather direct personal attacks on me), I've decided that my further participation in this group would not be productive. While I am, of course, pleased not to have been accused of harboring weapons of mass destruction (although this could come next), it is clear that Silvia feels that I've been wasting the group's time without producing any constructive feedback. While this posturing may be par for the course in HTML5 circles, it is not the way of resolving differences of insight to which I've become accustomed in my 15 years of participation in W3C working groups. I gladly yield my position to the new generation of cowboys (and cowgirls) in the field.

Here are a few closing suggestions regarding Silvia's proposal for temporal composition and content control for captions support.

1. The name 'track' for identifying a text object within a video element is misleading. It may lead people to think that any arbitrary data type could be specified (such as an audio track, an animation track or even a secondary video track). Since this proposal is purportedly intended to allow the activation of external text tracks only, a more reasonable name would be 'textTrack' or 'textStream'.

2. The name 'trackGroup' is equally misleading. In other languages, a 'group' element is used to aggregate child elements; here, it is used to select child elements. As with 'track', it also gives the impression that a choice can be made within a set of general tracks, which is not true. A name such as 'textSelect' or 'captionSelect' might be more useful. (The name 'switch' would only be appropriate if all semantics of the SMIL switch were adopted.)

3. The semantics defined by Silvia for managing selection based on lexical ordering are not clear to me. It seems that the children are first processed to resolve 'source' elements, then 'track' elements (and then trackGroups)?
What happens when things appear out of order (such as having 'source' elements interspersed among track elements)?

4. The assumption that there are no synchronization conflicts between a video master and the text children strikes me as overly simplistic: it is not practical to simply buffer a full set of captions in all cases. Consider mobile phone use: if a given video had captions in French/Dutch/English, would all three sets be downloaded before the first video frame is displayed? What happens if someone turns on captions while the video is active: does the video pause while the captions are loaded? If the SRT files are large, significant data charges could be incurred, even if the video itself were not played.

I continue to be concerned that overloading text selection and temporal alignment within the <video>/<audio> elements is, architecturally, a bad idea. By adding explicit temporal structuring (as is already done in HTML+Time and in scores of playlist formats), the syntax for selecting and integrating text captions would not have to be a special-purpose hack. An example (based on HTML+Time syntax available within IE for over 10 years) is:

    <div timeContainer="par" controls ... >
      <video ...>
        <source .../>
        ...
        <source .../>
      </video>
      <switch systemCaptions="true">
        <textstream src="xxx" systemLanguage="nl" ... />
        <textstream src="yyy" systemLanguage="fr" ... />
        <textstream src="zzz" ... /> <!-- default -->
      </switch>
    </div>

There is nothing complex about these solutions -- they simply mean making temporal synchronization explicit. This allows easy extensibility for including image slideshows as alternatives to video, or for providing different choices in the case of particular screen sizes or connection speeds. Naturally, this is not an accessibility-only issue, but history has shown that the community of users with special needs is best served when a consistent framework exists for managing multiple content alternatives.
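As a rough illustration of the selection step in the example above, here is a minimal Python sketch of SMIL-style 'switch' evaluation: the first child whose test attributes pass is chosen, and a child without systemLanguage acts as the default. The names TextStream, language_matches and select_text_stream are hypothetical and not part of any specification, and the language matching is a simplified reading of the SMIL systemLanguage rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TextStream:
    """Hypothetical stand-in for one <textstream> child of the <switch>."""
    src: str
    # None models a child with no systemLanguage attribute (the default).
    system_language: Optional[str] = None


def language_matches(candidate: str, preferred: Sequence[str]) -> bool:
    """Simplified test: a preferred language equals the candidate tag,
    or is a prefix of it followed by '-' (e.g. 'fr' matches 'fr-CA')."""
    tag = candidate.lower()
    for pref in preferred:
        p = pref.lower()
        if tag == p or tag.startswith(p + "-"):
            return True
    return False


def select_text_stream(
    children: Sequence[TextStream],
    preferred: Sequence[str],
    captions_on: bool,
) -> Optional[TextStream]:
    """Return the first acceptable child in document order, or None.

    captions_on models systemCaptions="true" on the switch: when the
    user has not asked for captions, nothing is selected at all.
    """
    if not captions_on:
        return None
    for child in children:
        if child.system_language is None:
            return child  # untested child: always acceptable
        if language_matches(child.system_language, preferred):
            return child
    return None


streams = [
    TextStream("xxx", "nl"),
    TextStream("yyy", "fr"),
    TextStream("zzz"),  # default
]
chosen = select_text_stream(streams, ["fr"], captions_on=True)
```

Under these assumptions only the chosen stream would need to be fetched, which bears on the data-charge concern in point 4: the other language alternatives are never downloaded unless the user's preferences change.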
I first wrote a position paper on this (with concrete suggestions) four years ago and submitted it to the HTML lists, but it never got on the HTML5 agenda. Since then, I've been told several times that there is no time to come up with an appropriate solution for developing a comprehensive model for inter-object synchronization before HTML5 goes to last call. (I've been hearing this for about 2 years.) Yet, there is time to come up with non-extensible, non-scalable solutions. There is even time to develop yet another timed text model. In this light, I think that it is indefensible to ignore structured time within HTML5. But this is simply my opinion.

I realize that it is especially appropriate within this group to note that there are none so blind as those who will not see (and none so deaf as those who will not hear). I will charitably assume that I am one who is blind and deaf, and blocking progress to boot. For this reason, my departure is as productive as it is timely.

I wish you all well in the process of wrapping up this important work.

Kind regards,
Dick Bulterman
Received on Monday, 3 May 2010 22:12:54 UTC