RE: farewell

Having worked with Dick in other W3C working groups, I can say that his knowledge of SMIL, synchronized multimedia and accessibility is matched by few. For anyone to feel driven out of a working group is a serious matter, but for someone of his stature to leave because of hostilities speaks loudly, I hope, to the chairs of this group as well as to the leaders of the larger HTML5 working group. The loss of Dick's knowledge and experience cannot be overstated, and his reasons for leaving are both angering and disappointing.

Geoff/WGBH


________________________________________
From: public-html-a11y-request@w3.org [public-html-a11y-request@w3.org] On Behalf Of Dick Bulterman [Dick.Bulterman@cwi.nl]
Sent: Monday, May 03, 2010 6:12 PM
To: HTML Accessibility Task Force
Subject: farewell

After last week's media subgroup phone call (and especially in light of
Silvia's rather direct personal attacks on me), I've decided that my
further participation in this group would not be productive. While I am,
of course, pleased not to have been accused of harboring weapons of
mass destruction (although this could come next), it is clear that
Silvia feels that I've been wasting the group's time without producing
any constructive feedback. While this posturing may be par for the
course in HTML5 circles, it is not the way of resolving differences of
insight to which I've become accustomed in my 15 years of participation
in W3C working groups. I gladly yield my position to the new generation
of cowboys (and cowgirls) in the field.

Here are a few closing suggestions regarding Silvia's proposal for
temporal composition and content control for captions support.

1. The name 'track' for identifying a text object within a video element
is misleading. It may lead people to think that any arbitrary data type
could be specified (such as an audio track, an animation track or even a
secondary video track). Since this proposal is purportedly intended to
allow the activation of external text tracks only, a more reasonable
name would be 'textTrack' or 'textStream'.

2. The name 'trackGroup' is equally misleading. In other languages, a
'group' element is used to aggregate child elements; here, it is used
to select among child elements. As with 'track', it also gives the
impression that a choice can be made among a set of general tracks,
which is not true. A name such as 'textSelect' or 'captionSelect' might
be more useful. (The name 'switch' would only be appropriate if all
semantics of the SMIL switch were adopted.)

3. The semantics defined by Silvia for managing selection based on
lexical ordering are not clear to me. It seems that the children are
first processed to resolve 'source' elements, then 'track' elements (and
then trackGroups)? What happens when things appear out of order, such as
'source' elements interspersed among 'track' elements?

4. The assumption that there are no synchronization conflicts between a
video master and the text children strikes me as overly simplistic: it
is not practical to simply buffer a full set of captions in all cases.
Consider mobile phone use: if a given video had captions in
French/Dutch/English, would all three sets be downloaded before the
first video frame is displayed? What happens if someone turns on
captions while the video is active: does the video pause while the
captions are loaded? If the SRT files are large, significant data
charges could be incurred, even if the video itself were not played.
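To make the cost in point 4 concrete, here is a back-of-the-envelope sketch comparing two loading behaviours: fetching every caption file as soon as the video element loads, versus fetching only the selected language once captions are switched on. The file sizes and function names are hypothetical, chosen only for illustration; neither behaviour is taken from the proposal itself.

```python
from typing import Dict, Optional

# Hypothetical caption file sizes in bytes (assumption; real SRT sizes vary widely).
CAPTION_SIZES: Dict[str, int] = {"fr": 120_000, "nl": 110_000, "en": 100_000}


def eager_bytes(sizes: Dict[str, int]) -> int:
    """Every caption file is fetched up front, whether or not captions are shown."""
    return sum(sizes.values())


def on_demand_bytes(sizes: Dict[str, int], chosen: Optional[str]) -> int:
    """Only the chosen language is fetched, and only once captions are turned on."""
    return sizes[chosen] if chosen is not None else 0


# Captions never turned on: the eager behaviour still pays for all three files.
print(eager_bytes(CAPTION_SIZES))            # 330000
print(on_demand_bytes(CAPTION_SIZES, None))  # 0
print(on_demand_bytes(CAPTION_SIZES, "en"))  # 100000
```

On a metered mobile connection the gap between the two figures is what gets billed, even if the video is never played.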

I continue to be concerned that overloading text selection and temporal
alignment within the <video>/<audio> elements is, architecturally, a bad
idea. By adding explicit temporal structuring (as is already done in
HTML+Time and in scores of playlist formats), the syntax for selecting
and integrating text captions would not have to be a special-purpose
hack. An example (based on HTML+Time syntax available within IE for over
10 years) is:
    <div timeContainer="par" controls ... >
      <video ...>
        <source .../>
        ...
        <source .../>
      </video>
      <switch systemCaptions="true">
        <textstream src="xxx" systemLanguage="nl" ... />
        <textstream src="yyy" systemLanguage="fr" ... />
        <textstream src="zzz" ... /> <!-- default -->
      </switch>
    </div>
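For readers less familiar with SMIL, the selection rule that the <switch> above relies on can be sketched as follows. This is a simplified Python model, not HTML+Time or SMIL code: the TextStream and UserPrefs classes and the select_caption function are hypothetical, and the systemLanguage test only covers exact and prefix ("en" matching "en-US") matches. As in SMIL, children are tried in document order and the first one whose test attributes all hold is chosen; a child with no test attributes always qualifies, which is why the default goes last.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextStream:
    """Hypothetical model of a <textstream> child of <switch>."""
    src: str
    # None models a <textstream> without a systemLanguage attribute.
    system_language: Optional[str] = None


@dataclass
class UserPrefs:
    """Hypothetical model of the user's system test settings."""
    captions: bool
    languages: List[str] = field(default_factory=list)


def language_matches(attr_value: str, user_langs: List[str]) -> bool:
    """Simplified systemLanguage test: true if a user language equals one of the
    comma-separated tags, or is a prefix of a tag followed by '-' ("en" ~ "en-US")."""
    tags = [t.strip().lower() for t in attr_value.split(",")]
    for user in (u.lower() for u in user_langs):
        for tag in tags:
            if tag == user or tag.startswith(user + "-"):
                return True
    return False


def select_caption(children: List[TextStream], prefs: UserPrefs,
                   switch_requires_captions: bool = True) -> Optional[TextStream]:
    """Return the first child, in document order, whose test attributes all hold."""
    # systemCaptions="true" on the <switch> itself: if captions are off,
    # the whole switch is skipped and no text stream is rendered.
    if switch_requires_captions and not prefs.captions:
        return None
    for child in children:
        # A child without systemLanguage has no test to fail, so it always qualifies.
        if child.system_language is None or language_matches(
                child.system_language, prefs.languages):
            return child
    return None


# The three <textstream> children from the example above.
children = [
    TextStream("xxx", "nl"),
    TextStream("yyy", "fr"),
    TextStream("zzz"),  # default
]

print(select_caption(children, UserPrefs(True, ["fr"])).src)        # yyy
print(select_caption(children, UserPrefs(True, ["de"])).src)        # zzz
print(select_caption(children, UserPrefs(True, ["fr", "nl"])).src)  # xxx
print(select_caption(children, UserPrefs(False, ["nl"])))           # None
```

Note that the order of the children, not the order of the user's language preferences, decides which stream wins: a user who prefers French over Dutch still gets the Dutch stream here, because it is listed first.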

There is nothing complex about these solutions -- it simply means making
temporal synchronization explicit. It allows easy extensibility for
including image slideshows as alternatives to video, or for providing
different choices in the case of particular screen sizes or connection
speeds. Naturally, this is not an accessibility-only issue, but history
has shown that the community of users with special needs is best served
when a consistent framework exists for managing multiple content
alternatives.

I first wrote a position paper on this (with concrete suggestions) four
years ago and submitted it to the HTML lists, but it never got on the
HTML5 agenda. Since then, I've been told several times that there is no
time to come up with an appropriate solution for developing a
comprehensive model for inter-object synchronization before HTML5 goes to
last call. (I've been hearing this for about 2 years.) Yet, there is
time to come up with non-extensible, non-scalable solutions.  There is
even time to develop yet another timed text model. In this light, I
think that it is indefensible to ignore structured time within HTML5.

But this is simply my opinion. I realize that it is especially
appropriate within this group to note that there are none so blind as
those who will not see (and none so deaf as those who will not hear). I
will charitably assume that I am one who is blind and deaf, and blocking
progress to boot. For this reason, my departure is as productive as it
is timely.

I wish you all well in the process of wrapping up this important work.

Kind regards,
Dick Bulterman

Received on Tuesday, 4 May 2010 01:21:02 UTC