Accessibility for the Media Elements in HTML5[1][2] from Robert J Burns on 2008-09-12 (wai-xtech@w3.org from September 2008)

From: Robert J Burns <rob@robburns.com>
Date: Fri, 12 Sep 2008 21:17:35 +0200
To: Dave Singer <singer@apple.com>
Cc: HTML WG <public-html@w3.org>, W3C WAI-XTECH <wai-xtech@w3.org>
Message-Id: <BCB72E32-1FB0-4774-A5BF-2AD1E63B1E76@robburns.com>

Hi Dave,

On Sep 4, 2008, at 12:13 AM, Dave Singer wrote:

> NOTE:  Please be careful with replies here.  Because the subject  
> alas touches on accessibility, HTML, and CSS I have included all  
> those groups (I hope), and also BCC'd WhatWG.  If you're in WhatWG,  
> please note that the discussion here started on public-html and so I  
> am encouraging it to stay there.
>
> We've actually been thinking about the framework for accessibility  
> of media elements in HTML5.  Note that this is rather different from  
> discussing (say) caption formats or the like.  I've attached a  
> 'thought piece' on the subject, which attempts to lay out some of  
> the needs as we see them, and also proposes a way ahead.
>
> Comments gratefully received;  this is an important subject, yet  
> subtle.  Good accessibility is quite tricky.  If the spec doesn't  
> provide the right framework, or it's unworkable from the point of  
> view of authors or users, you fail, no matter how good your  
> intentions...

Thanks for introducing this discussion. You've obviously put much  
thought into the issues and the WG owes you a debt for doing so. I  
agree with much of what you wrote, so here I'm only focussing on minor  
points of contention and contributing my own thoughts to shape the  
discussion.

First, in addition to using an expanded conception of media queries to
shape the selection of resources, I think we should also encourage
interactive UAs to provide a mechanism for the user to override those
selections. In other words, the UA should translate the media queries,
codec information, content type data and the title attribute of the
source element into localized descriptions, allowing the user to
override the default selection. In this way, if, for example, the audio
description is poorly done and becomes a distraction, the user can
switch to the non-audio-description resource. Likewise for language
subtitles, the user might need to change the selection away from the
default after the fact.
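
As a rough sketch of the kind of markup a UA could build such a menu
from (the file names and title text are hypothetical, and the media
attribute is left out because the expanded query syntax is not settled
here):

```html
<!-- Hypothetical resources; a UA might show the two titles in a
     menu and let the user switch away from its default choice. -->
<video>
  <source src='talk-described.ogv' type='video/ogg'
          title='With audio description'>
  <source src='talk.ogv' type='video/ogg'
          title='Without audio description'>
</video>
```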

Second, I think the alt attribute should be unnecessary for these
elements. The alt attribute is necessary for the IMG element only
because it needs to be a void element for the text/html serialization.
Otherwise the contents of the element serve as a much better container
for the alt text replacement. As far as I can tell, no one has
presented any reasons for not using the video and audio elements'
contents in this way (as last-resort fallback when the other
accessibility/universality features of the resources themselves fail).
Also, several examples from Henri and others have demonstrated that
not using the element contents for alt text encourages the detrimental
use of the elements' contents for taunting fallback like: "why don't
you get a real browser that supports HTML5". We don't want to
encourage this type of authoring.
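
A minimal sketch of what using the contents as the text equivalent
might look like (the file name and the description wording are made up
for illustration):

```html
<!-- Hypothetical example: the element's contents act as the
     last-resort text equivalent, much as alt does for IMG. -->
<video src='lion.ogv'>
  A lion stands on a rock at sunrise and roars twice.
</video>
```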

Third, for long descriptions, transcripts (with action / stage
direction) and a priori scripts (also with action / stage direction),
the longdesc attribute might prove useful. This allows authors to
reference these highly specific text equivalents in a semantically
well-defined location, keeping alt text equivalents separate from these
typically more verbose text equivalents. It may be beneficial to add a
new attribute (or attributes) to distinguish these from the longdesc
attribute. On HTML4All, Philip suggested adding a new attribute with a
new value syntax such as
description='URI(mediadescriptions/description.html)' or
description='A lion roaring'. We could even introduce such a syntax for
longdesc as an example of 'paving the cowpaths'. Other attributes could
be 'transcript', 'script', etc. Alternatively, these could instead be
expressed as child elements of the video and audio elements, or even
referenced from separate 'source' elements in the ordered list of
source elements (especially if we recommend UAs provide UI for user
override selection among source elements).
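
Put together, the attribute form might look something like the sketch
below. None of these attributes exists in the current draft; the file
names are hypothetical, and applying the URI(...) value syntax to
'transcript' is my own assumption by analogy with 'description':

```html
<!-- Sketch of the suggested syntax only, not a defined feature. -->
<video src='lion.ogv'
       description='A lion roaring'
       transcript='URI(mediadescriptions/transcript.html)'>
</video>
```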

Finally, your introduction to the discussion and some of the other
comments made recently on this topic raise the question for me of
whether HTML5 should strive to include more flexible authoring of video
and audio content that does not rely exclusively on the capabilities of
the various container formats for these alternate tracks. In other
words, the various video tracks, audio tracks, subtitle tracks, caption
tracks, etc. may all be handled as a single file from the HTML
document's perspective. In that case the HTML5 specification does not
really need to concern itself with defining much else. However, we
could take the extra steps to facilitate more distributed and
decentralized authoring of content by allowing each video, audio or
source element to also reference separate tracks for client-side
muxing. In this way, source elements sharing the same time indexing
might be included alongside SMIL-referenced audio description, subtitle
and captioning files, potentially located even on separate servers
(perhaps with Access Control policing this). This use case would not
replace the use of server-side delivery or pre-muxed container formats
delivered over RTSP or like protocols, but would provide another
flexible mechanism for distributed authoring of content. For example,
consider a site providing video telecasts of events in the US. Now
imagine another vendor in Minsk adds value to the US video telecasts by
simply adding Belarusian subtitles, captions and audio description
through time-based resources located on their own servers. Now the
product can be streamed and viewed in Belarus in a decentralized
manner. Multiple resources from servers on opposite sides of the Earth
are combined client-side into a single stream for local consumption.
Such a mechanism would also address the use case raised earlier on the
list for a decentralized wiki-style multimedia enhancement and
localization.

I'm thinking of perhaps something like this:

<video>
   <source media='<a media query>' >
     <track src='avideofile' >
     <track src='anaudiofile' >
     <track src='acaptionfile' languages='<language metadata>' >
     <track src='asubtitlefile' languages='<language metadata>' >
     <track src='anothersubtitlefile' languages='<language metadata>' >
     ...
   </source>
   ...
   <source src='afile2' media='<a media query>' ></source>
   <source src='afile3' media='<a media query>' ></source>
   ...
</video>

Alternatively, we could add SMIL as yet another extension format
supported within HTML5 or referenced for embedding by the src
attribute (or finally bite the bullet and add an IE8 / 'XML
namespaces' compatible namespace mechanism to HTML5).

Take care,
Rob


[1]: <http://lists.w3.org/Archives/Public/public-html/2008Sep/0118.html>
[2]: <http://lists.w3.org/Archives/Public/public-html/2008Sep/att-0118/html5-media-accedssibility.html>
Received on Friday, 12 September 2008 19:18:20 UTC