Re: accessibility of video element

At 23:21 +0200 2/07/07, Aurélien Levy wrote:
>Does anybody have ideas about this issue:
>
>- Currently I see no way to have synchronized 
>captions and audio description on the video 
>element (except by directly embedding the 
>captions or audio description in the video 
>itself). Or are the media + source elements 
>here to achieve things like that? Why didn't 
>you take the SMIL audio and text elements?
>
>Aurélien

I apologize for not replying to this earlier, 
but since it's currently wiki-ed and deserves a 
response, here we go.  Also, explicit provision 
of support for audio and video opens up a 
historic opportunity to elevate the level of 
accessibility of multimedia, and we should take 
advantage of it.

The current hope, not explicitly stated anywhere 
(and it should be), is roughly as follows.  Note 
that I and others are working in this area; 
I've spent some excellent time at the TP with 
some accessibility people and hope to improve 
the status.

Overall, I am acutely aware that good 
accessibility results not merely from writing 
specs that appear to enable it, but from the 
three-legged stool of implement-author-use. 
That is, for accessibility to actually happen:
a) the tool vendors have to implement the support 
for both authoring and presentation;
b) the authors have to use that support and provide accessible content;
c) the users have to be able to use the system 
and achieve the accessibility they need.

It's amazingly easy to design schemes which fail 
on one of these three.  Those are not good 
designs.

So, the specification design.

First, I think it's good if the design is well 
layered.  We need good support and a good 
framework at the markup level (HTML, CSS, etc.), 
and I think that should be technology-neutral, 
and also (as much as possible) neutral on the 
kinds of accessibility and the ways they can be 
achieved.

Then, at the media level, it should be possible 
to respond to the accessibility needs, and 
actually provide accessibility.

With all that in mind, where do we stand?

First, the audio and video elements have a set of 
sources, and the source selection can be affected 
by media queries.  We envision one or more media 
query features that allow users to express a 
permanent binary preference along various 'axes' 
of accessibility:  I explicitly want captions, I 
explicitly must avoid stroboscopic effects in 
video, I need audio description of video, I need 
high-contrast audio (audio with minimal 
background noise or music), and so on.  Thus, we 
can select a source that is suitable for a given 
need.  The design clearly hopes that the axes of 
accessibility can be updated (added to) without 
a full revision of the HTML specification.
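
To make that concrete, here is a minimal sketch 
of what such selection might look like.  The 
media features named here ('captions' and 
'audio-description') are purely hypothetical 
placeholders; settling the actual names and 
semantics is exactly the work described above.

  <video controls>
    <!-- hypothetical media features; chosen when the user
         has expressed the corresponding preference -->
    <source src="movie-captioned.mp4" media="(captions)">
    <source src="movie-described.mp4" media="(audio-description)">
    <!-- fallback for users with no stated preference -->
    <source src="movie.mp4">
  </video>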

Second, we envision that the selected source 
should then be configured with those same 
preferences.  For example, some container formats 
support streams that can be optionally enabled: 
a caption track might be distributed disabled but 
enabled on preference.  The user's preferences 
must be used not only to select a source but also 
to configure it.
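
As a sketch only: ideally the user agent applies 
the preference automatically, but if such 
configuration were ever exposed to script, it 
might look something like this (the method name 
below is invented for illustration; no such API 
exists in any draft):

  <script>
    var v = document.getElementById('v');
    // Hypothetical method: ask the user agent to enable the
    // disabled caption track inside the selected container,
    // per the user's stated preference.
    v.enableEmbeddedTrack('captions');
  </script>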

Third, various attributes of the video and audio 
elements allow for control of other aspects of 
the presentation.  For example, I learned this 
week that some users like to have multimedia 
presented more slowly than normal.  I want to 
look into the control of playback rate and check 
that users can arrange things so that their 
'normal' is, say, 80% of what is generally 
considered normal.  Discussions this week have 
suggested that contrast might fall into the same 
situation.
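
For example, assuming the draft API's 
playbackRate attribute on media elements survives 
and is suitable for this, an 80%-of-normal 
preference might be applied like this:

  <video id="v" src="movie.mp4" controls></video>
  <script>
    // Sketch: 'normal' for this user is 80% of the generally
    // accepted rate.  playbackRate is assumed from the current
    // draft API; whether it is the right control is an open
    // question.
    document.getElementById('v').playbackRate = 0.8;
  </script>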


So, with that as a framework, do we also need to 
look at the media level?  If we embed a SMIL file 
in the HTML, it can carry the captioning as a 
separate stream;  I (we) need to look into 
whether system-select can be used there to enable 
captions etc. based on the user preferences (the 
'configuration' described above).
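
Per my reading of SMIL 2.0's system test 
attributes (systemCaptions and systemAudioDesc; 
worth verifying against the spec), that might 
look roughly like this:

  <smil>
    <body>
      <par>
        <video src="movie.mpg"/>
        <!-- rendered only when the user has captions on -->
        <textstream src="captions.rt" systemCaptions="on"/>
        <!-- played only when audio description is requested -->
        <audio src="description.mp3" systemAudioDesc="on"/>
      </par>
    </body>
  </smil>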

Another interesting area is 'post-captioning' -- 
databases of captions that are developed in 
various languages independently of the content 
producer.  I'm not sure how easy it is to let the 
user find and link those in, if needed;  it may 
be an authoring question.  Note that source 
selection in the audio/video tags, and 
system-select in SMIL, already cover 
language-based selection.
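
For instance, SMIL's switch element with the 
systemLanguage test attribute already does this 
kind of selection (file names here are 
illustrative):

  <switch>
    <textstream src="captions-fr.rt" systemLanguage="fr"/>
    <textstream src="captions-en.rt" systemLanguage="en"/>
  </switch>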

There is a small issue here for sign language, by 
the way.  ISO 639-2 (3-letter codes for 
languages) has a single code for sign language in 
general ('sgn');  the IANA registry 
<http://www.iana.org/assignments/language-tags> 
(RFC 3066, which says it's obsolete, but I assume 
RFC 4646 has a similar provision) has geographic 
variants, but apparently there are dialect 
effects caused by the fact that sign language is 
almost entirely real-time and local (unlike, say, 
English, which is both written and broadcast). 
So apparently dialects develop, particularly 
around schools...

There's clearly much work to be done here still. 
What are the appropriate binary axes 
(want/don't-want)?  What are the aspects of 
multimedia that need variable control (rate and 
contrast are identified above)?  Can they be 
'styled' or otherwise controlled appropriately? 
Even if the framework is right, are there any 
formats that fit into it?  What other lurking 
issues are there, like the sign language one?

-- 
David Singer
Apple/QuickTime
