Re: Accessibility of <audio> and <video> from Dave Singer on 2008-09-09 (www-style@w3.org from September 2008)

From: Dave Singer <singer@apple.com>
Date: Tue, 9 Sep 2008 15:53:19 -0700
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Cc: public-html@w3.org, W3C WAI-XTECH <wai-xtech@w3.org>, www-style@w3.org
Message-Id: <p06240815c4ecaa313430@[10.0.1.8]>
At 13:22  +0200 4/09/08, Lachlan Hunt wrote:
>Dave Singer wrote:
>>2.2 Associated with the media
>>
>>2.2.1 Introduction
>>
>>There is also a need to associate data with the 
>>media, rather than embed it within the media. 
>>The Web Content Accessibility Guidelines, for 
>>example, request that it be possible to 
>>associate a text transcript with timed media. 
>>For very short media elements, alternative text 
>>may even be enough (e.g. "a dog barks").
>>
>>Finally, we need to consider what should happen 
>>if source selection fails: none of the media 
>>file sources are considered suitable for this 
>>user-agent and user. What is the fallback in 
>>this case?
>
>It should pick the closest match available, even 
>if not all conditions were met.

That would be a two-pass algorithm, and we'd have 
to define 'closest'; neither is very palatable. 
I'd rather encourage content authors to make sure 
that the last source(s) can be played by anyone.

>>2.2.3 longdesc
>>
>>The longdesc attribute, when used, takes a URI 
>>as value, and links to a 'long description'. It 
>>is probably the attribute to use to link to 
>>such things as a transcript (though a 
>>transcript is more of a full-text alternative 
>>than a description).
>
>The longdesc attribute is not included for the 
>img element.  It has been clearly demonstrated 
>in past discussions that it is a complete 
>failure in practice and pursuing it as a 
>solution for video is, IMO, a waste of time. 
>Plus, I have already explained why any sort of 
>long description, whether it be a transcript, 
>full text alternative, or whatever else, is 
>useful to more people than just those with 
>accessibility needs.  Any links to a long 
>description should be done using ordinary, 
>visible links from within the surrounding 
>content.

OK, but I feel that videos and images are 
different (though I'd like them to share 
mechanisms when possible).

>>Clearly we can now define when a media source 
>>matches user needs. A source *fails* to match 
>>if and only if either of the following is 
>>true; otherwise, the source matches:
>>
>>    1. The user indicates a need for an axis, and the source is tagged as
>>       explicitly /not/ meeting that need;
    2. The user does /not/ indicate a need, and the file is tagged as being
       explicitly targeted to that need.
>
>I disagree with #2 being considered a failure. 
>A video may contain features intended for 
>accessibility, such as captions, but if they are 
>closed captions, then they don't need to be 
>turned on.  If they are open captions, then it's 
>not too much of a problem.  However, for me at 
>least, a video with open captions should be given a 
>lower priority than one without.  Obviously, 
>other people will have different priorities.

The reason is actually fairly simple but subtle. 
If #2 is not a failure then a user who hasn't 
asked for captions, encountering this:

<source ... accessibility="captions:yes" ... />
<source ... />

would get the first one, since it fails none of 
the needs they expressed.  Rather 
counter-intuitively, authors would then have to put 
the *more general* source *first* and the more 
specific ones later.  This seems even more 
confusing than rule #2...
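A minimal Python sketch of the two failure rules plus first-match selection, run on the two-source example above. Assumptions, all hypothetical: sources are (label, tags) pairs, tags map an axis name to "yes" or "no" (an untagged axis means "don't know"), and the `enforce_rule_2` switch exists only to compare the two behaviours.

```python
from typing import List, Mapping, Optional, Set, Tuple

# Hypothetical model: (label, axis -> "yes"/"no"); a missing axis is untagged.
Source = Tuple[str, Mapping[str, str]]


def matches(needs: Set[str], tags: Mapping[str, str], enforce_rule_2: bool) -> bool:
    """A source fails only if rule 1 or (when enforced) rule 2 applies."""
    for axis in needs | set(tags):
        tag = tags.get(axis)
        # Rule 1: user needs the axis, source is tagged as explicitly not meeting it.
        if axis in needs and tag == "no":
            return False
        # Rule 2: user has no such need, source is explicitly targeted at it.
        if enforce_rule_2 and axis not in needs and tag == "yes":
            return False
    return True


def select(sources: List[Source], needs: Set[str], enforce_rule_2: bool) -> Optional[str]:
    """Return the first matching source in document order, or None on failure."""
    for label, tags in sources:
        if matches(needs, tags, enforce_rule_2):
            return label
    return None  # source selection failed


sources = [("captioned", {"captions": "yes"}), ("general", {})]

print(select(sources, set(), enforce_rule_2=True))         # general
print(select(sources, set(), enforce_rule_2=False))        # captioned
print(select(sources, {"captions"}, enforce_rule_2=True))  # captioned
```

With rule #2 enforced, the user who asked for nothing falls through to the general source; without it, they get the captioned one, which is the ordering problem described above. A `None` result is the case where source selection fails, which is why the last source should be playable by anyone.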

>
>>We believe that the source tagging should be done as Media Queries
>
>I don't think we should be jumping to solutions 
>just yet.  Media queries are one possibility. 
>Another is to provide a different attribute or 
>several attributes to indicate each axis, and 
>there may be others to consider as well.  In 
>fact, I don't think media queries are appropriate 
>for this at all, since they are designed for 
>indicating features describing the target 
>device, not user preferences.

It depends on whether you consider the viewing 
user part of the output experience :-).  But 
we're not dogmatic.

>
>>3.2 Method of selection
>>
>>We suggest that we add a media query, usable on 
>>the audio and video elements, which is 
>>parameterized by a list of axes and an 
>>indication of whether the media does, or can, 
>>meet the need expressed by that axis. The name 
>>of the query is TBD; here we use 
>>'accessibility'. An example might be:
>>
>>accessibility(captions:yes, audio-description:no, 
>>epilepsy-avoidance:dont-know)
>
>That doesn't seem to fit the syntax of media 
>queries, where each feature is supposed to be 
>given within parentheses, e.g.:
>
><source ... media="screen and (min-height:240px) and (min-width:320px)">

right... maybe I misplaced a paren; I didn't 
manage to work out how to express it as an MQ 
while avoiding problematic syntax or characters.

<source ... media="(accessibility captions:yes, 
audio-description:no, 
epilepsy-avoidance:dont-know)" />
?
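Whatever the final media-query syntax turns out to be, the parameter list itself is a comma-separated set of axis:value pairs. A hypothetical Python parser for just that list (not for media-query syntax as a whole), assuming the three values yes, no and dont-know used in the example:

```python
from typing import Dict

# Assumed tri-state values, taken from the example query above.
ALLOWED = {"yes", "no", "dont-know"}


def parse_accessibility(value: str) -> Dict[str, str]:
    """Parse 'axis:value, axis:value, ...' into a dict; raise ValueError on bad input."""
    result: Dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty items, e.g. from a trailing comma
        axis, sep, state = item.partition(":")
        axis, state = axis.strip(), state.strip()
        if not sep or not axis or state not in ALLOWED:
            raise ValueError(f"bad accessibility item: {item!r}")
        result[axis] = state
    return result


print(parse_accessibility(
    "captions:yes, audio-description:no, epilepsy-avoidance:dont-know"))
# {'captions': 'yes', 'audio-description': 'no', 'epilepsy-avoidance': 'dont-know'}
```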

>Captions, if available, may be open or closed, 
>and only available in particular languages. 
>Subtitles, if available, may be open or closed 
>and be available in one or more languages.  It's 
>even possible to have open subtitles in one 
>language, yet have alternative closed subtitles 
>shown over the top if turned on.  Audio 
>descriptions may not be available in all of the 
>languages that the video is available in.

right. the whole way that this interacts with 
language selection needs exploration.  the 
question of sign-language identification is 
particularly thorny.

however, complex cases will probably only be 
handled 'manually' with buttons etc. on the page 
(and thus needing a DOM API to enable/disable 
tracks etc.)

>>Note that the second matching rule above means 
>>that sources can be ordered in the usual 
>>intuitive way, from most specific to most 
>>general, but that it also means a source might 
>>need to be repeated. For example, if the only 
>>available source has open captions (burned in), 
>>it could be in a single <source> element 
>>without mentioning captions, but it is better 
>>in two <source> elements, the first of which 
>>explicitly says that captions are supported, 
>>and the second is general and un-tagged. This 
>>indicates to the user needing captions that 
>>their need is consciously being met.
>
>I think we should avoid repetition of source 
>elements pointing to the same media, and instead 
>provide ways of accurately describing what each 
>has available.

see above for the dilemma

>
>I've not seen many programmes use sign language.

The BBC does (or did), much to my surprise.  We 
certainly shouldn't assume failure!

>This could be done with attributes.  For example:
>
><video ... captions="open:en; closed:fr,de"
>            subtitles="closed:nl"
>            audiolang="en,fr,de"
>            audiodesc="en">
>
>Or perhaps a single accessibility attribute:
>
><video access="(captions=open:en;closed:fr,de)
>            and (subtitles=closed:nl)
>            and (audiolang=en,fr,de)
>            and (audiodesc=en)">
>
>The syntax of both of those might be a little 
>complex though, and I would prefer to simplify 
>them if possible.

Um, yes. :-)
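For comparison, the per-axis attribute form quoted above is easy enough to parse mechanically. A sketch, assuming the hypothetical `captions`/`subtitles` value syntax from that example (";"-separated groups of "open:" or "closed:" followed by comma-separated language codes); neither the attributes nor the helper names below exist in any specification.

```python
from typing import Dict, List


def parse_track_attr(value: str) -> Dict[str, List[str]]:
    """Parse e.g. 'open:en; closed:fr,de' into {'open': ['en'], 'closed': ['fr', 'de']}."""
    result: Dict[str, List[str]] = {}
    for group in value.split(";"):
        group = group.strip()
        if not group:
            continue  # tolerate a trailing ';'
        kind, sep, langs = group.partition(":")
        kind = kind.strip()
        if not sep or kind not in ("open", "closed"):
            raise ValueError(f"expected open:... or closed:..., got {group!r}")
        codes = [code.strip() for code in langs.split(",") if code.strip()]
        result.setdefault(kind, []).extend(codes)
    return result


def available_in(attr: str, lang: str) -> bool:
    """True if the track kind is offered in `lang`, whether open (burned in) or closed."""
    return any(lang in codes for codes in parse_track_attr(attr).values())


print(parse_track_attr("open:en; closed:fr,de"))   # {'open': ['en'], 'closed': ['fr', 'de']}
print(available_in("open:en; closed:fr,de", "de"))  # True
```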

>  One issue is that while this does correctly 
>distinguish between captions and subtitles, 
>educating authors to use them correctly rather 
>than interchangeably may be a problem, 
>especially given that in the UK the term 
>'subtitles' is incorrectly used for both.
>
>Another problem to consider with automatic 
>selection mechanisms is that, AIUI, common video 
>container formats don't provide a way to 
>programmatically distinguish between subtitle 
>tracks and caption tracks, since both are just 
>text tracks.

If needed, the container formats can 'catch up'.

>  I think they just provide the ability to 
>declare the language of the track, and some also 
>provide the ability to include human readable 
>descriptions.  Text tracks can also be used for 
>other information besides subtitles and 
>captions.  For example, I've seen DVDs provide 
>commentary using a text track without an 
>accompanying audio track.
>
>Note that I didn't use the lang or xml:lang 
>attributes to express the language of the audio 
>streams because they are limited to declaring a 
>single language.  However, in the absence of an 
>explicit audio language declaration, 
>assuming it's the same as the element's language 
>is a reasonable default.
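The default suggested above (fall back to the element's own language when no audio language is declared) might look like this in Python; `audiolang` is the hypothetical attribute from the earlier example, and treating an empty value as absent is an assumption.

```python
from typing import List, Optional


def audio_languages(audiolang: Optional[str], element_lang: Optional[str]) -> List[str]:
    """Languages of the audio streams.

    `audiolang` is the hypothetical comma-separated attribute (e.g. "en,fr,de");
    `element_lang` is the element's lang/xml:lang value, if any.
    """
    if audiolang is not None:
        codes = [code.strip() for code in audiolang.split(",") if code.strip()]
        if codes:
            return codes  # explicit declaration wins
    # No (non-empty) declaration: assume the audio is in the element's language.
    return [element_lang] if element_lang else []


print(audio_languages("en,fr,de", "en"))  # ['en', 'fr', 'de']
print(audio_languages(None, "fr"))        # ['fr']
```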

Thanks for all these careful thoughts, really appreciated.
-- 
David Singer
Apple/QuickTime
Received on Tuesday, 9 September 2008 22:56:18 UTC