Re: Accessibility of <audio> and <video>

On Sep 4, 2008, at 01:13, Dave Singer wrote:

> 2.1.2 Configuring
> Sometimes, similarly, the media format itself can carry optional  
> features. An example might be the 3GPP file format (or any file  
> format from that family, such as MP4) with a text track in 3GPP  
> Timed Text format. Enabling this track (and thereby causing it to be  
> presented) may be a way to satisfy a need within a single media file.

It seems to me that for captioning, an off-by-default track within the
main file is preferable to burned-in open captions, because tracks
within the main file travel better, compress better (and transferring
the captions even when not needed is not burdensome in terms of
relative network bandwidth) and make video more searchable.

> (In some cases, the media format may also need to disable a track;  
> for example, a track providing audio description of video may  
> incorporate the standard audio within it, and the normal audio track  
> would be disabled if the audio description were enabled.)

I would guess that content providers would opt for alternative files  
in this case, because additional audio tracks show up on the bandwidth  
bill if served even when not needed.

> We therefore also need the ability to apply the same preferences
> used for selection to configuring the file. Note that not all media
> sub-systems will offer the user-agent such an API; that is
> acceptable: for media files associated with those systems, the
> files are not configurable and selection must be used instead.

This seems alarming. Does at least one of QuickTime, GStreamer or
DirectShow lack such an API? If so, can such an API be put in place in
a timely manner?

It seems to me that if automatic selection isn't reliable, content  
providers will shy away from an automatic selection system.
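To make the configure-or-select fallback quoted above concrete, here is
a minimal Python sketch. Track, MediaFile, configure, meets and choose
are all hypothetical names invented for illustration; nothing here is
a real QuickTime, GStreamer or DirectShow API. It assumes a track can
declare that it already mixes in the standard audio, so enabling it
switches the plain audio track off.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Track:
    kind: str                     # hypothetical kinds: "video", "audio", "audio-description", "captions"
    enabled: bool
    replaces_audio: bool = False  # True if the track already mixes in the standard audio


@dataclass
class MediaFile:
    url: str
    tracks: list[Track] = field(default_factory=list)
    configurable: bool = True     # False if the media subsystem offers no track API


def configure(media: MediaFile, needs: set[str]) -> bool:
    """Enable the tracks the user needs; return False if the file can't be configured."""
    if not media.configurable:
        return False
    for track in media.tracks:
        if track.kind in needs:
            track.enabled = True
    # An enabled track that already carries the standard audio makes the
    # plain audio track redundant, so switch the plain one off.
    if any(t.enabled and t.replaces_audio for t in media.tracks):
        for track in media.tracks:
            if track.kind == "audio":
                track.enabled = False
    return True


def meets(media: MediaFile, needs: set[str]) -> bool:
    """True if the currently enabled tracks cover every stated need."""
    enabled_kinds = {t.kind for t in media.tracks if t.enabled}
    return needs <= enabled_kinds


def choose(candidates: list[MediaFile], needs: set[str]) -> MediaFile | None:
    """Configure where possible; otherwise fall back to plain selection."""
    for media in candidates:
        configure(media, needs)  # no-op for non-configurable files
        if meets(media, needs):
            return media
    return None


plain = MediaFile("plain.mp4",
                  [Track("video", True), Track("audio", True)],
                  configurable=False)
described = MediaFile("described.mp4",
                      [Track("video", True), Track("audio", True),
                       Track("audio-description", False, replaces_audio=True)])

picked = choose([plain, described], {"audio-description"})
print(picked.url)                          # described.mp4
print([t.enabled for t in picked.tracks])  # [True, False, True]
```

The non-configurable file is only ever judged by the tracks it already
has enabled, which is exactly the "selection must be used instead"
case; if no candidate qualifies, choose() returns None.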

> 2.2.3 longdesc
> The longdesc attribute, when used, takes a URI as value, and links  
> to a 'long description'. It is probably the attribute to use to link  
> to such things as a transcript (though a transcript is more of a  
> fulltext alternative than a description).

I think transcripts should not be considered accessibility-only data.  
Transcripts are useful to users who can hear just fine. I think an  
automated association mechanism isn't necessary here. <a  
href='transcript.html'>Transcript</a> should do.

> The user preferences are two-state: 'I need accessibility X', 'I
> have no specific need for accessibility X'. For an unstated
> preference, 'no specific need' is assumed.
> The tagging, however, is in some sense tri-state: yes/no/don't-know.
> The media needs to be able to be tagged: 'I can or do meet a need
> for accessibility X'; 'I cannot meet a need for accessibility X'; 'I
> do not know about accessibility X'. For an unstated tag, 'I do not
> know' is assumed.

This looks like a very sensible approach.
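For illustration, a minimal Python sketch of how two-state preferences
could be matched against tri-state tags. The names Tag, tag_for, score
and select are hypothetical, and the ranking rule (reject any source
tagged 'cannot' for a stated need, prefer 'can or do meet' over 'do
not know', ties to the earlier source) is my assumption, not something
the proposal specifies.

```python
from __future__ import annotations

from enum import Enum


class Tag(Enum):
    MEETS = "I can or do meet a need for X"
    CANNOT = "I cannot meet a need for X"
    UNKNOWN = "I do not know about X"


def tag_for(tags: dict[str, Tag], axis: str) -> Tag:
    # An unstated tag is read as "I do not know".
    return tags.get(axis, Tag.UNKNOWN)


def score(needs: set[str], tags: dict[str, Tag]) -> int | None:
    """None if the source cannot meet some stated need, else how many needs it positively meets."""
    # Preferences are two-state: any axis not in `needs` is "no specific need"
    # and therefore never affects the outcome.
    if any(tag_for(tags, axis) is Tag.CANNOT for axis in needs):
        return None
    return sum(tag_for(tags, axis) is Tag.MEETS for axis in needs)


def select(sources: list[tuple[str, dict[str, Tag]]], needs: set[str]) -> str | None:
    """Pick the best-scoring acceptable source; ties go to the earlier one."""
    best, best_score = None, -1
    for url, tags in sources:
        s = score(needs, tags)
        if s is not None and s > best_score:
            best, best_score = url, s
    return best


sources = [
    ("plain.mp4", {"captions": Tag.CANNOT}),
    ("untagged.mp4", {}),
    ("captioned.mp4", {"captions": Tag.MEETS}),
]
print(select(sources, {"captions"}))  # captioned.mp4
print(select(sources, set()))         # plain.mp4
```

A user with no stated needs simply gets the first source, which is what
'no specific need' as the default should amount to.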

> We believe that the source tagging should be done as Media Queries.
> There is work ongoing at Dublin Core and IMS on the ways to state  
> user preferences for accessibility, which may be relevant.

CSS has a much better level of author and browser vendor acceptance  
than RDF. Also, Media Queries are much easier to author than RDF. I  
would, therefore, guess that an MQ-based approach is more likely to be  
implemented in practice than an RDF-based approach.

>    2. Subtitles (in the USA and Canada sense) are not strictly an  
> accessibility issue, but can probably be handled here.

> Note that since it's not possible to express a concrete
> anti-preference ('I absolutely must avoid captions'), all accessibility
> axes have to be expressed in terms of something positively needed  
> ('I need video that avoids inducing epileptic fits') rather than  
> avoided (you cannot say 'I must not be presented with video that  
> might induce epileptic fits').

I would caution against treating subtitles (in the US/Canada sense) as
an instance of the same selection problem as captions (in the
US/Canada sense) merely because the two are the same engineering
problem as far as encoding timed text goes.

Not hearing audio is (for practical modeling purposes) a single  
dimension: One can hear, one can't hear well, one is deaf. I don't  
know if "can't hear well" maps simply to "captions on", but in the  
most obvious case (a fully deaf person), the need for captions is a  
very simple thing to model in a selection mechanism and in the  
configuration UI. Moreover, society presumes people not to be deaf, so
captioning (in the US/Canada sense) should be off for the general
audience and turned on specifically for users who need it. (Granted,
the choice between captions and sign language makes things harder
than a single dimension, but the case where the content provider has
both the resources and the willingness to provide both is perhaps not
the common case that an automatic selection mechanism should be
engineered for.)

Language skills are multidimensional: a person whose language skills
cover English and a non-English native language already has four
dimensions (reading and listening skill in each of the two languages).
This makes an automatic selection mechanism hard to engineer. Consider
the minimal assumptions that already make it hard: the user (whose
native language isn't English) understands spoken English at the
vanilla Hollywood/BBC level and prefers not to have subtitles in that
case, but struggles when there's too much noise (either in the video
or in the viewing environment) or when speakers have accents. This is
about the most common case when the browser user isn't monolingual,
and even this common case breaks automatic selection and needs user
discretion.

Even more to the point, automatic mechanisms for language selection
are known to be *practically pointless* to engineer, because we
already know from HTTP content negotiation that users don't bother to
configure their language preferences.

Finally, society's default presumption is different. If a news outlet
whose editorial language is Foo has a video of an interview where the
interviewee speaks language Bar, the expectation is to have subtitles
in language Foo enabled by default. Content providers should be able
to be absolutely confident that they can mark subtitles as on by
default without the browser user needing to take action to enable
them (at most, a user might take action to *disable* them). Otherwise,
they'll burn the subtitles into the video data to ensure that they are
visible by default, which would be bad for compression, bad for the
searchability of Web content and bad for rendering the content on
tactile media without either video or audio.

I think there's a very real risk of creating something over-engineered
that won't get implemented and used in a timely fashion (leading to
burned-in text for years) if subtitling is treated as an instance of
the same opt-in selection problem as captioning, as opposed to
allowing content providers to mark a text track as an on-by-default
subtitle track and letting users opt out of subtitles from the
context menu on a case-by-case basis.
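To show how little machinery that alternative needs, here is a minimal
Python sketch of the two different defaults. TextTrack, default_on and
visible_tracks are hypothetical names; the sketch assumes the author
sets an on-by-default flag on subtitle tracks, the user's captioning
preference is a single boolean, and the per-video opt-out comes from
something like a context-menu toggle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextTrack:
    kind: str                 # "subtitles" or "captions"
    language: str
    default_on: bool = False  # hypothetical author-set "on by default" flag


def visible_tracks(tracks: list[TextTrack],
                   needs_captions: bool,
                   opted_out_of_subtitles: bool) -> list[TextTrack]:
    """Subtitles: author opt-in, per-video user opt-out. Captions: user opt-in."""
    shown = []
    for track in tracks:
        if track.kind == "subtitles":
            if track.default_on and not opted_out_of_subtitles:
                shown.append(track)
        elif track.kind == "captions":
            if needs_captions:
                shown.append(track)
    return shown


tracks = [TextTrack("subtitles", "Foo", default_on=True),
          TextTrack("captions", "Bar")]

print([t.kind for t in visible_tracks(tracks, False, False)])  # ['subtitles']
print([t.kind for t in visible_tracks(tracks, False, True)])   # []
print([t.kind for t in visible_tracks(tracks, True, False)])   # ['subtitles', 'captions']
```

The subtitle decision never consults a language-skill model at all;
the author's flag sets the default and the user overrides it per video.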

Henri Sivonen

Received on Thursday, 4 September 2008 08:18:38 UTC