Re: [media] alt technologies for paused video (and using ARIA)

I've given this issue a lot of thought, and appreciate all of the people who have contributed ideas toward finding an appropriate solution.  I discussed the question with John for a bit yesterday morning, and here is what I wrote to him after our conversation, once I had all of my jumbled thoughts together. We can definitely find a solution to this problem working together.

After our talk this morning, let me try to express some of my thoughts in a more formal and sequential manner.

The problem:

How to make the HTML5 video element and its associated resources accessible to users of assistive technology that depends on the accessibility tree exposed by most standard browsers. * Note: this is not about the accessibility of the video content itself, which is a set of moving pictures with audio and possibly a set of captions and a transcript.


The HTML5 video element allows authors to embed video into their content and to rely upon browsers to handle playback of the video content.  The video element represents a video player which, for the purpose of this discussion, comprises a canvas and a set of playback controls.  Before the video begins to play, the canvas may display any frame of the video, or an image resource external to the video.  After the video has begun to play, this image is no longer available, and the video content is displayed on the canvas. WCAG 2.0 SC 1.1.1 requires that all non-text content have an accessible textual alternative, and SC 4.1.2 requires that all controls have a programmatically determinable role and name.

Based on the above description, there are three distinct resources which must be exposed to the user in the accessibility tree: the video player, the video, and the still image that is displayed before the video begins to play.  The controls must also be exposed, but that is not the purpose of this discussion, and it appears that implementers have already accomplished this suitably (I have tested Firefox 3.6 and Safari 5). For each of the three resources, an accessible role, name, and description must be exposed to users of assistive technology through the accessibility tree.

Let us take as an example a video of cats playing in the snow, with an initial still image of ACME Video Co., in the Super Duper Video Player (as some sites may wish to brand the player, even though it is implemented by the UA).  The following would be required for assistive technology users.

1. Video player: role="video player", name="Super Duper Video Player", description="".

2. The still image: role="image", name="ACME Production Co.", description="".

3. The video content itself: role="video", name="Cats playing in the snow", description="".

For any of these objects a description may also be desired, but for simplicity I have left this blank.  For example, if there were content on the page describing the video it would need to be associated with the description of the video content (as it is a description of the "video", and not of the player).
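
To make the three objects above concrete, here is a minimal sketch of how they might be modeled as nodes in an accessibility tree. The type and field names (AccessibleNode, makeNode, announce) are hypothetical and do not come from any real accessibility API, and nesting the image and the video under the player is an assumption made for illustration.

```typescript
// Hypothetical model of the three objects listed above. None of these
// names are taken from a real accessibility API.
type AccessibleRole = "video player" | "image" | "video";

interface AccessibleNode {
  role: AccessibleRole;
  name: string;
  description: string; // left empty here, as in the list above
  children: AccessibleNode[];
}

function makeNode(
  role: AccessibleRole,
  name: string,
  children: AccessibleNode[] = []
): AccessibleNode {
  return { role, name, description: "", children };
}

// Assumption: the player is the parent of the still image and the video.
const player: AccessibleNode = makeNode("video player", "Super Duper Video Player", [
  makeNode("image", "ACME Production Co."),
  makeNode("video", "Cats playing in the snow"),
]);

// Flatten the tree into "role: name" lines, parent first.
function announce(node: AccessibleNode): string[] {
  const own = `${node.role}: ${node.name}`;
  return node.children.reduce<string[]>(
    (lines, child) => lines.concat(announce(child)),
    [own]
  );
}
```

Calling announce(player) yields one line per object, which is roughly the information an assistive technology user would need in order to tell the player, the still image, and the video content apart.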

I do not think that the spec is currently semantically rich enough to represent this much content and to map it adequately to the accessibility API.  There may be debate as to whether a description on the page should be mapped to the video.  The question asked is "why can't the assistive technology user find the description on screen, just like a user without assistive technology?"  My answer is that users who have full access to the content have access to all affordances, primarily visual and textual, to assist them with associating different chunks of content. Not all users have access to all affordances, and this is why it can be helpful to create a programmatic association.  Furthermore, when a description is programmatically associated with a media resource, there are a number of other benefits, including search indexing.
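
As a rough illustration of what a programmatic association could look like, the sketch below resolves a description for a media object from the ids of on-page elements, in the spirit of ARIA's aria-describedby. Everything in it (MediaNode, describedBy, resolveDescription, the "cats-summary" id, and the sample text) is hypothetical, and it is not a claim about how any browser maps the video element today.

```typescript
// Hypothetical sketch: the names, ids, and sample text are illustrative only.
interface MediaNode {
  name: string;
  describedBy: string[]; // ids of on-page elements that describe this media
}

// Page text keyed by element id, standing in for the document.
type PageText = Record<string, string>;

// Join the text of every referenced element that exists on the page.
function resolveDescription(node: MediaNode, page: PageText): string {
  const parts: string[] = [];
  for (const id of node.describedBy) {
    const text = page[id];
    if (text !== undefined) {
      parts.push(text);
    }
  }
  return parts.join(" ");
}

const page: PageText = {
  "cats-summary": "Three cats chase each other through fresh snow.",
};

const video: MediaNode = {
  name: "Cats playing in the snow",
  describedBy: ["cats-summary"],
};

const description = resolveDescription(video, page);
```

With an association like this, the description no longer has to be discovered by its visual proximity to the player; it can be handed to assistive technology (or to a search indexer) together with the video's name.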

Everett Zufelt
Accessibility Consultant & Web Developer

Phone (toll free U.S. & Canada)
1-877-ZUFELT-8 (1-877-983-3588)

Follow me on Twitter
View my LinkedIn Profile

On 2011-05-13, at 6:36 PM, David Singer wrote:

> I'm going to try to clear up some of my own confusion here.
> I think we might need three pieces of information linked to a media (video or audio) element:
> * a short text (kinda like alt)
> * a long description
> * a transcript
> in all cases, they should provide equivalents for what someone who can consume the media 'normally' would pick up.  (I think this is as true of audio as of video, by the way).
> So, I was sort-of right and sort-of wrong when I said that the short-text should not describe the poster, but the media.  I'm right, the element is more than the first frame or poster frame.  I'm wrong, in that the (in this case sighted) normal user would have gathered something from that initial frame.
> so, not good:
> <video poster="TheLeopard.jpg" short-text="A movie poster for The Leopard" src="..." />
> because the sighted user will know it's a video element and that it's offering them the trailer.  
> Way better is to relay some of the information from the poster:
> <video poster="TheLeopard.jpg" short-text="Trailer for The Leopard, starring Burt Lancaster" src="..." />
> the long description can provide a more narrative version of the trailer, and the transcript a full transcript.  This way the short text is enabling the non-sighted user just like the sighted one:
> sighted: see poster, decide it's interesting, watch trailer
> non-sighted: get the short-text, decide it's interesting, read the long description and/or transcript
> (I'm using non-sighted as a shorthand for someone who, for whatever reason, can't see the video - their eyes are busy elsewhere, their UA is unable to play it, and so on.  Hope that's OK).
> (I changed from Clockwork Orange because I didn't want to write anything about that great but disturbing movie).
> David Singer
> Multimedia and Software Standards, Apple Inc.

Received on Saturday, 14 May 2011 05:33:23 UTC