Re: Accessibility of <audio> and <video>

This is a reply to about 100 e-mails on the topic of accessibility of 
media elements in HTML. I have only explicitly quoted e-mails below when 
they introduced new ideas; many of the e-mails merely re-emphasised points 
made earlier in the debate.

Before I jump into specific feedback, I would like to put forward the 
vision that I had in mind when first designing the semantics and the API 
for <video> and <audio>, with regard to accessibility.

Fundamentally, I consider <video> and <audio> to be simply windows onto 
pre-existing content, much like <iframe>, but for media data instead of 
for "pages" or document data. Just as with <iframe>s, the principle I had 
in mind is that it should make sense for the user to take the content of 
the element and view it independent of its hosting page. You should be 
able to save the remote file locally and open it in a media player and you 
should be able to write a new page with a different media player 
interface, without losing any key aspect of the media. In particular, any 
accessibility features must not be lost when doing this. For example, if 
the video has subtitles, or picture-in-picture sign language 
interpretation, or multiple audio tracks, or a transcript, or lyrics, or 
metadata, _all_ of this data should survive even if the video file is 
saved locally without the embedding page.

It turns out that this is actually not a huge problem -- video formats 
already have to deal with this. If you buy video from iTunes, all the 
metadata is within the file. If you buy audio from Amazon, all the 
metadata is within the file. If you transfer a video track from a DVD to a 
hard disk, one MPEG file can contain all the subtitles, video, and audio 
tracks. Furthermore, unlike images, videos tend to stand alone -- where an 
image can mean different things in different contexts, video files tend to 
mean the same thing in all contexts, so the accessibility alternatives 
aren't situational like image alternative text.

Thus, a fundamental principle of how this feature was designed is that any 
accessibility features and metadata features must be within the video or 
audio resource, and not in the HTML markup. The hypothesis is that this 
results in the optimal experience for all users.



On Mon, 25 Aug 2008, Justin James wrote:
> 
> I have become very concerned over the last few weeks regarding the 
> accessibility issues in HTML 5. I think that it is great that so many 
> people are pushing hard for accessibility around images, which is of 
> particular benefit for vision impaired users. However, I fear that 
> another group of users is being ignored: hearing impaired users. I find 
> it a bit strange that we are spending a huge amount of energy around the 
> img/@alt issue, but no one has brought up something like @alt for the 
> audio and video tags.
> 
> I am of the opinion that in the spirit of fairness and equality, any 
> decision made regarding @alt must also be applied, as appropriate, to 
> image and video elements as well.
> 
> The current draft, as I read it, does not support any type of 
> @transcript or @alt type of attribute on either of these tags, making 
> them both inaccessible to hearing impaired users, and video inaccessible 
> to vision impaired users as well.

I hope that the above introduction now explains the reason for this. It's 
not that hearing impaired users (and many other users with special needs) 
are excluded, but that to best serve them we should address their needs in 
the media resources themselves.


On Mon, 25 Aug 2008, Edward O'Connor wrote:
> >
> > I find it a bit strange that we are spending a huge amount of energy 
> > around the img/@alt issue, but no one has brought up something like 
> > @alt for the audio and video tags.
> 
> This is because, for legacy reasons, <img> is an empty element, and so 
> its text equivalent lives in an attribute, whereas <audio> and <video> 
> both support rich fallback via their content.

This isn't the intent. The fallback is only intended for legacy UAs. I 
would expect all HTML5 UAs, including ATs, to support <video> and <audio>, 
exposing alternative tracks, subtitles, etc.


On Mon, 25 Aug 2008, Philip TAYLOR wrote:
> Anne van Kesteren wrote:
> 
> > Actually, I think the idea is that the content stream itself is 
> > accessible.
> 
> That is not something that the HTML specification can (or should) 
> mandate.

Indeed, we are somewhat at the mercy of whatever codec and container 
formats we end up picking. We should definitely consider accessibility a 
high priority when selecting a good codec, though. For example, we 
shouldn't pick a codec that requires that subtitles be burnt in.


> > The contents of the <audio> and <video> element are a) for <source> 
> > elements and b) for user agents not supporting <audio> and <video>.
> 
> But if someone who were unable to hear audio, or see video, were to 
> configure his/her browser so as not to render such streams, then such a 
> configuration would surely be indistinguishable from "user agents not 
> supporting <audio> and <video>", and thus the content would be rendered, 
> would it not ?

Such users would presumably enable subtitles, or enable audio tracks for 
the visually impaired, rather than disable the media altogether.


On Mon, 25 Aug 2008, Boris Zbarsky wrote:
>
> So if the UA doesn't support the particular codec but does support 
> <video> it should not show the fallback content?

Correct.


On Mon, 25 Aug 2008, Boris Zbarsky wrote:
> 
> That seems entirely unreasonable to me.  In particular, it doesn't let 
> me, as an author, try for a higher-quality codec with fallback to a more 
> commonly supported one.

That's what <source> is for. Since the idea is that we will have a codec 
that is supported everywhere, that is what the final fallback would be.
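
As a rough model of that selection, here is a sketch in TypeScript. The 
names (SourceCandidate, pickSource, the example URLs and the 
"video/x-high-quality" type) are made up for illustration and are not 
from the spec; the canPlay predicate stands in for whatever codec check 
the UA actually performs.

```typescript
// Hypothetical model of one <source> child: its URL and its MIME type.
interface SourceCandidate {
  src: string;  // URL of the media resource
  type: string; // MIME type, optionally with a codecs parameter
}

// Return the first candidate, in document order, that the predicate
// accepts, or null if none is accepted. `canPlay` is injected so that
// the sketch runs outside a browser.
function pickSource(
  candidates: readonly SourceCandidate[],
  canPlay: (type: string) => boolean
): SourceCandidate | null {
  for (const candidate of candidates) {
    if (canPlay(candidate.type)) {
      return candidate;
    }
  }
  return null;
}

// The author's preferred codec comes first, the commonly supported one
// last, so the last entry acts as the final fallback.
const exampleSources: SourceCandidate[] = [
  { src: "movie-hq.example", type: "video/x-high-quality" },
  { src: "movie.ogv", type: 'video/ogg; codecs="theora, vorbis"' },
];
```

Note that when nothing matches, the result is simply "no playable 
source"; the element's fallback content is still only for legacy UAs.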


On Mon, 25 Aug 2008, Boris Zbarsky wrote:
> 
> Hmm.  OK.  So I can fall back to a different codec using multiple 
> <source>s. I can't fall back to a video format that only has plug-in 
> support, though?

If you wish to try providing video in a non-ubiquitous codec using 
<video>, and, if that fails, switch to a rendering using an NPAPI plugin, 
you can use script. However, that isn't expected to be a common case (at 
least in the long term; it might be a common case in the transition 
period where we don't have a uniform codec, but we shouldn't optimise for 
that case).
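
To make the script approach concrete, here is a minimal TypeScript 
sketch of the decision such a script would make. Everything named here 
(PlayabilityProbe, Rendering, chooseRendering) is hypothetical. The 
probe is assumed to behave like a media element's canPlayType() method, 
returning the empty string when the type cannot be played and "maybe" or 
"probably" otherwise.

```typescript
// Structural stand-in for the one media element method the sketch needs,
// so that it can run without a browser DOM.
interface PlayabilityProbe {
  canPlayType(type: string): string; // "" means "cannot play"
}

// What the page should end up rendering.
type Rendering =
  | { kind: "video"; src: string }
  | { kind: "plugin"; src: string; pluginType: string };

// Prefer native <video>; fall back to the plugin only when the probe
// reports that the native type cannot be played at all.
function chooseRendering(
  probe: PlayabilityProbe,
  native: { src: string; type: string },
  plugin: { src: string; pluginType: string }
): Rendering {
  if (probe.canPlayType(native.type) !== "") {
    return { kind: "video", src: native.src };
  }
  return { kind: "plugin", src: plugin.src, pluginType: plugin.pluginType };
}
```

In a page, the probe would be a <video> element and the "plugin" branch 
would replace that element with an <object> pointing at the plugin's 
content.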


On Mon, 25 Aug 2008, Philip TAYLOR wrote:
> 
> Scenario : the Principal addresses the University and his address is 
> recorded; his aide asks the webmaster to put the video up on the web.  
> The webmaster looks at the video and finds there are no closed-captions, 
> no subtitles, no accessibility features at all.  What is he to do ?  
> Refuse to put it up.  That would be a brave webmaster indeed.  No, 
> instead he puts it up, then relies on the intelligent design of HTML 5 
> to allow him to add accessibility features to overcome the deficiencies 
> of the raw material. And it is our responsibility to make sure that he 
> can do this.

The idea is that he would put those accessibility features in the video 
file itself, rather than just in the HTML. That way they don't get lost 
when the user moves the file around (e.g. to use their 
accessibility-optimised video player).


On Mon, 25 Aug 2008, Philip TAYLOR wrote:
> 
> OK, that's a constructive suggestion.  But it also seems to be an 
> attempt to hive off the responsibility for providing accessibility 
> features on others (in this case, the designers of SMIL).  Why don't we 
> just bite the bullet and make HTML 5 accessible, full stop ?

We should look at the Web platform as a whole as a single entity. HTML5 is 
just one small part of that. Given that outlook, it doesn't really matter 
which committee designs the accessibility features, so long as they are 
there. What matters is which technical solution is best. The best 
technical solution here is to have the accessibility features as closely 
associated with the moving pixels as possible, in the video file itself.

One could equally ask the question "Why don't we just bite the bullet and 
make HTML 5 define the entire video format?" Features like subtitles are 
no more or less important than features like the actual moving pixels of 
the dog on the skateboard -- they are all part and parcel of what the 
video is, and should be together.


On Mon, 1 Sep 2008, Justin James wrote:
> 
> I can imagine the furor if we also applied this logic to images, by 
> saying, "if you want accessible images, use a format that natively 
> supports metadata of alternate text, or put a 
> subtitle/caption/legend/etc. in your image." Heads would roll. On the 
> other hand, I also agree that it is not HTML's responsibility to 
> pre-determine every possible accessibility scenario for every possible 
> type of content and account for it.

The big difference with images is that they are often situational (their 
meaning changes based on where they are placed), which is much more rarely 
the case with videos and audio tracks.

(Also, frankly, accessibility of images is a much less well-understood 
problem than accessibility of audiovisual entertainment.)


> A middle ground that I would like to propose, would be for @alt to be 
> allowed on any type of non-text element, as well as @longalt (or 
> @longdesc), and a @longalturl attribute, which would allow for an URL to 
> be given for a FULL textual representation. For example, with an image 
> representing a math formula:
> 
> <img src="formula.gif" alt="The Fibonacci Sequence" longalt="The 
> Fibonacci Sequence expressed as a mathematical formula." 
> longalturl="http://www.domain.com/fibonacci_text.html">
> 
> Does this make sense? Would this meet the needs for providing 
> accessibility metadata on non-img elements, while not getting in the way 
> of providing multimedia content?

I think this is the wrong approach -- we shouldn't be emphasising the 
separation between parts of the population, we should be working to unify 
everyone. It's much better to have computers use the same input (e.g. 
MathML for this example) and adapt the output to the user than to make 
certain users be effectively "second class citizens" that have to be 
treated specially.

Getting back to video and audio, transcripts and lyrics are a good example 
of this. Instead of just providing transcripts to users who can't make 
full use of the video or audio, it is better to provide them to everyone. 
Many users who are otherwise quite capable of looking at video and audio 
may desire transcripts.


On Mon, 1 Sep 2008, Lachlan Hunt wrote:
> 
> OK, let's look at the various kind of alternatives that could be potentially
> provided for people who either can't watch or don't want to watch the video.
> 
> * Transcript of spoken content.
> * Textual descriptions of relevant non-spoken content.
>   e.g. descriptions of significant actions or sounds in the video.
> * Still images illustrating significant moments from the video.
>   e.g. images of presentation slides, if the video was of someone giving
>   a presentation.
> * A link to download the video, possibly in alternative formats, to
>   watch in an external media player, perhaps in several formats.
> * Video embedded in the page using an alternative method
>   e.g. Flash, the Cortado Theora applet, or whatever
> 
> Any or all of those could be provided and all of them would be suitable for
> people in the following categories:
> 
> 1. People using browsers that don't support <video>
> 2. People using browsers that do support video, but don't support the
>    codecs used.
> 3. People who can't see the video well (blind or visually impaired)
> 4. People who can't hear the video well (hearing impaired or no
>    headphones/speakers available)
> 5. People who want to search the content of the video
>    e.g. Instead of seeking through the video to find the information
>    they want, just look through the transcript, from which they can also
>    copy quotes if they want.
> 
> Given that the alternative content is useful to so many people, 
> regardless of physical disability or technological limitations, it makes 
> sense for it to be provided in a way that makes it available to 
> everyone.  This is one reason why hiding alternative content away within 
> the video element is not helpful because it only makes it readily 
> available to a small subset of those people who might want or need it.
> 
> Another reason to consider is that we know very well what authors do 
> when they are encouraged to hide alternative content specifically for 
> accessibility within elements.  For example:
> 
> <noframes>Your browser does not support frames, Please upgrade to 
> Netscape 4</noframes>
> 
> <noscript>Sorry, your browser doesn't support javascript.</noscript>
> 
> <object ...>Please install flash</object>
> 
> It might work ok for <video> during the transitional period when not all 
> browsers implement it, since authors will gain practical benefits by 
> including alternative embedding mechanisms within the video element. 
> e.g.
> 
> <video src="movie"><object data="movie"></object></video>
> <p><a href="movie">Download movie</a>
>    <a href="transcript">Read transcript</a> ...
> 
> But, in the long term, after browsers with native video support are more 
> widely used, we'll likely start seeing people leaving the video element 
> empty or including useless messages, and accessibility loses.  Whereas 
> by encouraging people to use visible alternative content, everybody 
> wins.

Indeed.


On Tue, 2 Sep 2008, Leif Halvard Silli wrote:
> 
> This has me asking: Why couldn't the poster image just be a regular 
> <img> element, rather than an attribute?

It's not clear to me how it would work as an <img> element. An attribute 
is simpler.


> So, take that poster image again. If instead of building the poster 
> image into the <video> element, one could design the <video> element so 
> that one may use the <img> element in order to display a poster image, 
> then there would be no problem with fallback for the poster image, as 
> the <img> would provide that fallback.

Given that most poster frames are autogenerated, and thus don't have 
useful alternative text, it's not clear to me what this gives us.


On Tue, 2 Sep 2008, Matthew Raymond wrote:
> 
> Perhaps we can provide some attribute values for |rel| along these lines:
> 
>  * "transcript"
>  * "slideshow"
>  * "download"
> 
> We could then associate such content using the <figure> element:
> 
> | <figure id="figVideo1">
> |   <video [...]> [...] </video>
> |   <legend>
> |     <a href="..." rel="transcript">
> |       Click here for a transcript.
> |     </a><br>
> |     <a href="..." rel="download">Download Quicktime version.</a><br>
> |     <a href="..." rel="download">Download Ogg Theora version.</a>
> |   </legend>
> | </figure>
> 
> UAs could then be given the latitude to present this however they wish, 
> while legacy UAs would just provide the fallback for the video plus a 
> series of links. Plus, if UAs choose not to provide any special handling 
> of the content, at the very least it's still accessible to everyone.

I encourage people to register these rel="" values in the wiki and to try 
them. I am very interested in what experience with this teaches us. If it 
turns out to be a good idea, it's definitely something we could add to the 
spec.


On Tue, 2 Sep 2008, Smylers wrote:
> 
> If people start wanting to use videos for logos, decoration, mere 
> illustration, text replacement, as icons, or whatever then they would 
> need <img>-like alt text -- and we'd have the same thing as with images, 
> where a single image could serve different purposes (and as such require 
> different alt text) on different pages.
> 
> But there doesn't seem to be a desire for such use of videos -- they all 
> seem to be in the category of being 'important content' on the page -- 
> so, as Lachlan suggests, alternative representations could be embedded 
> in the video and still be appropriate.

Agreed.

(We can use animated GIFs and Flash as a proxy for the demand here. I 
don't think we see them used that much for those roles.)


> Each video's title, or other information which helps pick between them, 
> obviously _could_ be included in the HTML next to the video.  But this 
> may be of no benefit to sighted users.  Consider a page with videos of 
> speeches, with the poster frames containing head-shots of different 
> presidential candidates, each with a visible caption of their name and 
> party: putting this information additionally on the page with HTML 
> would be repeating it visually; that would be unnecessary for sighted 
> viewers, and I'm not sure it's reasonable to insist that authors should 
> include this duplication for accessibility reasons.
> 
> I'd've thought it better that there's some way in which non-image 
> alternative to the poster frame could be made available for speaking 
> browsers.

This is true; in general, however, the title could be obtained from the 
header of the video files, which would be more likely to be provided than 
metadata in the page. I agree that this would become unwieldy in the face 
of many videos, but I'm not sure an attribute would be better at that 
point. I would encourage authors to use <figure> and <legend>, potentially 
hiding the <legend> element from visual users.


On Wed, 3 Sep 2008, Smylers wrote:
> 
> Ah.  I'd presumed that <video> is intended to cope with all desires for 
> embedding videos in webpages; are there some situations in which <video> 
> is inappropriate and a different HTML 5 element should be used?

I don't believe so. <video> is intended for all current uses of video on 
the Web.


On Wed, 3 Sep 2008, Dave Singer wrote:
> 
> We've actually been thinking about the framework for accessibility of 
> media elements in HTML5.  Note that this is rather different from 
> discussing (say) caption formats or the like.  I've attached a 'thought 
> piece' on the subject, which attempts to lay out some of the needs as we 
> see them, and also proposes a way ahead.
>
> http://lists.w3.org/Archives/Public/public-html/2008Sep/att-0118/html5-media-accedssibility.html

In general I agree with this document, though, for the reasons described 
above, I do not agree with the conclusions regarding providing alt, 
longdesc, or other fallback inside the HTML file itself.

How has work regarding media queries on this topic progressed?

Others argued that media queries weren't a suitable technology; should we 
instead provide this information in an attribute or two? (It would be easy 
to do so, provided we could decide on a set of axes. I am not qualified to 
know what the right axes are, but given a set, and guidance on likely 
extension directions, I would be happy to provide syntax for this.)


On Wed, 10 Sep 2008, Lachlan Hunt wrote:
> 
> http://wiki.whatwg.org/wiki/Video_accessibility#Selection_Mechanisms

This is very useful, thanks.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 14 October 2008 00:56:36 UTC