Re: [media] alt technologies for paused video (and using ARIA) from Steve Faulkner on 2011-05-13 (public-html-a11y@w3.org from May 2011)

From: Steve Faulkner <faulkner.steve@gmail.com>
Date: Fri, 13 May 2011 17:29:47 +0100
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Jared Smith <jared@webaim.org>, John Foliot <jfoliot@stanford.edu>, Everett Zufelt <everett@zufelt.ca>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-Id: <37C75DEF-E4D4-443B-A4D4-93BAA65CF3BB@gmail.com>
Hi all,

Use of the title attribute is not appropriate for all users, as it does not provide input-device-independent content.

The video element is the player; it is not the video itself.

Labeling the video element does not equate to providing a text alternative for the content, whether that content is the static poster or the video being played.

Regards
Stevef

Sent from my iPhone

On 13 May 2011, at 01:16, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> On Fri, May 13, 2011 at 3:31 AM, Jared Smith <jared@webaim.org> wrote:
>> On Thu, May 12, 2011 at 12:35 AM, Silvia Pfeiffer
>> <silviapfeiffer1@gmail.com> wrote:
>> 
>>> Because if you have a black image sitting there, the text alternative
>>> surely shouldn't say that it's an Apollo launch. That would be the
>>> wrong description of that image.
>> 
>> Exactly. We're not concerned about letting the user know that there's
>> a black image there. We're concerned about letting them know that it's
>> a video about the Apollo launch... which is what sighted users can
>> easily surmise from the visual presentation and context of that video.
> 
> I don't follow. Let's assume a Web page that has only a video element
> on it. Nothing else. That video element is black because nobody has
> bothered to put a different image on it. Thus, the sighted user
> doesn't know what's behind it. Why should the blind user be told what
> the video will play? That's not a "text alternative" to the visual
> presentation. That is additional information. For a black frame, the
> text equivalent should be "black frame"; otherwise, a sighted user
> looking at that video and a blind user looking at that video receive a
> different impression and are not able to talk to each other about it.
> 
> Once the video is playing, the sighted user can indeed surmise the
> content from the visual presentation, and the blind user from the
> audio description or transcription. Giving a text equivalent of the
> content of the timed visual presentation while the placeholder frame
> is shown would not provide equivalence to the visuals.
> 
> 
> 
>>>>> Certainly a short alternative presenting the
>>>>> content of what the video is would be useful for accessibility for
>>>>> screen reader users (sighted users can, after all, use the entire
>>>>> visual context to more likely determine the video's content).
>>> 
>>> What is the entire visual context? If there is text given underneath,
>>> it is also accessible to the blind user, in particular if it is
>>> referenced through aria-describedby.
>>> We cannot assume anything about
>>> the rest of the page when we describe the paused video.
>> 
>> Here's the ambiguity again. Are we describing "the paused video" (the
>> thing that the user might play) or are we describing the poster frame
>> (the static image that might, but often doesn't, provide additional
>> information about the thing the user might play)? Or both? In this
>> case (a black frame), there is no poster frame content to describe,
>> but the paused video is still there and I believe it needs a short
>> description.
> 
> I've tried to be clear: it is the placeholder frame that we are
> describing, independent of whether it comes from the video or from a
> different resource.
> 
> 
>> In one sentence you indicate that we should consider the context, but
>> the next sentence suggests we should ignore it.
> 
> Sorry if I was unclear, but these are two different use cases. In the
> first, where I suggest considering the context, the context is
> available and can be pointed to through aria-describedby. If no such
> context is available, we still need a text replacement for the visual
> presentation, which, for a paused video, can be aria-label and, for a
> playing video, would be audio descriptions and captions.
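>
> A rough markup sketch of the two cases (file names, ids and label
> wording are hypothetical, for illustration only):
>
> ```html
> <!-- Hypothetical file names and ids. -->
> <!-- Case 1: on-page context exists, so point to it. -->
> <p id="apollo-context">Footage of the Apollo 11 launch.</p>
> <!-- aria-label describes the placeholder frame shown while paused;
>      the tracks serve the playing video. -->
> <video src="apollo11.webm" poster="moon.jpg" controls
>        aria-label="Video, poster frame: the moon"
>        aria-describedby="apollo-context">
>   <track kind="captions" src="apollo11-captions.vtt"
>          srclang="en" label="English captions">
>   <track kind="descriptions" src="apollo11-descriptions.vtt"
>          srclang="en" label="English descriptions">
> </video>
> ```
>
> Without the paragraph (case 2), the aria-describedby attribute would
> simply be dropped, and aria-label would remain the only text
> replacement for the paused state.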
> 
>> I believe context is
>> vital in determining the alternative for any non-text element. In this
>> case, if the visual context clearly presents to sighted users that the
>> video is about the Apollo launch, I would think it important to also
>> present this to a screen reader user.
> 
> Yes, agreed. For this case, aria-describedby should be sufficient, no?
> 
> 
>>> What if only
>>> the black frame is sitting there and nothing else? Would you still
>>> describe that as "Apollo launch"?
>> 
>> Yes, if it is presented visually in the surrounding context that the
>> video is about the Apollo launch.
> 
> Let's assume no surrounding context (since that can be linked with
> aria-describedby). Would it be right to tell the blind user to expect
> an Apollo launch video without having to look into the video, while a
> sighted user has to click play to find out? Let's assume this is
> related to some game where you are shown three black-frame videos and
> have to pick one to find, e.g., the Apollo launch, while the other two
> are about cats and dogs (or something else), and you win when you pick
> the right one. Exposing the content of the video on the placeholder
> frame would be wrong, because it breaks the game for people with
> screen readers (they would have an unfair advantage). Not describing
> the black frames would be wrong, too, because the blind user could not
> discover that there are three videos with black frames on the page.
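>
> In markup, the game might look like this (hypothetical file names and
> label wording); each label describes only what a sighted user sees:
>
> ```html
> <!-- Hypothetical clips: one is the Apollo launch, two are not. -->
> <video src="clip-a.webm" controls aria-label="Video 1 of 3: black frame"></video>
> <video src="clip-b.webm" controls aria-label="Video 2 of 3: black frame"></video>
> <video src="clip-c.webm" controls aria-label="Video 3 of 3: black frame"></video>
> ```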
> 
> I think that if we want to provide a short summary of the video's
> content during the time that the video is paused and no other text is
> on the page, we'd have to find a solution both for sighted and blind
> users (and probably for low-bandwidth users, too). For this situation,
> @title would probably be the right approach?
> 
> 
>> This becomes even more significant for users who navigate by
>> interactive elements. They would likely skip all descriptive text
>> content and jump directly to the video, which, under your proposal,
>> would present no descriptive content until it is played. If there
>> were multiple videos, a screen reader might read "video, video,
>> video". This would
>> be somewhat akin to "click here" which makes sense in its visual
>> context but requires screen reader users to explore the context to
>> determine what it is. And as noted before, this would bypass the WCAG
>> SC 1.1.1 requirements for descriptive identification of time-based
>> media. A short alternative to the video removes all these issues.
> 
> IIUC that's exactly what @aria-label was created for.
> 
> Let me try to describe more clearly the mental model I use for the
> video element.
> 
> The video element is an interactive element. Therefore, when you jump
> to the video, you need to be informed of your options. In essence,
> that's "hit space to toggle play/pause". But a sighted person doesn't
> just see the play/pause button - they also see the placeholder frame.
> Basically, you can look at it as a very big button that includes the
> placeholder frame. It's the image content of that frame plus the play
> sign that makes the sighted user hit "play". Just as @aria-label
> describes to blind users what a button or form field shows to sighted
> users, @aria-label should here also tell the blind user that this
> play button comes with an image. Given this view, I think it makes
> sense to include the placeholder frame's description in @aria-label.
> 
> 
>>>>> Now consider that the poster frame (whether author defined, random, or
>>>>> first frame) is an image of the moon, though the video is primarily
>>>>> about the Apollo 11 launch. A short alternative of "The moon" (or
>>>>> similar) would be an appropriate alternative for the poster frame, but
>>>>> would provide little utility (and, in this case, false information)
>>>>> about what the content of the video actually is.
>>> 
>>> No, it wouldn't. The sighted user doesn't get more information either.
>> 
>> Sure they do. They can see the entire context of the video to
>> determine what the video is about. Now a screen reader user could read
>> the text before or after the video (which one is a crap shoot), but
>> only with much more effort. Would we ever omit @alt on an image on a
>> page about the Apollo mission based on the assumption that the screen
>> reader user can figure out what it is from its context? Of course not!
>> Then why would we omit it for a video in the same place?
> 
> We would use @aria-describedby to point to the text on the screen, IIUC.
> 
> 
>>> You have to always assume there is nothing else on the page when you
>>> define text alternatives for an element.
>> 
>> I strongly disagree (http://webaim.org/techniques/alttext/#context).
>> The same non-text element may have very different alternatives
>> depending on its context. This is likely the crux of this issue.
>> 
>> There's more to a video than what is presented visually when it's not
>> playing - just like there's often more to alternative text than what
>> the image looks like.
> 
> I don't dispute this. I agree that the exact text of what you are
> putting into a text alternative is indeed very much defined by what
> its purpose is. I'm not trying to argue about what the text should be,
> rather about which visual representation the text should describe.
> Should it describe something that is actually visible at the time that
> the page is rendered or should it provide a summary of something that
> cannot be seen yet?
> 
> 
>>>>> This then seems to call for up to 5 (yikes!) types of alternative:
>>>>> 1. Short alternative for the <video>
>>> 
>>> That's not necessary, because we have @transcription, track and other
>>> page text for this (always assuming you mean the playing video here).
>> 
>> @transcription and track are certainly alternatives to the playing
>> video. But these wouldn't be available until the video is activated.
>> Again, if page context describes what the video is going to play to
>> sighted users, this information should also be presented to screen
>> reader users in a short alternative.
> 
> When there is text available, @aria-describedby would be used. Is that
> not sufficient?
> 
> 
>>>>> 2. Long alternative for the <video> (if necessary)
>>> 
>>> That's what @transcription (off-page) and @aria-describedby (on-page) provide.
>> 
>> Yep.
>> 
>> Of note is that aria-describedby does not currently (and probably
>> won't ever) support structured content or interactive elements. As
>> Steve kindly informed me, it's mapped to the accDescription property,
>> which is a text string. This limits cases where a long description
>> needs to provide structured content, a need better served by
>> @transcription or @longdesc or... something.
> 
> Would @aria-describedby support language markup, I wonder?
> 
> 
>>>>> 3. Short alternative for the poster image (if necessary, when not
>>>>> identical to #1)
>>> 
>>> Yes, that's what my use case number one is and what I suggested @aria-label for.
>> 
>> Except that it's currently ambiguous as to what should be described -
>> the video or the poster frame.
> 
> Sorry if that seemed confused. It didn't seem confused to me - I
> always thought of it just describing the placeholder image. Maybe the
> model of the placeholder image being part of the play button makes
> that approach clearer.
> 
> For description of the video, I'd suggest @title, which is then both
> usable by sighted and non-sighted users.
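>
> Roughly, with hypothetical values: @title carries the summary of the
> video (shown as a tooltip on mouse hover), while @aria-label keeps
> describing the placeholder image:
>
> ```html
> <!-- Hypothetical values for illustration. -->
> <video src="apollo11.webm" poster="moon.jpg" controls
>        title="The Apollo 11 launch"
>        aria-label="Video, poster frame: the moon"></video>
> ```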
> 
> 
>>>>> 4. Long alternative for the poster image (if necessary, though I think
>>>>> this would be somewhat rare)
>>> 
>>> That could easily be part of the long alternative for the <video>.
>> 
>> But you said previously that we only need short or long alternatives
>> for the poster frame, not for the <video>. See why I'm confused? :-)
> 
> OK, fair enough. :-)
> 
> I follow all four of your use cases/needs and agree with them. I am trying
> to figure out if we need any new attributes to satisfy them and am
> also trying to figure out whether the existing attributes are
> sufficient and also mean the right thing. While I am fully aware that
> there is some relationship to the <img> case, not everything will be
> identical. For example, we have the opportunity of @aria-label,
> because video is an interactive element, which img isn't. I am very
> keen to get this right without making it complicated for authors. I
> am also keen to get to a stage where we can give authors clear
> instructions on how to mark things up; such guidance is rather hidden
> in the existing spec.
> 
> 
>> This really is a great discussion. I agree that the issue is primarily
>> about terminology and explaining what should be described and how.
>> 
>> Jared Smith
>> WebAIM.org
>> 
> 
> Thanks for your patience to continue this discussion!
> 
> Cheers,
> Silvia.
>
Received on Friday, 13 May 2011 16:35:53 UTC