Re: FW: [media] alt technologies for paused video (and using ARIA) from Silvia Pfeiffer on 2011-05-13 (public-html-a11y@w3.org from May 2011)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 13 May 2011 10:16:22 +1000
To: Jared Smith <jared@webaim.org>
Cc: John Foliot <jfoliot@stanford.edu>, Everett Zufelt <everett@zufelt.ca>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <BANLkTinteMRKBDh7Yk+oKWnnXp5Z8v=Puw@mail.gmail.com>
On Fri, May 13, 2011 at 3:31 AM, Jared Smith <jared@webaim.org> wrote:
> On Thu, May 12, 2011 at 12:35 AM, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>
>> Because if you have a black image sitting there the text alternative
>> surely shouldn't say that it's an Apollo launch. That would be the
>> wrong description of that image.
>
> Exactly. We're not concerned about letting the user know that there's
> a black image there. We're concerned about letting them know that it's
> a video about the Apollo launch... which is what sighted users can
> easily surmise from the visual presentation and context of that video.

I don't follow. Let's assume a Web page that has only a video element
on it. Nothing else. That video element is black because nobody has
bothered to put a different image on it. Thus, the sighted user
doesn't know what's behind it. Why should the blind user be told what
the video will play? That's not a "text alternative" to the visual
presentation. That is additional information. For a black frame, the
text equivalent should be "black frame", otherwise a sighted user
looking at that video and a blind user looking at that video receive a
different impression and are not able to talk to each other about it.

Once the video is playing, the sighted user can indeed surmise the
content from the visual presentation, and the blind user from the
audio description or transcription. Providing a text equivalent of the
content of the timed visual presentation while only the placeholder
frame is shown would not provide equivalence to what is visible.



>>>> Certainly a short alternative presenting the
>>>> content of what the video is would be useful for accessibility for
>>>> screen reader users (sighted users can, after all, use the entire
>>>> visual context to more likely determine the video's content).
>>
>> What is the entire visual context? If there is text given underneath,
>> it is also accessible to the blind user, in particular if it is
>> referenced through aria-describedby.
>> We cannot assume anything about
>> the rest of the page when we describe the paused video.
>
> Here's the ambiguity again. Are we describing "the paused video" (the
> thing that the user might play) or are we describing the poster frame
> (the static image that might, but often doesn't provide additional
> information about the thing the user might play)? Or both? In this
> case (a black frame), there is no poster frame content to describe,
> but the paused video is still there and I believe it needs a short
> description.

I've tried to be clear: it is the placeholder frame that we are
describing, independent of whether it comes from the video or from a
different resource.


> In one sentence you indicate that we should consider the context, but
> the next sentence suggests we should ignore it.

Sorry if I was unclear; these are two different use cases. The
first, where I suggest considering the context, is where the
context is available and can be pointed to through aria-describedby.
If no such context is available, we still need a text replacement for
the visual presentation, which in the case of a paused video can be
aria-label and in the case of a playing video would be audio
descriptions and captions.
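The two cases above might be marked up roughly as follows. This is a minimal sketch, not spec text; the file names and the `apollo-desc` id are hypothetical. When descriptive text is visible on the page, `aria-describedby` points at it; when there is none, `aria-label` carries a short text alternative for the placeholder frame.

```html
<!-- Case 1: descriptive context exists on the page, so the video
     references it by id (file name and id are hypothetical). -->
<p id="apollo-desc">Footage of the Apollo 11 launch.</p>
<video src="apollo.webm" controls aria-describedby="apollo-desc"></video>

<!-- Case 2: no surrounding text. aria-label describes only what is
     visible while the video is paused: the placeholder (poster) frame. -->
<video src="apollo.webm" poster="apollo-poster.jpg" controls
       aria-label="Rocket on the launch pad at dawn"></video>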

> I believe context is
> vital in determining the alternative for any non-text element. In this
> case, if the visual context clearly presents to sighted users that the
> video is about the Apollo launch, I would think it important to also
> present this to a screen reader user.

Yes, agreed. For this case, aria-describedby should be sufficient, no?


>> What if only
>> the black frame is sitting there and nothing else? Would you still
>> describe that as "Apollo launch"?
>
> Yes, if it is presented visually in the surrounding context that the
> video is about the Apollo launch.

Let's assume no surrounding context (since that can be linked with
aria-describedby). Would it be right to tell the blind user to expect
an Apollo launch video before it plays, while a sighted user has to
click play to find out? Let's
assume this is related to some game where you are shown three black
frame videos and have to pick one to find, e.g. the Apollo launch
while the other two are about cats and dogs (or something else) and
you win when you pick the right one. Exposing the content of the video
on the placeholder frame would be wrong, because it excludes people
with screenreaders from the game (since they would have an unfair
advantage). Not describing the black frames would be wrong, too,
because the blind user cannot discover that there are three videos
with black frames on the page.
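Under the model argued here, the three black-frame videos in this hypothetical game might be labelled as follows. The file names are placeholders, and the "Video 1/2/3" numbering is an assumption added so that screen reader users can tell the videos apart, just as sighted players distinguish them by position. Each label describes only what a sighted player sees, so nobody learns which video shows the launch before pressing play.

```html
<!-- Each label describes the visible placeholder frame only, not the
     hidden content, so sighted and blind players start on equal terms. -->
<video src="video1.webm" controls aria-label="Video 1: black frame"></video>
<video src="video2.webm" controls aria-label="Video 2: black frame"></video>
<video src="video3.webm" controls aria-label="Video 3: black frame"></video>
```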

I think that if we want to provide a short summary of the video's
content during the time that the video is paused and no other text is
on the page, we'd have to find a solution both for sighted and blind
users (and probably for low-bandwidth users, too). For this situation,
@title would probably be the right approach?


> This becomes even more significant for users that are navigating by
> interactive elements. They would likely skip all descriptive text
> content and jump directly to the video - which you would have present
> no descriptive content until it is played. If there were multiple
> videos, a screen reader might read "video, video, video". This would
> be somewhat akin to "click here" which makes sense in its visual
> context but requires screen reader users to explore the context to
> determine what it is. And as noted before, this would bypass the WCAG
> SC 1.1.1 requirements for descriptive identification of time-based
> media. A short alternative to the video removes all these issues.

IIUC that's exactly what @aria-label was created for.

Let me try to come up with a mental model that describes the way I
look at the video element better.

The video element is an interactive element. Therefore, when you jump
to the video, you need to be informed of your options. In essence,
that's "hit space to toggle play/pause". But a sighted person doesn't
just see the play/pause button; they also see the placeholder frame.
Basically, you can look at it as a very big button that includes the
placeholder frame. It's the image content of that frame plus the play
sign that makes the sighted user hit "play". Just as @aria-label
conveys to the blind user what a button or form entry field shows
sighted users, here @aria-label should also convey that this play
button comes with an image. Given this view, I suggest including the
placeholder frame's description in @aria-label.
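The "very big button" model can be made concrete by comparing a hypothetical custom player, where the poster image literally sits inside a play button, with the native element. The markup and file names are illustrative assumptions. In the custom version the button's label names both the action and the image, and the inner image gets an empty `alt` so it is not announced twice; the native version expresses the same idea through `aria-label` on `<video>`.

```html
<!-- Hypothetical custom player: the poster is part of the play button,
     so one label covers both the action and the image. -->
<button type="button" aria-label="Play: rocket on the launch pad at dawn">
  <img src="apollo-poster.jpg" alt="">
</button>

<!-- Native equivalent under the same model: the browser exposes the
     play control, and aria-label describes the placeholder frame. -->
<video src="apollo.webm" poster="apollo-poster.jpg" controls
       aria-label="Rocket on the launch pad at dawn"></video>
```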


>>>> Now consider that the poster frame (whether author defined, random, or
>>>> first frame) is an image of the moon, though the video is primarily
>>>> about the Apollo 11 launch. A short alternative of "The moon" (or
>>>> similar) would be an appropriate alternative for the poster frame, but
>>>> would provide little utility (and, in this case, false information)
>>>> about what the content of the video actually is.
>>
>> No, it wouldn't. The sighted user doesn't get more information either.
>
> Sure they do. They can see the entire context of the video to
> determine what the video is about. Now a screen reader user could read
> before or after (which one is a crap shoot), but with much more
> effort. Would we ever omit @alt on an image on a page about the Apollo
> mission based on the assumption that the screen reader user can figure
> out what it is based on its context? Of course not! Then why would we
> omit it for a video in the same place?

We would use @aria-describedby to point to the text on the screen, IIUC.


>> You have to always assume there is nothing else on the page when you
>> define text alternatives for an element.
>
> I strongly disagree (http://webaim.org/techniques/alttext/#context).
> The same non-text element may have very different alternatives
> depending on its context. This is likely the crux of this issue.
>
> There's more to a video than what is presented visually when it's not
> playing - just like there's often more to alternative text than what
> the image looks like.

I don't dispute this. I agree that the exact text of what you are
putting into a text alternative is indeed very much defined by what
its purpose is. I'm not trying to argue about what the text should be,
rather about which visual representation the text should describe.
Should it describe something that is actually visible at the time that
the page is rendered or should it provide a summary of something that
cannot be seen yet?


>>>> This then seems to call for up to 5 (yikes!) types of alternative:
>>>> 1. Short alternative for the <video>
>>
>> That's not necessary, because we have @transcription, track and other
>> page text for this (always assuming you mean the playing video here).
>
> @transcription and track are certainly alternatives to the playing
> video. But these wouldn't be available until the video is activated.
> Again, if page context describes what the video is going to play to
> sighted users, this information should also be presented to screen
> reader users in a short alternative.

When there is text available, @aria-describedby would be used. Is that
not sufficient?


>>>> 2. Long alternative for the <video> (if necessary)
>>
>> That's what @transcription (off-page) and @aria-describedby (on-page) provide.
>
> Yep.
>
> Of note is that aria-describedby does not currently (and probably
> won't ever) support structured content or interactive elements. As
> Steve kindly informed me, it's mapped to the accdescription property
> which is a text string. This introduces some limitations for when long
> description needs to provide structured content, which lends itself to
> @transcription or @longdesc or... something.

Would @aria-describedby support language markup, I wonder?
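The limitation Jared describes can be illustrated with a sketch (ids, file names and content are hypothetical). The referenced element may contain a list and a link, but the accessible description computed from it is a single plain-text string, so neither the list structure nor the link's interactivity reaches the user through the description.

```html
<video src="apollo.webm" controls aria-describedby="apollo-long"></video>
<div id="apollo-long">
  <p>The video covers three stages:</p>
  <ul>
    <li>Countdown</li>
    <li>Liftoff</li>
    <li>Stage separation</li>
  </ul>
  <a href="apollo-transcript.html">Full transcript</a>
</div>
<!-- Assistive technology receives the description roughly as the flat
     string "The video covers three stages: Countdown Liftoff Stage
     separation Full transcript"; the list and the link are not exposed
     as structure or as something the user can activate. -->
```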


>>>> 3. Short alternative for the poster image (if necessary, when not
>>>> identical to #1)
>>
>> Yes, that's what my use case number one is and what I suggested @aria-label for.
>
> Except that it's currently ambiguous as to what should be described -
> the video or the poster frame.

Sorry if that seemed confusing. It didn't seem ambiguous to me: I
always thought of it as describing just the placeholder image. Maybe
the model of the placeholder image being part of the play button makes
that approach clearer.

For description of the video, I'd suggest @title, which is then both
usable by sighted and non-sighted users.
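A sketch of this suggestion, combining both attributes on one element (file names are hypothetical). `aria-label` describes what is visible now, the placeholder frame, while `title` carries a summary of the video's content. Browsers typically show `title` as a tooltip to sighted mouse users; under current accessible-name computation rules, when `aria-label` already supplies the name, `title` generally ends up exposed as the accessible description. Exact behaviour varies between browsers and assistive technologies.

```html
<!-- aria-label: the placeholder frame that is visible while paused.
     title: a summary of the video, available to sighted and blind users. -->
<video src="apollo.webm" poster="apollo-poster.jpg" controls
       aria-label="Rocket on the launch pad at dawn"
       title="The Apollo 11 launch"></video>
```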


>>>> 4. Long alternative for the poster image (if necessary, though I think
>>>> this would be somewhat rare)
>>
>> That could easily be part of the long alternative for the <video>.
>
> But you said previously that we only need short or long alternatives
> for the poster frame, not for the <video>. See why I'm confused? :-)

OK, fair enough. :-)

I follow all four of your use cases and agree with them. I am trying
to figure out whether we need any new attributes to satisfy them, and
whether the existing attributes are sufficient and mean the right
thing. While I am fully aware that there is some relationship to the
<img> case, not everything will be identical. For example, we can use
@aria-label because video is an interactive element, which img isn't.
I am very keen to get this right without making it complicated for
authors. I am also keen to reach a stage where we can give authors
clear instructions on how to mark things up; that guidance is rather
hidden in the existing spec.


> This really is a great discussion. I agree that the issue is primarily
> about terminology and explaining what should be described and how.
>
> Jared Smith
> WebAIM.org
>

Thanks for your patience in continuing this discussion!

Cheers,
Silvia.
Received on Friday, 13 May 2011 00:17:11 UTC