Re: [media] alt technologies for paused video (and using ARIA)

Hi Steve,

I'm curious: Are you saying that the title attribute is not a useful
attribute at all because it doesn't work on all devices?

Cheers,
Silvia.


On Sat, May 14, 2011 at 2:29 AM, Steve Faulkner
<faulkner.steve@gmail.com> wrote:
> Hi all,
>
> Use of the title attribute is not appropriate for all users as it's not input device independent content.
>
> The video element is the player; it is not the video itself.
>
> Labeling the video element does not equate to providing a text alternative for the content, whether it's the static poster or the video being played.
>
> Regards
> Stevef
>
> Sent from my iPhone
>
> On 13 May 2011, at 01:16, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>
>> On Fri, May 13, 2011 at 3:31 AM, Jared Smith <jared@webaim.org> wrote:
>>> On Thu, May 12, 2011 at 12:35 AM, Silvia Pfeiffer
>>> <silviapfeiffer1@gmail.com> wrote:
>>>
>>>> Because if you have a black image sitting there the text alternative
>>>> surely shouldn't say that it's an Apollo launch. That would be the
>>>> wrong description of that image.
>>>
>>> Exactly. We're not concerned about letting the user know that there's
>>> a black image there. We're concerned about letting them know that it's
>>> a video about the Apollo launch... which is what sighted users can
>>> easily surmise from the visual presentation and context of that video.
>>
>> I don't follow. Let's assume a Web page that has only a video element
>> on it. Nothing else. That video element is black because nobody has
>> bothered to put a different image on it. Thus, the sighted user
>> doesn't know what's behind it. Why should the blind user be told what
>> the video will play? That's not a "text alternative" to the visual
>> presentation. That is additional information. For a black frame, the
>> text equivalent should be "black frame", otherwise a sighted user
>> looking at that video and a blind user looking at that video receive a
>> different impression and are not able to talk to each other about it.
>>
>> Once the video is playing, the sighted user can indeed surmise the
>> content from the visual presentation, and the blind user from the
>> audio description or transcription. To give the text equivalent to the
>> content of the timed visual presentation as text when the placeholder
>> frame is shown would not be providing equivalence to the visuals.
>>
>>
>>
>>>>>> Certainly a short alternative presenting the
>>>>>> content of what the video is would be useful for accessibility for
>>>>>> screen reader users (sighted users can, after all, use the entire
>>>>>> visual context to more likely determine the video's content).
>>>>
>>>> What is the entire visual context? If there is text given underneath,
>>>> it is also accessible to the blind user, in particular if it is
>>>> referenced through aria-describedby.
>>>> We cannot assume anything about
>>>> the rest of the page when we describe the paused video.
>>>
>>> Here's the ambiguity again. Are we describing "the paused video" (the
>>> thing that the user might play) or are we describing the poster frame
>>> (the static image that might, but often doesn't provide additional
>>> information about the thing the user might play)? Or both? In this
>>> case (a black frame), there is no poster frame content to describe,
>>> but the paused video is still there and I believe it needs a short
>>> description.
>>
>> I've tried to be clear: it is the placeholder frame that we are
>> describing, independent of whether it comes from the video or from a
>> different resource.
>>
>>
>>> In one sentence you indicate that we should consider the context, but
>>> the next sentence suggests we should ignore it.
>>
>> Sorry if I was unclear, but they are two different use cases. The
>> first one where I am suggesting to consider the context is where the
>> context is available and can be pointed to through aria-describedby.
>> If no such context is available, we still need a text replacement for
>> the visual presentation which in case of a paused video can be
>> aria-label and in case of a playing video would be audio descriptions
>> and captions.
>>
>>> I believe context is
>>> vital in determining the alternative for any non-text element. In this
>>> case, if the visual context clearly presents to sighted users that the
>>> video is about the Apollo launch, I would think it important to also
>>> present this to a screen reader user.
>>
>> Yes, agreed. For this case, aria-describedby should be sufficient, no?
>>
>>
>>>> What if only
>>>> the black frame is sitting there and nothing else? Would you still
>>>> describe that as "Apollo launch"?
>>>
>>> Yes, if it is presented visually in the surrounding context that the
>>> video is about the Apollo launch.
>>
>> Let's assume no surrounding context (since that can be linked with
>> aria-describedby). Would it be right to tell the blind user he/she is
>> expecting an Apollo launch video without having to look into the
>> video, while a sighted user has to click play and find out? Let's
>> assume this is related to some game where you are shown three black
>> frame videos and have to pick one to find, e.g. the Apollo launch
>> while the other two are about cats and dogs (or something else) and
>> you win when you pick the right one. Exposing the content of the video
>> on the placeholder frame would be wrong, because it excludes people
>> with screen readers from the game (since they would have an unfair
>> advantage). Not describing the black frames would be wrong, too,
>> because the blind user cannot discover that there are three videos
>> with black frames on the page.
>>
>> I think that if we want to provide a short summary of the video's
>> content during the time that the video is paused and no other text is
>> on the page, we'd have to find a solution both for sighted and blind
>> users (and probably for low-bandwidth users, too). For this situation,
>> @title would probably be the right approach?
>>
>>
>>> This becomes even more significant for users that are navigating by
>>> interactive elements. They would likely skip all descriptive text
>>> content and jump directly to the video - which you would have present
>>> no descriptive content until it is played. If there were multiple
>>> videos, a screen reader might read "video, video, video". This would
>>> be somewhat akin to "click here" which makes sense in its visual
>>> context but requires screen reader users to explore the context to
>>> determine what it is. And as noted before, this would bypass the WCAG
>>> SC 1.1.1 requirements for descriptive identification of time-based
>>> media. A short alternative to the video removes all these issues.
>>
>> IIUC that's exactly what @aria-label was created for.
>>
>> Let me try to come up with a mental model that describes the way I
>> look at the video element better.
>>
>> The video element is an interactive element. Therefore, when you jump
>> to the video, you need to be informed of your options. In essence,
>> that's "hit space to play/pause toggle". But a sighted person doesn't
>> just see the play/pause button - they also see the placeholder frame.
>> Basically, you can look at it as a very big button that includes the
>> placeholder frame. It's the image content of that frame plus the play
>> sign that makes the sighted user hit "play". Similar to how an
>> @aria-label describes what a button or form entry field exposes to the
>> sighted users, @aria-label should here also expose to the blind user
>> that this play button comes with an image. Given this view, I suggest
>> it makes sense to include the placeholder frame's description in
>> @aria-label.
>>
>>
>>>>>> Now consider that the poster frame (whether author defined, random, or
>>>>>> first frame) is an image of the moon, though the video is primarily
>>>>>> about the Apollo 11 launch. A short alternative of "The moon" (or
>>>>>> similar) would be an appropriate alternative for the poster frame, but
>>>>>> would provide little utility (and, in this case, false information)
>>>>>> about what the content of the video actually is.
>>>>
>>>> No, it wouldn't. The sighted user doesn't get more information either.
>>>
>>> Sure they do. They can see the entire context of the video to
>>> determine what the video is about. Now a screen reader user could read
>>> before or after (which one is a crap shoot), but with much more
>>> effort. Would we ever omit @alt on an image on a page about the Apollo
>>> mission based on the assumption that the screen reader user can figure
>>> out what it is based on its context? Of course not! Then why would we
>>> omit it for a video in the same place?
>>
>> We would use @aria-describedby to point to the text on the screen, IIUC.
>>
>>
>>>> You have to always assume there is nothing else on the page when you
>>>> define text alternatives for an element.
>>>
>>> I strongly disagree (http://webaim.org/techniques/alttext/#context).
>>> The same non-text element may have very different alternatives
>>> depending on its context. This is likely the crux of this issue.
>>>
>>> There's more to a video than what is presented visually when it's not
>>> playing - just like there's often more to alternative text than what
>>> the image looks like.
>>
>> I don't dispute this. I agree that the exact text of what you are
>> putting into a text alternative is indeed very much defined by what
>> its purpose is. I'm not trying to argue about what the text should be,
>> rather about which visual representation the text should describe.
>> Should it describe something that is actually visible at the time that
>> the page is rendered or should it provide a summary of something that
>> cannot be seen yet?
>>
>>
>>>>>> This then seems to call for up to 5 (yikes!) types of alternative:
>>>>>> 1. Short alternative for the <video>
>>>>
>>>> That's not necessary, because we have @transcription, track and other
>>>> page text for this (always assuming you mean the playing video here).
>>>
>>> @transcription and track are certainly alternatives to the playing
>>> video. But these wouldn't be available until the video is activated.
>>> Again, if page context describes what the video is going to play to
>>> sighted users, this information should also be presented to screen
>>> reader users in a short alternative.
>>
>> When there is text available, @aria-describedby would be used. Is that
>> not sufficient?
>>
>>
>>>>>> 2. Long alternative for the <video> (if necessary)
>>>>
>>>> That's what @transcription (off-page) and @aria-describedby (on-page) provide.
>>>
>>> Yep.
>>>
>>> Of note is that aria-describedby does not currently (and probably
>>> won't ever) support structured content or interactive elements. As
>>> Steve kindly informed me, it's mapped to the accDescription property
>>> which is a text string. This introduces some limitations for when long
>>> description needs to provide structured content, which lends itself to
>>> @transcription or @longdesc or... something.
>>
>> Would @aria-describedby support language markup, I wonder?
>>
>>
>>>>>> 3. Short alternative for the poster image (if necessary, when not
>>>>>> identical to #1)
>>>>
>>>> Yes, that's what my use case number one is and what I suggested @aria-label for.
>>>
>>> Except that it's currently ambiguous as to what should be described -
>>> the video or the poster frame.
>>
>> Sorry if that seemed confused. It didn't seem confused to me - I
>> always thought of it just describing the placeholder image. Maybe the
>> model of the placeholder image being part of the play button makes
>> that approach clearer.
>>
>> For description of the video, I'd suggest @title, which is then both
>> usable by sighted and non-sighted users.
>>
>>
>>>>>> 4. Long alternative for the poster image (if necessary, though I think
>>>>>> this would be somewhat rare)
>>>>
>>>> That could easily be part of the long alternative for the <video>.
>>>
>>> But you said previously that we only need short or long alternatives
>>> for the poster frame, not for the <video>. See why I'm confused? :-)
>>
>> OK, fair enough. :-)
>>
>> I follow all your 4 use cases/needs and I agree with them. I am trying
>> to figure out if we need any new attributes to satisfy them and am
>> also trying to figure out whether the existing attributes are
>> sufficient and also mean the right thing. While I am fully aware that
>> there is some relationship to the <img> case, not everything will be
>> identical. For example, we have the opportunity of @aria-label,
>> because video is an interactive element, which img isn't. I am very
>> keen to get this right without making it complicated for authors. And I
>> am also keen to get to a stage where we can give clear instructions to
>> authors on how to mark up things, which is rather hidden in the
>> existing spec.
>>
>>
>>> This really is a great discussion. I agree that the issue is primarily
>>> about terminology and explaining what should be described and how.
>>>
>>> Jared Smith
>>> WebAIM.org
>>>
>>
>> Thanks for your patience in continuing this discussion!
>>
>> Cheers,
>> Silvia.
>>
>

Received on Saturday, 14 May 2011 04:01:33 UTC