RE: [media] alt technologies for paused video (and using ARIA) from Léonie Watson on 2011-05-11 (public-html-a11y@w3.org from May 2011)

From: Léonie Watson <lwatson@nomensa.com>
Date: Wed, 11 May 2011 23:20:34 +0100
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, John Foliot <jfoliot@stanford.edu>
CC: HTML Accessibility Task Force <public-html-a11y@w3.org>, James Craig <jcraig@apple.com>, Michael Cooper <cooper@w3.org>
Message-ID: <D4219A0ECCAE794C9ED7DC6F5A4C0CD537B34C4DC9@jupiter.intranet.nomensa.com>
John Foliot wrote:
"
> When a screen reader (for example) announces aloud "A Clockwork Orange
> movie poster" it is labeling something completely different than the
> movie; it is inappropriate and confusing to suggest otherwise and
> contrary to what aria-label has been defined to express."

Silvia Pfeiffer replied:
"It's a label for the video element, which in the instance of non-autoplay is simply the content of the placeholder frame. So, it's completely correct."


Silvia,

        I think this approach looks at it from the wrong perspective. Having a label for the video element may be technically correct, but I don't think it reflects the user perspective all that well.

        When I arrive at a video (with my screen reader), I want to know what that static image/frame contains. At that moment in time, in the world according to me and my screen reader, that image exists entirely in its own right. It might be a still from the video, it might be a separate image. It might be related content, it might be a completely unrelated corporate ident (for example).


        Wanting to know what that image contains doesn't prevent me from wanting to know what the video contains. There may well be overlap, but equally they could be worlds apart.

        Technically speaking, the image may well be part of the video element. This is a technical specification of course, but it's also the tool we'll use to build user experiences, and I don't think this composite approach supports that goal.


Regards,
Léonie.

--
Nomensa - humanising technology

Léonie Watson, Director of Accessibility & Web Development

tel: +44 (0)117 929 7333
twitter: @we_are_Nomensa

-----Original Message-----
From: public-html-a11y-request@w3.org [mailto:public-html-a11y-request@w3.org] On Behalf Of Silvia Pfeiffer
Sent: 11 May 2011 14:50
To: John Foliot
Cc: HTML Accessibility Task Force; James Craig; Michael Cooper
Subject: Re: [media] alt technologies for paused video (and using ARIA)

Hi John,

On Wed, May 11, 2011 at 12:56 PM, John Foliot <jfoliot@stanford.edu> wrote:
> Silvia Pfeiffer wrote:
>>
>> Hi all,
>
> Hi Silvia, thanks for bringing this topic to the fore. I have copied
> the Chairs of the ARIA WG on this response for their info and possible
> input concerning ARIA usage.
>
>
>>
>> Over the last weeks I've been putting together ideas about what
>> requirements we have for alt technologies on videos that are either
>> paused by default or not displayed because of text-only displays.
>>
>>
>> My current state of mind is that we need to solve three use cases:
>>
>> 1. a brief description that will give the casual "tab"-passer-by an
>> impression as to what the video is about to help them make a
>> play/noplay decision
>>
>> 2. longer descriptions that give a bit more detail and describe, e.g.
>> the poster and give a summary of the content; this is often text
>> already available elsewhere on the page
>>
>> 3. a possibility to link a full transcription of the video to the
>> video and provide it in the context menu
>
> One potential use case not captured here is the case where we have the
> 'better' structural navigation we've talked about (but not yet
> spec'd), such as 'chapters' and/or sub-chapters that users could skip
> to - each of those 'chapter points' could/would have a default 'still'
> that we should address as well. We discussed this very briefly at the
> face-to-face in March.


That is actually time-aligned information and already solved with the type "chapters" in the TextTrack API. Naomi from Google is actually giving a demo about it at Google I/O this week. It is not the target of this discussion, so any feature changes/additions to "chapters" should be discussed in a different thread. I don't want to sidetrack this discussion here. There is already enough to discuss here.
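
For readers unfamiliar with the "chapters" track kind mentioned above, a minimal sketch follows. It assumes a hypothetical chapters file named media/chapters.vtt, which is not part of the wiki examples, and is meant only to illustrate that chapter markers are time-aligned data attached through <track> rather than alt text on the video itself.

```html
<!-- Sketch only: media/chapters.vtt is a hypothetical file name. -->
<video src="media/ClockworkOrangetrailer.webm" controls>
  <!-- kind="chapters" marks this track as time-aligned navigation points -->
  <track kind="chapters" src="media/chapters.vtt" srclang="en" label="Chapters">
</video>
```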


>> I've concretely suggested to introduce the following attributes on
>> <video>:
>>
>> 1. To satisfy use case 1: @aria-label
> (http://www.w3.org/WAI/PF/HTML/wiki/Media_Alt_Technologies#Example_1:_A_Clockwork_Orange)
>
>  <video poster="media/ClockworkOrangetrailer.jpg" controls
>         aria-label="A Clockwork Orange movie poster">
>    <source src="media/ClockworkOrangetrailer.mp4">
>    <source src="media/ClockworkOrangetrailer.webm">
>    <source src="media/ClockworkOrangetrailer.ogv">
>  </video>
>
> RESPONSE:
> This is a mistaken use of aria-label: this <video> (object) is not a
> poster, it is the entire media offering - a multi-media resource that
> deaf users, blind users, and deaf/blind users will consume differently
> based upon the additional resources that the author provides.

I arrived at this text through discussion with several people, including several blind developers of screen readers, so I don't think this is a mistaken use of aria-label. In fact, my examples originally had longer text in @aria-label, and I was told to make them shorter because blind users don't want to wait for the whole label to be read out before being told additional information about the element.

Note how my proposal clearly states that this text is only relevant when the video is not on autoplay. This is because in this situation the video is represented by the placeholder image, which in this case is the Clockwork Orange movie poster. When the video plays, the alt text is not relevant because we have an audio track playing and audio descriptions. So, when autoplay is turned off and a screenreader moves onto this element, the screenreader needs to share with the blind user exactly what the sighted users are seeing, which is the placeholder image.

Also, note how I am deliberately not talking about a poster frame, because we are not providing accessibility information to the user about the markup, but about the rendering. Since there is no difference to the sighted user when looking at the video element whether the frame is extracted from the video file or from a separate resource, this is not something that a blind user needs to be told about either.

I have indeed forgotten to add a third example (thanks for reminding me) where I explain what is being marked up when there is no @poster attribute in the video element. I knew I'd forget one of your use cases. :-) I've now added a third example that shows what is marked up with and without the @poster attribute, so you can see the difference.


> The ARIA specification defines aria-label this way:
>        aria-label
>        "Defines a string value that labels the __current element__.
> (Emphasis mine - JF) See related aria-labelledby.

Yes, and that's exactly what it is doing: a label that is being read out for the video element.


> The purpose of aria-label is the same as that of aria-labelledby. It
> provides the user with a recognizable name of the object. The most
> common accessibility API mapping for a label is the accessible name property."
> http://www.w3.org/TR/wai-aria/states_and_properties#aria-label
>
> In your code example, the element is <video> and the @poster is an
> *attribute* of the <video> element (object). I have tried numerous
> times to explain this to the sub-team, with apparently no success:
> attributes cannot take on additional attributes, this is simply how
> the mechanics of HTML works.

It is not an attribute on an attribute. We are not describing the image, but the placeholder frame for the video, which in this instance *is* the video. See example 3 on
http://www.w3.org/WAI/PF/HTML/wiki/Media_Alt_Technologies#Example_3:_video_without_placeholder_image
for the difference.

> When a screen reader (for example) announces aloud "A Clockwork Orange
> movie poster" it is labeling something completely different than the
> movie; it is inappropriate and confusing to suggest otherwise and
> contrary to what aria-label has been defined to express.

It's a label for the video element, which in the instance of non-autoplay is simply the content of the placeholder frame. So, it's completely correct.


>> 2. To satisfy use case 2: @aria-describedby
>
> <video poster="media/ClockworkOrangetrailer.jpg" controls
> aria-describedby="summary more desc"
>        aria-label="A Clockwork Orange movie poster">
>    <source src="media/ClockworkOrangetrailer.mp4">
>    <source src="media/ClockworkOrangetrailer.webm">
>    <source src="media/ClockworkOrangetrailer.ogv">
>  </video>
>
>  <div>
>    <p id="summary">
> In future Britain, charismatic delinquent Alex DeLarge is jailed and
> volunteers for an experimental aversion therapy developed by the
> government in an effort to solve society's crime problem... but not
> all goes to plan.
>    </p>
>    <ul>
>      <li>Director: Stanley Kubrick</li>
>      <li>Writers: Stanley Kubrick (screenplay), Anthony Burgess (novel)</li>
>      <li>Stars: Malcolm McDowell, Patrick Magee and Warren Clarke</li>
>      <li id="more"><a href="http://www.imdb.com/title/tt0066921/">Details on IMDB</a></li>
>    </ul>
>  </div>
>
> RESPONSE:
> With regard to aria-describedby="summary more" I agree, this is good
> usage of ARIA and meets the needs of the use-case.  I had previously
> suggested that aria-describedby could meet this need:
>
>        "(NOTE: At this time, I believe that adding @alt to the video
> element is semantically weak and inappropriate: while I believe it is
> important if not critical to provide a textual summation of the actual
> video asset for accessibility considerations, attributes such as
> @aria-labelledby, @aria-describedby, or (@longdesc*) applied to
> <video>, or perhaps <summary> as a child element of <video>, would be
> more accurate and useful to non-visual users.)"
> http://www.w3.org/html/wg/wiki/ChangeProposals/PosterElement

OK.


> ******************
>
> As for aria-describedby="desc", Silvia you are being tricked by your
> eyes here (sorry).
>
> Perhaps a re-examination of the code, with the poster initially
> removed from the mix will help. (This will assume a closed system that
> only uses Safari as the browser available.):
>
> <video  <!-- establishes the element -->
>      src="media/ClockworkOrangetrailer.mp4"
>                <!-- declares an attribute of the element: @src -->
>        controls
>                <!-- declares an attribute of the element: @controls
> -->
>        aria-describedby="desc"
>                <!-- declares an attribute of the element:
> @aria-describedby -->
>></video>
>
> The question now is, with the video object src (attribute) defined as
> an .mp4 file, what is its description? (In other words, what are you
> describing via aria-describedby?)
>
> Is it:
>  "(Summary:) In future Britain, charismatic delinquent Alex DeLarge is
> jailed and volunteers for an experimental aversion therapy developed
> by the government in an effort to solve society's crime problem... but
> not all goes to plan."?
>
> Or is it:
>  "...a movie poster with the film's protagonist, Alex (played by
> Malcolm McDowell) brandishing a knife while peering through a cutout
> of a stylized "A" or inverted "V". An eyeball appears floating at his
> wrist. The poster also reads "Being the adventures of a young man
> whose principal interests are rape, ultra-violence and Beethoven", as
> well as bold psychedelic type below the image which reads "Stanley
> Kubrick's Clockwork Orange..."?

You can put multiple sections into @aria-describedby and the screenreader will read them all out. Thus, if we want to provide as a longer description the poster's details and a summary of the video, the way to do it is to reference both sections of text.
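
A minimal sketch of that pattern, assuming two illustrative ids ("posterdesc" and "summary") and abbreviated description text adapted from the passages quoted above. The id list in aria-describedby is space-separated, and the referenced text is combined in the order given.

```html
<video poster="media/ClockworkOrangetrailer.jpg" controls
       aria-label="A Clockwork Orange movie poster"
       aria-describedby="posterdesc summary">
  <source src="media/ClockworkOrangetrailer.mp4">
</video>

<!-- The ids "posterdesc" and "summary" are illustrative names. -->
<p id="posterdesc">
  The poster shows Alex brandishing a knife while peering through a
  cutout of a stylized "A".
</p>
<p id="summary">
  In future Britain, charismatic delinquent Alex DeLarge is jailed and
  volunteers for an experimental aversion therapy.
</p>
```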


> ******************
>
>
> Revisiting the same code, but this time with the @poster declaration:
>
> <video  <!-- establishes the element -->
>        src="media/ClockworkOrangetrailer.mp4"
>                <!-- declares an attribute of the element: @src -->
>        controls
>                <!-- declares an attribute of the element: @controls
> -->
>        aria-describedby="desc"
>                <!-- declares an attribute of the element:
> @aria-describedby -->
>
> poster="http://www.iff2010.com/images/competitions/Film-can_details.png"
>                <!-- declares an attribute of the element: @poster -->
>></video>
>
> Here again, what are you describing? The same as the previous example?

Yes, except that the placeholder frame may have different content, so that part has to be changed.


> But what of the imagery[*] at
> http://www.iff2010.com/images/competitions/Film-can_details.png? Why
> would the description text referenced by aria-describedby change
> significantly simply because the author chooses to also include an
> author-selected image?

Because the sighted user sees something different, too. See example 3 in the wiki page.


> [* For the benefit of some readers, I will describe the image:
> referenced is two film cans, one laying flat, the other standing on
> its edge, located just behind the laying can. Both cans are decorated
> with an image of a movie camera in the center, and ringed with a
> series of large black circles to simulate the look of a movie reel.]
>
>
> WCAG 2.0 states:
>        "Guideline 1.1 Text Alternatives: Provide text alternatives for
> any non-text content so that it can be changed into other forms people
> need, such as large print, braille, speech, symbols or simpler language."
> http://www.w3.org/TR/WCAG20/#text-equiv
>
> The .mp4 object is "non-text content".
> The .png object is "non-text content".
> They are *different* objects - equally related to the <video> elements
> as siblings, but discrete and unique nonetheless.

No, to the sighted user there is no difference if the placeholder image comes from an image file or from the video itself. Therefore, it is part of the video and needs to be described as part of it.

If at some other location on the Web page you want to create an img element with the .png in it, then go ahead and make separate descriptions for the image. But when used with the video, it is not a separate entity.


> This is not a question of whether the image chosen is appropriate or
> not, or whether authors should or shouldn't do this, as no matter what
> we suggest in authoring guidance, the fact of the matter is that the
> code demonstrated would be fully conformant and would render on
> screen. For this reason, we are obligated to ensure that both non-text
> objects have a means to be textually represented. The Guideline is
> clear: *any* non-text content requires text alternatives.

Yes, and that's exactly what my examples are providing.


>> 3. To satisfy use case 3: a new attribute @transcription
>
> RESPONSE:
> This is interesting.
>
> I am curious to know why you wouldn't consider the <track + @kind>
> pattern here, as transcripts are essentially the same as captions
> minus the time-stamping information.

Because they are not time-stamped, they cannot be referenced through a <track>. A <track> is time-aligned text with the video and therefore transcripts have no place there.


> I am not overly concerned here, but more curious. Is there an
> advantage of treating the transcript as a different type of text file
> than other text files associated to the <video> element?

Well, that is a very good question. The alternative is to require the author to provide a link to the transcript somewhere on the page, where that section could be hidden from view through the off-screen effect. I've included this in the element because it makes it possible to copy that link along with the video to another web page. It signifies a tighter bond. But it is indeed a question whether it should be there.
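
The alternative described above might look roughly like the following sketch. The transcript URL is hypothetical, and the off-screen style is the one used in the wiki's third example. It shows why the bond is looser: nothing in the markup ties the link to the video, so copying the <video> element alone to another page leaves the transcript behind.

```html
<video poster="media/ClockworkOrangetrailer.jpg" controls
       aria-label="A Clockwork Orange movie poster">
  <source src="media/ClockworkOrangetrailer.mp4">
</video>

<!-- Hypothetical transcript URL; the link sits beside the video,
     positioned off-screen rather than attached to the element. -->
<div style="position:absolute; left:-10000px; width:1px; height:1px; overflow:hidden;">
  <a href="media/ClockworkOrangetrailer-transcript.html">Transcript of the trailer</a>
</div>
```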


> As well, (at the risk of belaboring a point) an @transcription
> attribute to <video> reinforces my assertion that attributes attached
> to elements define properties of the element, and not of other sibling attributes.

That is not the case for any of the attributes.


> ******************
>
>>
>>
>> Example 2: video with text
> (http://www.w3.org/WAI/PF/HTML/wiki/Media_Alt_Technologies#Example_2:_video_with_text)
>
> You wrote:
> An @aria-label attribute is added with a short description which
> captures the core of the displayed video. The server makes sure to
> serve the text in the language that is in use on the Web page. If that
> language is switched, the aria-label text will also switch language.
>
> Screen readers and voice browsers would upon tabbing onto the video
> element read out the aria-label text.
>
>  <video poster="media/acessodigital.png" controls
>         aria-label="Web accessibility: cost or benefit">
>    <source src="media/acessodigital_en.mp4">
>    <source src="media/acessodigital_en.webm">
>    <source src="media/acessodigital_en.ogv">
>  </video>
>
>
> RESPONSE:
> I have concerns here that you are expecting a server environment to be
> able to detect incoming language preferences (this of course can be
> done, but is really only present on large international sites), and
> that somehow this detection will then re-write the web-page to change
> the value of the aria-label.

I am not expecting any such thing. I am only saying that if the Web page is already made available in multiple languages - which incidentally nowadays typically means that a copy of the Website is run with only slightly different page names - then aria-label content has to be translated along with all the other page content. This is, indeed, also the case for @alt attributes on images and is a reasonable thing to expect.


> I know firsthand that here on campus I cannot reasonably expect my IT
> department to provide this kind of language negotiation on the
> server(s), especially given the  sheer number of decentralized servers
> on campus. We require (I believe) an author-based solution that
> addresses internationalization issues. Directly indicating 'in the
> code' changes of language benefits the majority of screen reader
> users, as most tools today can change language profiles on the fly.

Yes, the language in use in attributes is determined by the language in use by parent elements. I'm not making up anything new here, just saying that it hooks into existing mechanisms for internationalization.
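
As a sketch of that existing mechanism: in HTML, the lang attribute on an ancestor applies to the text-bearing attributes of its descendants as well as their content, so an aria-label inherits it without extra markup. The wrapping <section> elements here are illustrative, and the Portuguese label text is taken from the placeholder frame quoted later in this thread.

```html
<!-- The <section> wrappers are illustrative; any ancestor with lang works. -->
<section lang="en">
  <video poster="media/acessodigital.png" controls
         aria-label="Web accessibility: cost or benefit">
    <source src="media/acessodigital_en.mp4">
  </video>
</section>

<section lang="pt-br">
  <!-- aria-label is treated as Brazilian Portuguese via the parent's lang -->
  <video poster="media/acessodigital.png" controls
         aria-label="Acessibilidade Web: Custo ou Benefício?">
    <source src="media/acessodigital_en.mp4">
  </video>
</section>
```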


> WCAG 2.0 states:
>        "3.1.2 Language of Parts: The human language of each passage or
> phrase in the content can be programmatically determined except for
> proper names, technical terms, words of indeterminate language, and
> words or phrases that have become part of the vernacular of the
> immediately surrounding text. (Level AA)"
> http://www.w3.org/TR/UNDERSTANDING-WCAG20/meaning-other-lang-id.html
>
> ...while the associated Techniques for WCAG 2.0 states:
>        "H58: Using language attributes to identify changes in the
> human language
>
> The objective of this technique is to clearly identify any changes in
> language on a page by using the lang or xml:lang attribute, as
> appropriate for the HTML or XHTML version you use."
> http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H58


Yes, that attribute indeed influences the language of the page. I'm not making up anything new for media elements.


> In the example provided, the initial key frame offers text in three
> languages, despite the fact that the source language of the document
> is clearly (as well as programmatically) indicated as English:
>        <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr">
>
> However, every sighted user accessing the page can clearly see that
> the embedded text is offered in 3 languages, and so we are obligated
> to convey the same information to non-sighted users as well. What
> *should* be conveyed to screen reader users is essentially this:
>
>        <span lang="pt-br">Acessibilidade Web: Custo ou Benefício?</span>
>        Web Accessibility: Cost or Benefit
>        <span lang="es">Accissibildad Web: Costo ou Beneficio?</span>
>
> ...and not 1/3 of that.

Actually, what I have provided gives those who have come to the Web page expecting it to be in English more information, because it tells them what languages are used in the placeholder frame. You wouldn't write such a text visually, but you would say it. I think it's the perfect representation of that text, short and sweet. Reading it out three times in a row doesn't give users more information - it likely only confuses them. And since the main language of the Web page as you explained so nicely above is given, it is easy to represent that text in that language and just name the other two languages.


> Returning to the definition of aria-label and aria-labelledby, the
> Candidate Recommendation states that this attribute "Defines a string
> value that labels the current element."
>
> While I am not 100% certain, I believe that aria attributes that take
> 'string values' are like @alt, in that they cannot take additional
> block level or inline elements (as this would I believe create a
> nesting error - something specifically forbidden today for @alt in
> HTML5), but I will request clarification from the ARIA experts here.

Yes, @aria-label and in fact any attribute value is like that and cannot be further marked up.


> So while the second example certainly remains true to the Draft "HTML5:
> Techniques for providing useful text alternatives"
> (http://www.w3.org/TR/html-alt-techniques/#img-of-text) by focusing on
> the embedded text in that image, we need to also be sure that we can
> support internationalization at the author level by using the @lang or
> @xml:lang attributes. *IF* aria-label can support inline <span>s then
> this may be an acceptable possibility (although it still does not
> provide for:
>        " The placeholder images shows fingers on a keyboard titled
> 'Web accessibility: cost or benefit' in Spanish, English and
> Portuguese.")


I18n is supported in the usual way, which does not break through what I am suggesting.



> Meanwhile the code example #3 shows:
>
> <div id="posteralt" style="position:absolute; left:-10000px;
> width:1px; height:1px; overflow:hidden;">The placeholder images shows
> fingers on a keyboard titled 'Web accessibility: cost or benefit' in
> Spanish, English and Portuguese.
> </div>
>
> RESPONSE:
> This feels very "hacky" - and is extremely reminiscent of discussions
> surrounding @longdesc: "hidden content" that may or may not be
> appropriate for sighted users, 'discoverability', and other related
> head-bashing. As well, it returns to the question of what is being
> described, the (sic) .mp4 or the .png.

It is the recommended way for making content invisible at http://webaim.org/techniques/css/invisiblecontent/ and if anyone should know the best way it should be WebAIM IMO. I just followed their recommendation and I think in this case it is appropriate.


> While David Singer indicated he was not keen on going there, both
> Silvia and now I am interested in teasing this out further, but note
> that this rests on events currently outside of our sub-group
> surrounding the re-opening of Issue 31 @longdesc.

The attributes here only have a loose connection to @longdesc. I think we can definitely have this discussion and make proposals outside the scope of @longdesc.


>> I would like to suggest a discussion of this proposal here on list
>> and in the next media subgroup meeting.
>
> +1

Excellent! I will go to bed and talk with you and everyone else about it tomorrow then.

Cheers,
Silvia.
Received on Wednesday, 11 May 2011 22:23:36 UTC