Re: [media] alt technologies for paused video (and using ARIA)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 11 May 2011 23:50:05 +1000
Message-ID: <BANLkTikZmCKFuDQBZVgueZxmqaqQJFctGQ@mail.gmail.com>
To: John Foliot <jfoliot@stanford.edu>
Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>, James Craig <jcraig@apple.com>, Michael Cooper <cooper@w3.org>
Hi John,

On Wed, May 11, 2011 at 12:56 PM, John Foliot <jfoliot@stanford.edu> wrote:
> Silvia Pfeiffer wrote:
>> Hi all,
> Hi Silvia, thanks for bringing this topic to the fore. I have copied the
> Chairs of the ARIA WG on this response for their info and possible input
> concerning ARIA usage.
>> Over the last weeks I've been putting together ideas about what
>> requirements we have for alt technologies on videos that are either
>> paused by default or not displayed because of text-only displays.
>> My current state of mind is that we need to solve three use cases:
>> 1. a brief description that will give the casual "tab"-passer-by an
>> impression as to what the video is about to help them make a
>> play/noplay decision
>> 2. longer descriptions that give a bit more detail and describe, e.g.
>> the poster and give a summary of the content; this is often text
>> already available elsewhere on the page
>> 3. a possibility to link a full transcription of the video to the
>> video and provide it in the context menu
> One potential use case not captured here is the case where we have the
> 'better' structural navigation we've talked about (but not yet spec'd),
> such as 'chapters' and/or sub-chapters that users could skip to - each of
> those 'chapter points' could/would have a default 'still' that we should
> address as well. We discussed this very briefly at the face-to-face in
> March.

That is actually time-aligned information and is already solved by the
"chapters" kind in the TextTrack API. Naomi from Google is giving a demo
of this at Google I/O this week. It is not the target of this
discussion, so any feature changes or additions to "chapters" should be
discussed in a different thread. I don't want to sidetrack this
discussion; there is already enough to cover here.

>> I've concretely suggested to introduce the following attributes on
>> <video>:
>> 1. To satisfy use case 1: @aria-label
> (http://www.w3.org/WAI/PF/HTML/wiki/Media_Alt_Technologies#Example_1:_A_Clockwork_Orange)
>  <video poster="media/ClockworkOrangetrailer.jpg" controls
>         aria-label="A Clockwork Orange movie poster">
>    <source src="media/ClockworkOrangetrailer.mp4">
>    <source src="media/ClockworkOrangetrailer.webm">
>    <source src="media/ClockworkOrangetrailer.ogv">
>  </video>
> This is a mistaken use of aria-label: this <video> (object) is not a
> poster, it is the entire media offering - a multi-media resource that deaf
> users, blind users, and deaf/blind users will consume differently based
> upon the additional resources that the author provides.

I arrived at this text through discussions with several people,
including several blind developers of screen readers, so I don't think
this is a mistaken use of aria-label. In fact, my examples originally
had longer text in @aria-label, and I was told to shorten it because
blind users don't want to wait for the whole label to be read out
before being told additional information about the element.

Note how my proposal clearly states that this text is only relevant
when the video is not on autoplay. This is because in this situation
the video is represented by the placeholder image, which in this case
is the Clockwork Orange movie poster. When the video plays, the alt
text is not relevant because we have an audio track playing and audio
descriptions. So, when autoplay is turned off and a screenreader moves
onto this element, the screenreader needs to share with the blind user
exactly what the sighted users are seeing, which is the placeholder

Also, note how I am deliberately not talking about a poster frame,
because we are not providing accessibility information to the user
about the markup, but about the rendering. Since there is no
difference to the sighted user when looking at the video element
whether the frame is extracted from the video file or from a separate
resource, this is not something that a blind user needs to be told
about either.

I have indeed forgotten to add a third example (thanks for reminding
me) where I explain what is being marked up when there is no @poster
attribute in the video element. I knew I'd forget one of your use
cases. :-) I've now added a third example that shows what is marked up
with and without the @poster attribute, so you can see the difference.

> The ARIA specification defines aria-label this way:
>        aria-label
>        "Defines a string value that labels the __current element__.
> (Emphasis mine - JF) See related aria-labelledby.

Yes, and that's exactly what it is doing: a label that is being read
out for the video element.

> The purpose of aria-label is the same as that of aria-labelledby. It
> provides the user with a recognizable name of the object. The most common
> accessibility API mapping for a label is the accessible name property."
> http://www.w3.org/TR/wai-aria/states_and_properties#aria-label
> In your code example, the element is <video> and the @poster is an
> *attribute* of the <video> element (object). I have tried numerous times
> to explain this to the sub-team, with apparently no success: attributes
> cannot take on additional attributes, this is simply how the mechanics of
> HTML works.

It is not an attribute on an attribute. We are not describing the
image, but the placeholder frame for the video, which in this instance
*is* the video. See example 3 on the wiki page for the difference.

> When a screen reader (for example) announces aloud "A
> Clockwork Orange movie poster" it is labeling something completely
> different than the movie; it is inappropriate and confusing to suggest
> otherwise and contrary to what aria-label has been defined to express.

It's a label for the video element, which in the instance of
non-autoplay is simply the content of the placeholder frame. So, it's
completely correct.

>> 2. To satisfy use case 2: @aria-describedby
> <video poster="media/ClockworkOrangetrailer.jpg" controls
>        aria-describedby="summary more desc"
>        aria-label="A Clockwork Orange movie poster">
>    <source src="media/ClockworkOrangetrailer.mp4">
>    <source src="media/ClockworkOrangetrailer.webm">
>    <source src="media/ClockworkOrangetrailer.ogv">
>  </video>
>  <div>
>    <p id="summary">
>      In future Britain, charismatic delinquent Alex DeLarge is jailed and
>      volunteers for an experimental aversion therapy developed by the
>      government in an effort to solve society's crime problem... but not
>      all goes to plan.
>    </p>
>    <ul>
>      <li>Director: Stanley Kubrick</li>
>      <li>Writers: Stanley Kubrick (screenplay), Anthony Burgess (novel)</li>
>      <li>Stars: Malcolm McDowell, Patrick Magee and Warren Clarke</li>
>      <li id="more"><a href="http://www.imdb.com/title/tt0066921/">Details on IMDB</a></li>
>    </ul>
>  </div>
> With regard to aria-describedby="summary more" I agree, this is good usage
> of ARIA and meets the needs of the use-case.  I had previously suggested
> that aria-describedby could meet this need:
>        "(NOTE: At this time, I believe that adding @alt to the video
> element is semantically weak and inappropriate: while I believe it is
> important if not critical to provide a textual summation of the actual
> video asset for accessibility considerations, attributes such as
> @aria-labelledby, @aria-describedby, or (@longdesc*) applied to <video>,
> or perhaps <summary> as a child element of <video>, would be more accurate
> and useful to non-visual users.)"
> http://www.w3.org/html/wg/wiki/ChangeProposals/PosterElement


> ******************
> As for aria-describedby="desc", Silvia you are being tricked by your eyes
> here (sorry).
> Perhaps a re-examination of the code, with the poster initially removed
> from the mix will help. (This will assume a closed system that only uses
> Safari as the browser available.):
> <video  <!-- establishes the element -->
>      src="media/ClockworkOrangetrailer.mp4"
>                <!-- declares an attribute of the element: @src -->
>        controls
>                <!-- declares an attribute of the element: @controls -->
>        aria-describedby="desc"
>                <!-- declares an attribute of the element:
> @aria-describedby -->
> The question now is, with the video object src (attribute) defined as an
> .mp4 file, what is its description? (In other words, what are you
> describing via aria-describedby?)
> Is it:
>  "(Summary:) In future Britain, charismatic delinquent Alex DeLarge is
> jailed and volunteers for an experimental aversion therapy developed by
> the government in an effort to solve society's crime problem... but not
> all goes to plan."?
> Or is it:
>  "...a movie poster with the film's protagonist, Alex (played by Malcolm
> McDowell) brandishing a knife while peering through a cutout of a stylized
> "A" or inverted "V". An eyeball appears floating at his wrist. The poster
> also reads "Being the adventures of a young man whose principle interests
> are rape, ultra-violence and Beethoven", as well as bold psychedelic type
> below the image which reads "Stanley Kubrick's Clockwork Orange..."?

You can put multiple sections into @aria-describedby and the
screenreader will read them all out. Thus, if we want to provide as a
longer description the poster's details and a summary of the video,
the way to do it is to reference both sections of text.
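
As a minimal sketch of that approach: the element with id="desc" and
its placement on the page are assumptions made for illustration, and the
poster text is abridged from John's description quoted above. A single
@aria-describedby lists both IDs:

```html
<video poster="media/ClockworkOrangetrailer.jpg" controls
       aria-label="A Clockwork Orange movie poster"
       aria-describedby="desc summary">
  <source src="media/ClockworkOrangetrailer.mp4">
  <source src="media/ClockworkOrangetrailer.webm">
  <source src="media/ClockworkOrangetrailer.ogv">
</video>

<!-- Hypothetical element: describes what the placeholder frame shows. -->
<p id="desc">
  A movie poster with the film's protagonist, Alex, brandishing a knife
  while peering through a cutout of a stylized "A".
</p>

<!-- Summary of the video content, as in the example quoted earlier. -->
<p id="summary">
  In future Britain, charismatic delinquent Alex DeLarge is jailed and
  volunteers for an experimental aversion therapy developed by the
  government in an effort to solve society's crime problem... but not
  all goes to plan.
</p>
```

The IDs are separated by spaces, and the referenced text is joined in
the order the IDs are listed, so here the placeholder description comes
before the summary.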

> ******************
> Revisiting the same code, but this time with the @poster declaration:
> <video  <!-- establishes the element -->
>        src="media/ClockworkOrangetrailer.mp4"
>                <!-- declares an attribute of the element: @src -->
>        controls
>                <!-- declares an attribute of the element: @controls -->
>        aria-describedby="desc"
>                <!-- declares an attribute of the element:
> @aria-describedby -->
> poster="http://www.iff2010.com/images/competitions/Film-can_details.png"
>                <!-- declares an attribute of the element: @poster -->
> Here again, what are you describing? The same as the previous example?

Yes, except that the placeholder frame may have different content,
so that part has to be changed.

> But what of the imagery[*] at
> http://www.iff2010.com/images/competitions/Film-can_details.png? Why would
> the description text referenced by aria-describedby change significantly
> simply because the author chooses to also include an author-selected
> image?

Because the sighted user sees something different, too. See example 3
in the wiki page.

> [* For the benefit of some readers, I will describe the image: referenced
> is two film cans, one laying flat, the other standing on its edge, located
> just behind the laying can. Both cans are decorated with an image of a
> movie camera in the center, and ringed with a series of large black
> circles to simulate the look of a movie reel.]
> WCAG 2.0 states:
>        "Guideline 1.1 Text Alternatives: Provide text alternatives for
> any non-text content so that it can be changed into other forms people
> need, such as large print, braille, speech, symbols or simpler language."
> http://www.w3.org/TR/WCAG20/#text-equiv
> The .mp4 object is "non-text content".
> The .png object is "non-text content".
> They are *different* objects - equally related to the <video> element as
> siblings, but discrete and unique nonetheless.

No, to the sighted user there is no difference if the placeholder
image comes from an image file or from the video itself. Therefore, it
is part of the video and needs to be described as part of it.

If at some other location on the Web page you want to create an img
element with the .png in it, then go ahead and make separate
descriptions for the image. But when used with the video, it is not a
separate entity.

> This is not a question of whether the image chosen is appropriate or not,
> or whether authors should or shouldn't do this, as no matter what we
> suggest in authoring guidance, the fact of the matter is that the code
> demonstrated would be fully conformant and would render on screen. For
> this reason, we are obligated to ensure that both non-text objects have a
> means to be textually represented. The Guideline is clear: *any* non-text
> content requires text alternatives.

Yes, and that's exactly what my examples are providing.

>> 3. To satisfy use case 3: a new attribute @transcription
> This is interesting.
> I am curious to know why you wouldn't consider the <track + @kind> pattern
> here, as transcripts are essentially the same as captions minus the
> time-stamping information.

Because they are not time-stamped, they cannot be referenced through a
<track>. A <track> carries text that is time-aligned with the video,
and therefore transcripts have no place there.

> I am not overly concerned here, but more
> curious. Is there an advantage of treating the transcript as a different
> type of text file than other text files associated to the <video> element?

Well, that is a very good question. The alternative is to require the
user to provide a url on the page somewhere with a link to the
transcript, where that section could be hidden from view through the
off-screen effect. I've included this into the element because it
makes it possible to copy that link along with the video to another
web page. It signifies a tighter bond. But it is indeed a question
whether it should be there.
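
To make the distinction concrete, here is a rough sketch of the
link-on-the-page alternative next to a time-aligned <track>. The caption
and transcript file names are hypothetical. The proposed @transcription
attribute would instead carry the transcript URL on the <video> element
itself; it is only a proposal in this thread, not an existing HTML
attribute, so it is not shown in the markup.

```html
<!-- Captions are time-aligned, so they belong in a <track>. -->
<video poster="media/ClockworkOrangetrailer.jpg" controls>
  <source src="media/ClockworkOrangetrailer.mp4">
  <track kind="captions" srclang="en" label="English"
         src="media/ClockworkOrangetrailer_en.vtt">
</video>

<!-- A transcript has no timing, so it is linked as an ordinary document.
     The file name is hypothetical. -->
<p>
  <a href="media/ClockworkOrangetrailer_transcript.html">Transcript of
  the trailer</a>
</p>
```

With this approach the link is a separate element, so anyone copying
the <video> markup to another page has to remember to copy the link too.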

> As well, (at the risk of belaboring a point) an @transcription attribute
> to <video> re-enforces my assertion that attributes attached to elements
> define properties of the element, and not of other sibling attributes.

That is not the case for any of the attributes.

> ******************
>> Example 2: video with text
> (http://www.w3.org/WAI/PF/HTML/wiki/Media_Alt_Technologies#Example_2:_video_with_text)
> You wrote:
> An @aria-label attribute is added with a short description which
> captures the core of the displayed video. The server makes sure to serve
> the text in the language that is in use on the Web page. If that
> language is switched, the aria-label text will also switch language.
> Screen readers and voice browsers would upon tabbing onto the video
> element read out the aria-label text.
>  <video poster="media/acessodigital.png" controls
>         aria-label="Web accessibility: cost or benefit">
>    <source src="media/acessodigital_en.mp4">
>    <source src="media/acessodigital_en.webm">
>    <source src="media/acessodigital_en.ogv">
>  </video>
> I have concerns here that you are expecting a server environment to be
> able to detect incoming language preferences (this of course can be done,
> but is really only present on large international sites), and that
> somehow this detection will then re-write the web-page to change the
> value of the aria-label.

I am not expecting any such thing. I am only saying that if the Web
page is already made available in multiple languages - which
incidentally nowadays typically means that a copy of the Website is
run with only slightly different page names - then aria-label content
has to be translated along with all the other page content. This is,
indeed, also the case for @alt on image elements and is a reasonable
thing to expect.
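
As an illustration only, a Portuguese copy of such a page might look
like the sketch below. The page title and the "_pt" video file name are
hypothetical; the label text is the Portuguese title quoted later in
this thread.

```html
<!DOCTYPE html>
<html lang="pt-BR">
  <head>
    <meta charset="utf-8">
    <title>Acessibilidade Web</title>
  </head>
  <body>
    <!-- The aria-label value is written in the page language declared
         on <html>, just like the rest of the translated content. -->
    <video poster="media/acessodigital.png" controls
           aria-label="Acessibilidade Web: Custo ou Benefício?">
      <source src="media/acessodigital_pt.mp4">
    </video>
  </body>
</html>
```

No server-side language negotiation is involved here: the author
translates the attribute value when translating the page.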

> I know firsthand that here on campus I cannot reasonably expect my IT
> department to provide this kind of language negotiation on the server(s),
> especially given the sheer number of decentralized servers on campus. We
> require (I believe) an author-based solution that addresses
> internationalization issues. Directly indicating 'in the code' changes of
> language benefits the majority of screen reader users, as most tools
> today can change language profiles on the fly.

Yes, the language in use in attributes is determined by the language
in use by parent elements. I'm not making up anything new here, just
saying that it hooks into existing mechanisms for

> WCAG 2.0 states:
>        "3.1.2 Language of Parts: The human language of each passage or
> phrase in the content can be programmatically determined except for
> proper names, technical terms, words of indeterminate language, and words
> or phrases that have become part of the vernacular of the immediately
> surrounding text. (Level AA)"
> http://www.w3.org/TR/UNDERSTANDING-WCAG20/meaning-other-lang-id.html
> ...while the associated Techniques for WCAG 2.0 states:
>        "H58: Using language attributes to identify changes in the human
> language
> The objective of this technique is to clearly identify any changes in
> language on a page by using the lang or xml:lang attribute, as
> appropriate for the HTML or XHTML version you use."
> http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H58

Yes, that attribute indeed influences the language of the page. I'm
not making up anything new for media elements.

> In the example provided, the initial key frame offers text in three
> languages, despite the fact that the source language of the document is
> clearly (as well as programmatically) indicated as English:
>        <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr">
> However, every sighted user accessing the page can clearly see that the
> embedded text is offered in 3 languages, and so we are obligated to convey
> the same information to non-sighted users as well. What *should* be
> conveyed to screen reader users is essentially this:
>        <span lang="pt-br">Acessibilidade Web: Custo ou Benefício?</span>
>        Web Accessibility: Cost or Benefit
>        <span lang="es">Accissibildad Web: Costo ou Beneficio?</span>
> ...and not 1/3 of that.

Actually, what I have provided gives those who have come to the Web
page expecting it to be in English more information, because it tells
them what languages are used in the placeholder frame. You wouldn't
write such a text visually, but you would say it. I think it's the
perfect representation of that text, short and sweet. Reading it out
three times in a row doesn't give users more information - it likely
only confuses them. And since the main language of the Web page as you
explained so nicely above is given, it is easy to represent that text
in that language and just name the other two languages.

> Returning to the definition of aria-label and aria-labelledby, the
> Candidate Recommendation states that this attribute "Defines a string
> value that labels the current element."
> While I am not 100% certain, I believe that aria attributes that take
> 'string values' are like @alt, in that they cannot take additional block
> level or inline elements (as this would I believe create a nesting error -
> something specifically forbidden today for @alt in HTML5), but I will
> request clarification from the ARIA experts here.

Yes, @aria-label and in fact any attribute value is like that and
cannot be further marked up.

> So while the second example certainly remains true to the Draft "HTML5:
> Techniques for providing useful text alternatives"
> (http://www.w3.org/TR/html-alt-techniques/#img-of-text) by focusing on the
> embedded text in that image, we need to also be sure that we can support
> internationalization at the author level by using the @lang or @xml:lang
> attributes. *IF* aria-label can support inline <span>s then this may be an
> acceptable possibility (although it still does not provide for:
>        " The placeholder image shows fingers on a keyboard titled 'Web
> accessibility: cost or benefit' in Spanish, English and Portuguese.")

I18n is supported in the usual way; nothing I am suggesting breaks it.

> Meanwhile the code example #3 shows:
> <div id="posteralt" style="position:absolute; left:-10000px; width:1px;
> height:1px; overflow:hidden;">The placeholder image shows fingers on a
> keyboard titled 'Web accessibility: cost or benefit' in Spanish, English
> and Portuguese.
> </div>
> This feels very "hacky" - and is extremely reminiscent of discussions
> surrounding @longdesc: "hidden content" that may or may not be appropriate
> for sighted users, 'discoverability', and other related head-bashing. As
> well, it returns to the question of what is being described, the (sic)
> .mp4 or the .png.

It is the way recommended at
http://webaim.org/techniques/css/invisiblecontent/ for hiding content
visually, and if anyone should know the best way it should be WebAIM
IMO. I just followed their recommendation and I think in this case it
is appropriate.
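
For reference, a sketch of how the off-screen block from example 3 can
be tied to the video. Referencing it through
aria-describedby="posteralt" is an assumption made here for
illustration; the style rules are the ones quoted above.

```html
<video poster="media/acessodigital.png" controls
       aria-label="Web accessibility: cost or benefit"
       aria-describedby="posteralt">
  <source src="media/acessodigital_en.mp4">
  <source src="media/acessodigital_en.webm">
  <source src="media/acessodigital_en.ogv">
</video>

<!-- Positioned off screen: sighted users do not see it, but it stays
     in the document and can still be read by screen readers. -->
<div id="posteralt" style="position:absolute; left:-10000px; width:1px;
     height:1px; overflow:hidden;">
  The placeholder image shows fingers on a keyboard titled 'Web
  accessibility: cost or benefit' in Spanish, English and Portuguese.
</div>
```

The text is moved off screen rather than hidden with display:none,
which would generally hide it from screen readers as well.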

> While David Singer indicated he was not keen on going there, both Silvia
> and now I are interested in teasing this out further, but note that this
> rests on events currently outside of our sub-group surrounding the
> re-opening of Issue 31 @longdesc.

The attributes here only have a loose connection to @longdesc. I think
we can definitely have this discussion and make proposals outside the
scope of @longdesc.

>> I would like to suggest a discussion of this proposal here on list and
>> in the next media subgroup meeting.
> +1

Excellent! I will go to bed and talk with you and everyone else about
it tomorrow then.

Received on Wednesday, 11 May 2011 13:50:54 UTC
