- From: John Foliot <jfoliot@stanford.edu>
- Date: Sat, 14 May 2011 10:43:04 -0700 (PDT)
- To: "'David Singer'" <singer@apple.com>, "'HTML Accessibility Task Force'" <public-html-a11y@w3.org>
- Cc: "'Jared Smith'" <jared@webaim.org>, "'E.J. Zufelt'" <everett@zufelt.ca>
David Singer wrote:
>
> I'm going to try to clear up some of my own confusion here.
>
> I think we might need three pieces of information linked to a media
> (video or audio) element:
>
> * a short text (kinda like alt)
> * a long description
> * a transcript
>
> in all cases, they should provide equivalents for what someone who can
> consume the media 'normally' would pick up. (I think this is as true
> of audio as of video, by the way).

Hi David,

I agree, although the transcript is actually an asset that both "sighted" and "non-sighted" users will often want. I am not too concerned about Silvia's proposal to introduce a new @transcript / @transcription attribute (outside of the fact that I am fussy about elements versus attributes, but that's for another argument, er, discussion).

> So, I was sort of right and sort-of wrong when I said that the short-
> text should not describe the poster, but the media. I'm right, the
> element is more than the first frame or poster frame. I'm wrong, in
> that the (in this case sighted) normal user would have gathered
> something from that initial frame.
>
> so, not good:
>
> <video poster="TheLeopard.jpg" short-text="A movie poster for The
> Leopard" src="..." />
>
> because the sighted user will know it's a video element and that it's
> offering them the trailer.
>
> Way better is to relay some of the information from the poster:
>
> <video poster="TheLeopard.jpg" short-text="Trailer for The Leopard,
> starring Burt Lancaster" src="..." />

*IF* the author does indeed choose to use a movie poster as the first-frame image. But despite its poor choice of name, the image referenced by @poster today could be *any* image, including a pure-play branding image ("iTunes Theater Presents: The Leopard", where the imagery might be stock or specially commissioned imagery including the iTunes "logo", the sell line as embedded display type, promotional movie stills, etc.).
In this case, not only do we need a short textual description of the <video> - "Trailer for The Leopard, starring Burt Lancaster" - we also need to provide the non-sighted user with the actual text burned into the image proper, and ideally a description of what that imagery is. In your example here, while the short-text value of "Trailer for The Leopard, starring Burt Lancaster" is indeed a short description of the video asset (the principal attribute of the <video> element, referenced by the src attribute), it conveys none of the information in the author-selected first frame: "An image of two film cans with Apples embossed upon them propped beside a film projector, and the text "iTunes Theater Presents: The Leopard"" (see how you can actually visualize that?...)

Earlier this week Leonie Watson summed it up quite clearly:

"When I arrive at a video (with my screen reader), I want to know what that static image/frame contains. At that moment in time, in the world according to me and my screen reader, that image exists entirely in its own right. It might be a still from the video, it might be a separate image. It might be related content, it might be a completely unrelated corporate ident (for example). Wanting to know what that image contains doesn't prevent me from wanting to know what the video contains. There may well be overlap, but equally they could be worlds apart."

> the long description can provide a more narrative version of the
> trailer, and the transcript a full transcript.

At this time, the one thing that we all seem to be relatively in agreement on is that this particular requirement would most likely be handled by aria-describedby:

<video src="..." aria-describedby="synopsis"></video>
<p id="synopsis">The Prince of Salina, a noble aristocrat of impeccable integrity, tries to preserve his family and class amid the tumultuous social upheavals of 1860's Sicily.</p>

The assumption is that most videos will have *some* associated text describing something about the movie for sighted users on the same page, and that this should be linked via aria-describedby (it is dangerous to make assumptions, true, however...). In the case where there is *no* on-screen description of the movie, then there would be no 'non-visual' description either - what's good for the goose is good for the gander, as my grandmother used to say.

> This way the short text
> is enabling the non-sighted user just like the sighted one:
> sighted: see poster, decide it's interesting, watch trailer
> non-sighted: get the short-text, decide it's interesting, read the long
> description and/or transcript
>
> (I'm using non-sighted as a shorthand for someone who, for whatever
> reason, can't see the video - their eyes are busy elsewhere, their UA
> is unable to play it, and so on. Hope that's OK).

No problem by me - we need to speak plainly sometimes, and I don't think you are being misunderstood here.

***************

While on the topic of plain speaking: as a sighted user, I am always very careful when making assumptions and assertions on behalf of daily screen reader users. True enough that after over a decade as an accessibility specialist I should have a pretty clear comprehension of the big picture, but still, I routinely discuss scenarios with a number of trusted blind users to be sure I have not strayed off track.
Two such colleagues with whom I have previously discussed this subject are Victor Tsaran (a daily screen reader user), who runs Yahoo!'s accessibility lab in Santa Clara and works with engineers and developers of all stripes as they produce web content for millions of daily consumers around the globe, and Everett Zufelt, who has a CS Engineering background, is a daily screen reader user as well, and (as an engineer) has committed over 1,000 accessibility patches to the open source Drupal CMS (http://groups.drupal.org/node/117539) - which is to say, he is smart, talented and gets it. Neither of them builds browsers or screen readers, but both have a first-hand perspective on delivering content to end users: as content creators, as engineers supporting that content delivery, and as content consumers.

After talking with Everett on Friday, he wrote me an email, which he subsequently shared with this list yesterday (http://lists.w3.org/Archives/Public/public-html-a11y/2011May/0356.html). I think the critical thing that Everett has identified, and that some others continue to argue against, is that there are in fact 3 things here that need to be conveyed to the non-sighted user, irrespective of their source or how we code things up:

* The video player (initiated by the <video> element), which is the bounding box (or canvas, as Everett described it) and the controls associated with the player (whether they are the browser's native controls, or JavaScripted controls supplied by the author) - we need to ensure that all roles, states and properties are accessibly delivered

* The 'video' itself - the media asset (itself a further composite of imagery, movement, sound, and text) - we need to ensure that each part of that composite asset has accessible alternatives

* The 'still' image (regardless of its source or specific content).
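To make those three pieces concrete, here is one purely illustrative sketch of how they *might* be wired up with ARIA attributes that already exist today (aria-label for the short text, and aria-describedby's ability to reference a space-separated list of ids) - this is not a settled or proposed solution, just one way of visualizing the requirement:

```html
<!-- Illustrative sketch only, not a proposal. File names and ids are
     made up for the example. -->

<!-- 1. The player: a short text naming the media asset,
     plus descriptions linked for the still image and the video -->
<video src="leopard-trailer.ogv" controls
       poster="itunes-theater-leopard.jpg"
       aria-label="Trailer for The Leopard, starring Burt Lancaster"
       aria-describedby="poster-desc synopsis">
</video>

<!-- 3. The still image: the text and imagery in the first frame -->
<p id="poster-desc">Two film cans with Apples embossed upon them, propped
beside a film projector, and the text "iTunes Theater Presents: The
Leopard".</p>

<!-- 2. The video asset: a longer narrative description -->
<p id="synopsis">The Prince of Salina, a noble aristocrat of impeccable
integrity, tries to preserve his family and class amid the tumultuous
social upheavals of 1860's Sicily.</p>
```

Whether the short text and the poster description end up as attributes, elements, or ARIA references is exactly what remains to be decided; the point of the sketch is only that all three pieces of information need a home.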
Since both the video and the still image are non-textual objects, and WCAG 2.0 (Success Criterion 1.1.1) clearly states that *any* non-text content requires a text alternative, it is abundantly clear that we need mechanisms to provide that text for both the video and the still image.

Silvia Pfeiffer wrote:
>
> The point that nobody seems to understand is that there is no need to
> provide a text alternative for the video. All we need is a text
> alternative for the poster (read: placeholder image). The video's
> content is not presented at the time where a text alternative for the
> video *element* is needed.

Stating that Victor and Everett, both daily screen reader users and working engineers, "don't understand what they need" simply doesn't cut it - they clearly *do* understand: they understand engineering, they understand the web, they understand their AT tools and they understand their user experience. The video is the video, the still imagery is the still imagery, and both require a short textual alternative, as well as a longer textual description if and when appropriate.

When Jared Smith, the Associate Director of WebAIM ("...and if anyone should know the best way it should be WebAIM." - Silvia Pfeiffer), writes to this list (http://lists.w3.org/Archives/Public/public-html-a11y/2011May/0322.html) and confirms that the users and content authors he interacts with daily have these requirements, or when Leonie Watson, the Director of Accessibility at Nomensa, a leading UK-based web agency with clients such as P&G, Virgin, Nottingham University, the UK Treasury and the UK Ministry of Justice (and more), writes to confirm these needs in her own first-person voice (http://lists.w3.org/Archives/Public/public-html-a11y/2011May/0301.html), we must stop and ask: who is it that is not really understanding? You do not solve use cases and user requirements by insisting they don't exist.
Arguments that we do not need both of these types of textual descriptions cannot be accepted - such arguments are (for me at least) a deal breaker; this is a hill I am prepared to die on. There are enough voices of blind users and accessibility specialists on this list alone who have made this statement of need that we simply cannot ignore their request, regardless of how clearly or confusedly those initial requests were conveyed. If some contributors to this list cannot understand why these requirements exist, I am sorry that we have not been able to explain it better - it has not been for lack of trying. But it reaches a point where, if you still do not understand the "why", you need to trust those who are directly affected when they say they need something, and figure out a way to deliver it, even if you still don't fully understand why.

It is my belief that until such time as we are in agreement on what *all* of our actual needs are, we will continue to talk about incomplete, confusing or conflicting potential solutions. Before proposing that aria-label or @title be shoe-horned in there somewhere as "alt technology for video", let's be very clear about what we are providing alternative texts for, and then we can look to effectively deliver those solutions.

JF
Received on Saturday, 14 May 2011 17:43:34 UTC