- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Sun, 15 May 2011 12:45:14 +1000
- To: John Foliot <jfoliot@stanford.edu>
- Cc: David Singer <singer@apple.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>, Jared Smith <jared@webaim.org>, "E.J. Zufelt" <everett@zufelt.ca>
On Sun, May 15, 2011 at 3:43 AM, John Foliot <jfoliot@stanford.edu> wrote:

> David Singer wrote:
>
>> I'm going to try to clear up some of my own confusion here.
>>
>> I think we might need three pieces of information linked to a media (video or audio) element:
>>
>> * a short text (kinda like alt)
>> * a long description
>> * a transcript
>>
>> in all cases, they should provide equivalents for what someone who can consume the media 'normally' would pick up. (I think this is as true of audio as of video, by the way).
>
> Hi David,
>
> I agree, although the transcript is actually an asset that both "sighted" and "non-sighted" users will often have a desire for. I am not too concerned about Silvia's proposal to introduce a new @transcript / @transcription attribute (outside of the fact that I am fussy about elements versus attributes, but that's for another argument, er, discussion).
>
>> So, I was sort of right and sort of wrong when I said that the short text should not describe the poster, but the media. I'm right: the element is more than the first frame or poster frame. I'm wrong, in that the (in this case sighted) normal user would have gathered something from that initial frame.
>>
>> so, not good:
>>
>> <video poster="TheLeopard.jpg" short-text="A movie poster for The Leopard" src="..." />
>>
>> because the sighted user will know it's a video element and that it's offering them the trailer.
>>
>> Way better is to relay some of the information from the poster:
>>
>> <video poster="TheLeopard.jpg" short-text="Trailer for The Leopard, starring Burt Lancaster" src="..." />
>
> *IF* the author does indeed choose to use a movie poster as a first-frame image choice.
> But despite its poor choice of name, the image referenced by @poster today could be *any* image, including a pure-play branding image ("iTunes Theater Presents: The Leopard", where the imagery would be partially stock or specially commissioned imagery including the iTunes "logo", the sell line as embedded display font, promotional movie stills, etc.) - in this case not only do we need a short textual description about the <video> - "Trailer for The Leopard, starring Burt Lancaster" - we also need to provide the non-sighted user with the actual text burned into the image proper, and ideally a description of what that imagery is.
>
> In your example here, while the short-text value of "Trailer for The Leopard, starring Burt Lancaster" is indeed a short description of the video asset (the principal attribute of the <video> element, referenced by the src attribute), it conveys none of the information in the author-selected first frame:
>
> "An image of two film cans with Apples embossed upon them propped beside a film projector, and the text "iTunes Theater Presents: The Leopard""
>
> (see how you can actually visualize that?...)
>
> Earlier this week Leonie Watson summed it up quite clearly:
>
> "When I arrive at a video (with my screen reader), I want to know what that static image/frame contains. At that moment in time, in the world according to me and my screen reader, that image exists entirely in its own right. It might be a still from the video, it might be a separate image. It might be related content, it might be a completely unrelated corporate ident (for example).
>
> Wanting to know what that image contains doesn't prevent me from wanting to know what the video contains. There may well be overlap, but equally they could be worlds apart."
>
>> the long description can provide a more narrative version of the trailer, and the transcript a full transcript.
> At this time, the one thing that we all seem to be relatively in agreement on is that this particular requirement would most likely be handled by aria-describedby.
>
> <video src="..." aria-describedby="synopsis"></video>
> <p id="synopsis">The Prince of Salina, a noble aristocrat of impeccable integrity, tries to preserve his family and class amid the tumultuous social upheavals of 1860's Sicily.</p>
>
> The assumption is that most videos will have *some* associated text describing something about the movie for sighted users on the same page, so that should be linked by aria-describedby (it is dangerous to make assumptions, true, however...). In the case where there is *no* on-screen description of the movie, then there would be no 'non-visual' description either - what's good for the goose is good for the gander, as my grandmother used to say.
>
>> This way the short text is enabling the non-sighted user just like the sighted one:
>>
>> sighted: see poster, decide it's interesting, watch trailer
>> non-sighted: get the short-text, decide it's interesting, read the long description and/or transcript
>>
>> (I'm using non-sighted as a shorthand for someone who, for whatever reason, can't see the video - their eyes are busy elsewhere, their UA is unable to play it, and so on. Hope that's OK).
>
> No problem by me, we need to speak plainly sometimes, and I don't think you are being misunderstood here.
>
> ***************
>
> While on the topic of Plain Speaking:
>
> As a sighted user, I am always very careful when making assumptions and assertions on behalf of daily screen reader users. True enough that after over a decade of being an accessibility specialist I should have a pretty clear comprehension of the big picture, but still, I routinely discuss scenarios with a number of trusted blind users to be sure I have not strayed off track.
> Two such colleagues that I have previously discussed this subject with are Victor Tsaran (a daily screen reader user), who runs Yahoo!'s accessibility lab in Santa Clara and works with engineers and developers of all stripes as they produce web content for millions of daily consumers around the globe, and Everett Zufelt, who has a CS Engineering background, is a daily screen reader user as well, and (as an engineer) has committed over 1,000 accessibility patches to the open source Drupal CMS (http://groups.drupal.org/node/117539) - which is to say, he is smart, talented and gets it. Neither of them builds browsers or screen readers, but both have a first-hand perspective on delivering content to end users: as content creators, as engineers supporting that content delivery, and as content consumers.
>
> After talking with Everett on Friday, he wrote me an email, which he subsequently shared to this list yesterday (http://lists.w3.org/Archives/Public/public-html-a11y/2011May/0356.html).
>
> I think that the critical thing that Everett has identified, and that some others continue to argue against, is that there are in fact 3 things here that need to be conveyed to the non-sighted user, irrespective of their source or how we code things up:
>
> * The video player (initiated by the <video> element), which is the bounding box (or canvas, as Everett described it) and the controls associated to the player (whether they are the browser's native controls, or JavaScripted controls supplied by the author) - we need to ensure that all roles, states and properties are accessibly delivered.
>
> * The 'video' itself - the media asset (which itself is a further composite of imagery, movement, sound, and text) - we need to ensure that each part of that composite asset has accessible alternatives.
>
> * The 'still' image (regardless of its source or specific content).
> Since both the video and the still image are non-textual objects, and WCAG 2.0 Guideline 1.1 clearly states that *any* non-textual object requires a textual equivalent, it is abundantly clear that we need mechanisms to provide that text for both the video and the still image.
>
> Silvia Pfeiffer wrote:
>
>> The point that nobody seems to understand is that there is no need to provide a text alternative for the video. All we need is a text alternative for the poster (read: placeholder image). The video's content is not presented at the time where a text alternative for the video *element* is needed.
>
> Stating that Victor and Everett, both daily screen reader users and working engineers, "don't understand what they need" simply doesn't cut it - they clearly *do* understand: they understand engineering, they understand the web, they understand their AT tools and they understand their user experience. The video is the video, the still imagery is the still imagery, and both require a short textual alternative, as well as a longer textual description if and when appropriate.
>
> When Jared Smith, the Associate Director of WebAIM ("...and if anyone should know the best way it should be WebAIM." - Silvia Pfeiffer), writes to this list (http://lists.w3.org/Archives/Public/public-html-a11y/2011May/0322.html) and confirms that the users and content authors he interacts with daily have these requirements, or Leonie Watson, the Director of Accessibility at Nomensa, a leading UK-based web agency with clients such as P&G, Virgin, Nottingham University, the UK Treasury & UK Ministry of Justice (and more), writes to confirm these needs in her first-person voice (http://lists.w3.org/Archives/Public/public-html-a11y/2011May/0301.html), we must stop and ask: who is not really understanding?
>
> You do not solve use cases and user requirements by insisting they don't exist.
> Arguments that we do not need both of these types of textual descriptions cannot be accepted - such arguments are (for me at least) a deal breaker - this is a hill I am prepared to die on. There are enough voices of blind users and accessibility specialists on this list alone who have made this statement of need that we simply cannot ignore their request, regardless of how clearly or confusedly those initial statements of need were conveyed. If some contributors to this list cannot understand why these requirements exist, I am sorry that we have not been able to better explain why - it's not been for lack of trying. But it reaches a point where, if you still do not understand the "why", you need to trust those who are directly affected when they say they need something, and figure out a way to deliver it, even if you still don't fully understand why.

I am sorry, but you have taken this out of context. I was concretely talking about the point in time where the video has been loaded, is paused, and only a representative image is visible on screen. It is ONLY this situation that I was referring to when I said that we do not need to represent the content of the video at that time. And I was referring to a situation where there is no other information about the video available on the Web page.

I still believe it would be wrong to represent the content of the video in this situation (and in your arguments above you seem to agree with that). I have, though, advocated that we need to represent the content of the representative image. In actual fact, I believe we are both arguing the same thing, except with different solutions, and while mixing use cases with each other.

> It is my belief that until such time as we are in agreement on what *all* of our actual needs are, we will continue to be talking about incomplete, confusing or conflicting potential solutions.
> Before proposing aria-label or @title be shoe-horned in there somewhere for "alt technologies for video", let's be very clear what we are providing alternative texts for, and then we can look to effectively deliver those solutions.

I agree that we haven't clarified the different use cases / situations yet. We need to clearly list them and then define which attributes / elements provide the solution for each of these use cases. I was trying to make a start on this in the wiki page, but I have obviously also mixed use cases, so let's identify the different dimensions that we have and then map the solutions.

As a start on listing the dimensions of interest that we should discuss, I suggest the following:

* graphical browser / text-only browser: i.e. is the video element visible? This can be identified through the display of all HTML content inside the <video> element.

* video player design (as per Everett's suggestion): i.e. is it the native browser player or something custom? This can be identified through the absence of the @controls attribute.

* presence of a representative frame? This can be identified through the absence of the @autoplay attribute.

* presence of a short video description on the page? This would be represented through the presence of a video text title on the page.

* presence of a video summary on the page? This would be represented through a description/summary of the video content on the page.

* presence of a full-text transcript of the video content on the page or on a different page? This would be represented through a (possibly interactive and possibly timed) transcript on the page or on a different page.

* presence of a short poster summary on the page? This would be represented through the presence of a poster title on the page.

* presence of a long poster description on the page? This would be represented through a description/summary of the poster content on the page.

Have I missed any dimension?

Cheers,
Silvia.
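P.S. For concreteness, here is a rough markup sketch of how several of these dimensions might surface together on a page. The attributes @controls, @autoplay and @poster are the real HTML5 ones; the id values, file names and the aria-describedby wiring are invented for illustration only, not a settled proposal:

```html
<!-- Sketch only: ids, file names and aria-describedby wiring are
     hypothetical; @controls, @autoplay, @poster are real HTML5 attributes. -->

<!-- Native player (dimension: @controls present); no @autoplay, so a
     representative frame from @poster is shown before playback. -->
<video src="leopard-trailer.ogv"
       poster="leopard-poster.jpg"
       controls
       aria-describedby="video-summary">
  <!-- Fallback content, displayed by text-only browsers
       (dimension: is the video element visible at all?). -->
  <p>Trailer for The Leopard, starring Burt Lancaster.</p>
</video>

<!-- Short video description and summary on the page. -->
<h2>The Leopard (trailer)</h2>
<p id="video-summary">The Prince of Salina, a noble aristocrat of
impeccable integrity, tries to preserve his family and class amid the
tumultuous social upheavals of 1860's Sicily.</p>

<!-- Full-text transcript on a different page. -->
<p><a href="leopard-transcript.html">Read the full transcript</a></p>
```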
Received on Sunday, 15 May 2011 02:46:02 UTC