- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Fri, 6 Apr 2012 15:44:18 +1000
- To: John Foliot <john@foliot.ca>
- Cc: David Singer <singer@apple.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
On Fri, Apr 6, 2012 at 3:05 AM, John Foliot <john@foliot.ca> wrote:
> Silvia Pfeiffer wrote:
>
>> On Thu, Apr 5, 2012 at 5:16 AM, David Singer <singer@apple.com> wrote:
>> >
>> > On Mar 30, 2012, at 14:52 , Silvia Pfeiffer wrote:
>> >>
>> >> We keep talking about "long text descriptions for videos" and
>> >> "transcripts" as separate things. There is an implied assumption that
>> >> we need two different solutions for these, which I would like to
>> >> challenge.
>
> Sorry I have not been able to participate more fully up until now, but with
> a household move this past weekend, I am now only digging out.
>
> Silvia, I would like to ask you what you believe the "longer textual
> description" does for non-sighted users, and why authors should be providing
> this information. You seem to be very strongly coming from a perspective of
> "literalism", where you believe that the transcript is somehow the
> equivalent of a long description. It isn't.
>
> When I speak of a longer textual description, I differentiate it from an
> Accessible Name (AccName) in the Accessibility APIs, which is the short
> textual description (This is a movie, its name is "A Clockwork Orange"). We
> don't have a native HTML5 means of applying an AccName to the video element
> today, although as previously noted we can use either aria-label or
> aria-labelledby.
>
> When we look at a longer textual description, what I am looking for is
> something that would map to the Accessible Description (or, to be even more
> precise, the equivalent of the MSAA AccessibleDescription Property). That
> MSAA property is defined as:
>
>     "An object's AccessibleDescription property provides a textual
>     description about an object's visual appearance. The description is
>     primarily used to provide greater context for low-vision or blind users, but
>     can also be used for context searching or other applications.
>
>     The AccessibleDescription property is needed if the description is
>     not obvious, or if it is redundant based on the object's AccessibleName,
>     AccessibleRole, State, and Value properties. For example, a button with "OK"
>     would not need additional information, but a button that shows a picture of
>     a cactus would. The AccessibleName, and AccessibleRole (and perhaps Help)
>     properties for the cactus button would describe its purpose, but the
>     AccessibleDescription property would convey information that is less
>     tangible, such as "A button that shows a picture of a cactus.""
>     [source:
>     http://msdn.microsoft.com/en-us/library/system.windows.forms.control.accessibledescription.aspx]
>
> Clearly, and for truth, that is NOT a transcript,

Why would a transcript not satisfy this need?

> which you have defined
> (correctly IMHO) as:
>
>> * a full transcription of everything happening in the video, including
>> a transcript of all dialogs and the important visual bits
>
> If we continue to work from the presumption that a Transcript is the
> "caption file" minus the time-stamping aspect (are we in agreement here?),

No, we are not. You are missing the description of the "important visual bits". Basically for me it's more like:

transcript = caption file (without timing) + video description file (without timing)

> then this also aligns closely to what a "movie caption" is, as defined by
> the DCMP Captioning Key here:
>
>     "Captioning is the process of converting the audio content of a
>     television broadcast, webcast, film, video, CD-ROM, DVD, live event, or
>     other productions into text and displaying the text on a screen or monitor.
>     Captions not only display words as the textual equivalent of spoken dialogue
>     or narration, but they also include speaker identification, sound effects,
>     and music description."
>     [source: http://www.dcmp.org/captioningkey]

That's not sufficient for a deaf-blind user to gain a full understanding of the video.

> In the case of a video that runs to 60, 90, 120 minutes, that transcript
> file could run to hundreds of [printed] pages and is most clearly *NOT* "...
> a textual description about an object's visual appearance"

Agreed, captions are not sufficient. However, I don't have a problem with it being many pages long. That's exactly the point: if I want to watch a video and I am deaf-blind, I still want to understand everything that is happening in that video, including every single line of text. Anything you remove from that gives me a lesser experience than what the sighted viewer gets.

>> And which one is
>> the best for a deaf-blind user to have?
>
> While I appreciate your consideration for this particular user-group, I
> think you are casting your net at too narrow a group of users: any
> non-sighted user would appreciate having a longer textual description of a
> lengthy video without having to wade through a book's worth of text file
> prior to watching (listening to) a video (complete with described
> audio/text).

That's ok - those users can have links underneath the video to lesser files, to summaries etc etc. They are not appropriate, though, as a full text representation of the video.

>> Certainly the answer is that a
>> full transcription of everything being said and all the scene
>> descriptions is the best that a deaf-blind user can have and also the
>> most complete text representation of the video. I therefore call this
>> "the optimal long description document".
>
> And I call it the "Transcript", which does not meet the definition of the
> Accessible.Description property as defined by the Accessibility APIs.

Why not?

>> > b) authors are unlikely to provide both, however
>>
>> Yes, that is one of the things on my mind, too. This is why I don't
>> think it makes much sense to have both a @transcript and a @longdesc
>> attribute on the video: if we have an actual transcript, it would be
>> the same document behind both attributes and if we don't have one, we'd
>> have a url behind the longdesc and none behind the transcript. In both
>> these situations, the @transcript attribute is not useful.
>
> With due respect, you are looking at this from the perspective of either the
> implementer or the author, and not the end user. I cannot think of any
> end-user, who, when wanting to know which version of a video they are about
> to consume, will first "read the book" - this is simply out of alignment
> with reality.

There's a short description for this use case. You don't need the long description for this.

> We have (it seems to me) 2 problems here:
>
> 1) 'defining' what a longer textual description actually is, who it is for
> and the role it serves (a.k.a. the difference between what I am talking
> about and "the transcript"), and

Agreed.

> 2) the programmatic means that we link these various textual documents to
> the <video> element. I proposed @transcript, but if a better solution comes
> along, I am all ears and open to investigating it (and I note that I've seen
> Ted's draft counter-proposal to Issue 194, but have not had time to digest
> it yet).

Why are we singling out the transcript from all the other potential textual representations that a video can have? Why do we need it as a special case with an automatic link? I don't buy into that need.

>> A long description for the purposes
>> of deaf-blind users has to be discoverable when focused upon the video
>> element.
>
> If the longer textual description were *only* for deaf-blind users, perhaps.
> But that is not the role of the longer textual description, nor the only
> target user-group.

The long textual description is for accessibility needs - can we at least agree on that? If not, then I really don't see a need to have any more than a set of <a> elements in a <div> underneath the video with @aria-describedby pointing to the <div> and the <a>s marked through microdata with their type of content.

>> Other related content such as interactive transcripts,
>> scripts, and other video metadata only has to live nearby the video
>> and be discoverable when moving around the page. I don't see a need
>> for a programmatic association of those with the video other than what
>> @describedBy already offers.
>
> Note that you can only apply aria-describedby once to an element, so if you
> are hoping to use it for both 'interactive transcripts' *AND* other video
> metadata (and I've already expressed my concern over the use of that
> specific term), then you will be out of luck - it's an either/or choice you
> have. All the more reason to fully define and understand what all of the
> different types of textual content we might have will be, and the role that
> each of those different types (and files) serve to all users.

You can add a list of IDREFs into aria-describedby, so I don't see a problem with having multiple areas / links etc describe the video.

Cheers,
Silvia.
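
To illustrate the aria-describedby point above, here is a minimal sketch of a video described by more than one element through a space-separated list of IDREFs. The file names, id values and link texts are hypothetical and only there for illustration; the microdata typing of the links is left out because no vocabulary for it has been agreed in this thread.

```html
<!-- Sketch only: file names, ids and link texts are hypothetical. -->
<!-- aria-describedby takes a space-separated list of ids, so more
     than one element can describe the same video. -->
<video src="movie.webm" controls
       aria-describedby="video-summary video-texts"></video>

<!-- First describing element: a short summary. -->
<p id="video-summary">Short summary of the video.</p>

<!-- Second describing element: a set of links underneath the video. -->
<div id="video-texts">
  <a href="transcript.html">Full transcript</a>
  <a href="description.html">Long description</a>
</div>
```

Whether assistive technology exposes the links inside the referenced <div> as links, or only as flattened description text, depends on the browser and the accessibility API, so this shows the association only, not a tested user experience.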
Received on Friday, 6 April 2012 05:45:07 UTC