Re: video and long text descriptions / transcripts

On Fri, Apr 6, 2012 at 3:05 AM, John Foliot <john@foliot.ca> wrote:
> Silvia Pfeiffer wrote:
>
>> On Thu, Apr 5, 2012 at 5:16 AM, David Singer <singer@apple.com> wrote:
>> >
>> > On Mar 30, 2012, at 14:52 , Silvia Pfeiffer wrote:
>> >>
>> >> We keep talking about "long text descriptions for videos" and
>> >> "transcripts" as separate things. There is an implied assumption that
>> >> we need two different solutions for these, which I would like to
>> >> challenge.
>
> Sorry I have not been able to participate more fully up until now, but with
> a household move this past weekend, I am now only digging out.
>
>
> Silvia, I would like to ask you what you believe the "longer textual
> description" does for non-sighted users, and why authors should be providing
> this information. You seem to be very strongly coming from a perspective of
> "literalism", where you believe that the transcript is somehow the
> equivalent of a long description. It isn't.
>
> When I speak of a longer textual description, I differentiate it from an
> Accessible Name (AccName) in the Accessibility APIs, which is the short
> textual description (This is a movie, its name is "A Clockwork Orange"). We
> don't have a native HTML5 means of applying an AccName to the video element
> today, although as previously noted we can use either aria-label or
> aria-labelledby.
>
> When we look at a longer textual description, what I am looking for is
> something that would map to the Accessible Description (or, to be even more
> precise, the equivalent of the MSAA AccessibleDescription Property).  That
> MSAA property is defined as:
>
>        "An object's AccessibleDescription property provides a textual
> description about an object's visual appearance. The description is
> primarily used to provide greater context for low-vision or blind users, but
> can also be used for context searching or other applications.
>
>        The AccessibleDescription property is needed if the description is
> not obvious, or if it is redundant based on the object's AccessibleName,
> AccessibleRole, State, and Value properties. For example, a button with "OK"
> would not need additional information, but a button that shows a picture of
> a cactus would. The AccessibleName, and AccessibleRole (and perhaps Help)
> properties for the cactus button would describe its purpose, but the
> AccessibleDescription property would convey information that is less
> tangible, such as "A button that shows a picture of a cactus.""
> [source:
> http://msdn.microsoft.com/en-us/library/system.windows.forms.control.accessibledescription.aspx]
>
>
> Clearly, and for truth, that is NOT a transcript,

Why would a transcript not satisfy this need?


> which you have defined
> (correctly IMHO) as:
>
>> * a full transcription of everything happening in the video, including
>> a transcript of all dialogs and the important visual bits
>
> If we continue to work from the presumption that a Transcript is the
> "caption file" minus the time-stamping aspect (are we in agreement here?),

No, we are not. You are missing the description of the "important
visual bits". Basically for me it's more like:

transcript = caption file (without timing) + video description file
(without timing)


> then this also aligns closely to what a "movie caption" is, as defined by
> the DCMP Captioning Key here:
>
>        "Captioning is the process of converting the audio content of a
> television broadcast, webcast, film, video, CD-ROM, DVD, live event, or
> other productions into text and displaying the text on a screen or monitor.
> Captions not only display words as the textual equivalent of spoken dialogue
> or narration, but they also include speaker identification, sound effects,
> and music description."
> [source: http://www.dcmp.org/captioningkey]


That's not sufficient for a deaf-blind user to gain a full
understanding of the video.

> In the case of a video that runs to 60, 90, 120 minutes, that transcript
> file could run to hundreds of [printed] pages and is most clearly *NOT* "...
> a textual description about an object's visual appearance"

Agreed, captions are not sufficient. However, I don't have a problem
with it being many pages long. That's exactly the point: if I want to
watch a video and I am deaf-blind, I still want to understand
everything that is happening in that video, including every single
line of text. Anything you remove from that gives me a lesser
experience than what the sighted viewer gets.


>> And which one is
>> the best for a deaf-blind user to have?
>
> While I appreciate your consideration for this particular user-group, I
> think you are casting your net at too narrow a group of users: any
> non-sighted user would appreciate having a longer textual description of a
> lengthy video without having to wade through a book's worth of text file
> prior to watching (listening to) a video (complete with described
> audio/text).

That's OK: those users can have links underneath the video to shorter
documents, summaries, etc. Those are not appropriate, though, as a
full text representation of the video.



>> Certainly the answer is that a
>> full transcription of everything being said and all the scene
>> descriptions is the best that a deaf-blind user can have and also the
>> most complete text representation of the video. I therefore call this
>> "the optimal long description document".
>
> And I call it the "Transcript", which does not meet the definition of the
> Accessible.Description property as defined by the Accessibility APIs.

Why not?


>> > b) authors are unlikely to provide both, however
>>
>> Yes, that is one of the things on my mind, too. This is why I don't
>> think it makes much sense to have both a @transcript and a @longdesc
>> attribute on the video: if we have an actual transcript, it would be
>> the same document behind both attributes and if we don't have one, we'd
>> have a url behind the longdesc and none behind the transcript. In both
>> these situations, the @transcript attribute is not useful.
>
> With due respect, you are looking at this from the perspective of either the
> implementer or the author, and not the end user. I cannot think of any
> end-user, who, when wanting to know which version of a video they are about
> to consume, will first "read the book" - this is simply out of alignment
> with reality.

There's a short description for this use case. You don't need the long
description for this.


> We have (it seems to me) 2 problems here:
>
> 1) 'defining' what a longer textual description actually is, who it is for
> and the role it serves (a.k.a. the difference between what I am talking
> about and "the transcript"), and

Agreed.

> 2) the programmatic means that we link these various textual documents to
> the <video> element.  I proposed @transcript, but if a better solution comes
> along, I am all ears and open to investigating it (and I note that I've seen
> Ted's draft counter-proposal to Issue 194, but have not had time to digest
> it yet).

Why are we singling out the transcript from all the other potential
textual representations that a video can have? Why do we need it as a
special case with an automatic link? I don't buy into that need.


>> A long description for the purposes
>> of deaf-blind users has to be discoverable when focused upon the video
>> element.
>
> If the longer textual description were *only* for deaf-blind users, perhaps.
> But that is not the role of the longer textual description, nor the only
> target user-group.

The long textual description is for accessibility needs - can we at
least agree on that?
If not, then I really don't see a need to have any more than a set of
<a> elements in a <div> underneath the video with @aria-describedby
pointing to the <div> and the <a>s marked through microdata with their
type of content.


>> Other related content such as interactive transcripts,
>> scripts, and other video metadata only has to live nearby the video
>> and be discoverable when moving around the page. I don't see a need
>> for a programmatic association of those with the video other than what
>> @describedBy already offers.
>
> Note that you can only apply aria-describedby once to an element, so if you
> are hoping to use it for both 'interactive transcripts' *AND* other video
> metadata (and I've already expressed my concern over the use of that
> specific term), then you will be out of luck - it's an either/or choice you
> have. All the more reason to fully define and understand what all of the
> different types of textual content we might have will be, and the role that
> each of those different types (and files) serve to all users.


aria-describedby accepts a space-separated list of IDREFs, so I don't
see a problem with having multiple areas, links, etc. describe the video.
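
As a rough illustration only (not how any browser or screen reader is
actually implemented), a list of IDREFs can be split and resolved roughly
as follows. The function names, the element IDs ("video-transcript",
"video-summary") and the Map standing in for the page are all made up
for this sketch:

```typescript
// Hypothetical helper: split an aria-describedby value into its ID references.
// The attribute value is treated as a whitespace-separated list of IDs.
function parseIdRefs(value: string): string[] {
  return value.split(/\s+/).filter((id) => id.length > 0);
}

// Hypothetical stand-in for the page: element ID -> text content of that element.
const describingElements = new Map<string, string>([
  ["video-transcript", "Full transcript"],
  ["video-summary", "Short summary"],
]);

// Join the text of every referenced element, in the order listed.
// IDs that do not match any element are skipped (an assumption of this sketch).
function resolveDescription(value: string, elements: Map<string, string>): string {
  return parseIdRefs(value)
    .filter((id) => elements.has(id))
    .map((id) => elements.get(id) as string)
    .join(" ");
}

// Both elements contribute to the description of the video.
const description = resolveDescription(
  "video-transcript video-summary",
  describingElements,
);
```

So a single aria-describedby attribute on the video can point at the
transcript link and at any other describing areas at the same time.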

Cheers,
Silvia.

Received on Friday, 6 April 2012 05:45:07 UTC