Re: @longdesc scope (was: HTML Media Transcript, Issue-194: Are we done?)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 10 Jul 2012 11:31:21 +0200
To: Laura Carlson <laura.lee.carlson@gmail.com>
Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <20120710113121173690.938ca0f1@xn--mlform-iua.no>
Laura Carlson, Mon, 9 Jul 2012 16:16:26 -0500:

(I reiterate that I have stopped pushing for img@transcript. [0])

> Some references that might help:

I reviewed them. For the most part, all of them, including the WebAIM 
and the WCAG references, show that transcripts have a lot in common 
with CSSquirrel's long text alternative for comic #42. [1] So they are, 
I should say, very helpful in underlining my point.

	I'll go through them, one by one.

Reference 1: WebAIM's text says: [2]

   "Transcripts do not have to be verbatim accounts of the spoken 
    word in a video. They can contain additional descriptions,
    explanations, or comments that may be beneficial. Transcripts
    allow deaf/blind users to [ … snip … ]"

	CSSquirrel's long text alternative for comic #42 fits that bill in 
every way except the unimportant ones: (A) how the transcript was 
made (not from audio), and (B) that CSSquirrel's long text alternative 
obviously has no special relevance for the deaf; it is primarily meant 
for the blind.

Reference 2: [3] This WCAG techniques section about 'Collated text 
transcripts' focuses on sounds: "spoken dialogue as well as any other 
significant sounds". However, the example in the preceding section on 
'Visual information and motion' makes it clear that a collated text 
transcript may have content that does *not* stem from the sound of the 
video itself: [4] 

   'Here's an example of a collated text transcript of a clip from
    "The Lion King" (available at [DVS]). Note that the Describer
    is providing the auditory description of the video track and
    that the description has been integrated into the transcript.'

	Technically, this remains a transcript, since the descriptive text 
stems from the auditory description track of the video. But what if 
there were no auditory description track? Would it then be "forbidden" 
to include such a description as part of the transcript? (That's a 
rhetorical question.) Of course, it is better to transcribe the 
description track, if there is one, than to 'invent' the description. 
However, the document says nothing against 'inventing' a description 
when there is no description track.

Reference 3: [5] This is a reference to how transcriptions are made. I 
do not, however, see why I could not send CSSquirrel's comic #42 to 
such a transcription service.

Reference 4: [6] This text is concerned with transcribing sounds. But 
there is a glitch at the end. It says:

   "Seeing Jamie Lee Curtis turn around in shock obviously shows 
    she is afraid of something, but it helps to know if she heard
    a scream, a glass break somewhere in the house, or a creepy [ … ]"

	One wonders, however, how a blind user is supposed to be "seeing Jamie 
Lee Curtis turn around in shock". Obviously, this isn't possible unless 
the transcript contains textual equivalents of visual events; see 
reference 2 above.

Reference 5: [7] This text describes transcripts as 'a textual version 
of the video'. It goes without saying that a transcript that represents 
only the audio of the video would not be able to function as a textual 
version of the video.

Reference 6: [8] The first best practice it mentions is to edit the 
video's audio so that it includes all the visual information; of 
course, this is only possible for simple videos:

    'If you do show-of-hands (e.g., "How many people follow WCAG
     2.0?"), say the results for the audio recording (e.g.,
     "about half").'

	I do not believe, however, that the intent of this advice is that one 
should leave the "about half" text out of the transcript just because 
the audio did not include it.

Reference 7: [9] Yeah, this seems to offer the most orthodox 
description of what a transcript is: "A movie transcript contains only 
one element from a movie -- the dialogue." But as we saw, this orthodox 
understanding is not in line with what WCAG or WebAIM says. And even 
so, this reference still admits that some transcripts contain more than 
a pure audio transcription.

	Now to your transcript examples.

Example 1: [10] I doubt that the text '[Camera flies through clouds,…]' 
was included because the camera barked or something. Lots of non-sound 
info within brackets here.
Example 2: [11] Has text equivalents of visual events, within brackets.
Example 3: [12] Visual 'setting the scene' info in brackets here too.
Example 4: [13] I was prevented from evaluating the(se) transcript(s).
Example 5: [14] Much non-audio info within brackets here too.
Example 6: [15] A main section of this example contains the textual 
representation (31 company names) of a graphical montage. Another 
important point in this video is the moment when the lecturer "talks" 
in sign language. This silent moment is described within brackets. 
The sign language section is not transcribed because the lecturer's 
rhetorical point is that the audience who do not know sign language 
cannot understand it. Otherwise a 

To sum it up: If these recipes and examples were provided to make the 
point that CSSquirrel's long text alternative for comic #42 is 
fundamentally different from a transcript, then I fail to see how they 
serve the argument. The difference from a *typical* transcript is, 
firstly, that the long text alternative for comic #42 does not have the 
deaf as its primary audience, and, secondly, that a typical transcript 
has audio as its primary source.

Leif Halvard Silli
Received on Tuesday, 10 July 2012 11:04:38 GMT

