- From: <Ehansen7@aol.com>
- Date: Tue, 28 Dec 1999 01:29:36 EST
- To: w3c-wai-au@w3.org
- CC: w3c-wai-gl@w3.org, w3c-wai-ua@w3.org
Date: 28 December 1999, 00:48 hrs
To: CG, AU, UA Lists
Subject: Comments on Multimedia and Audio

This memo includes Eric Hansen's responses to comments by Wendy Chisholm and Madeleine Rothberg regarding a thread entitled "Revised Checkpoints: WCAG(1.4/1.3) and UAAG(2.5)" (see http://lists.w3.org/Archives/Public/w3c-wai-gl/1999OctDec/0165.html). Related threads include "[CG] Captions for audio clips" and "[UA] Issue 138".

SECTION 1 - Eric Hansen's comments on Wendy Chisholm's comments on Eric Hansen's comments.

Key:
EH1: Original memo by Eric Hansen
WC: Wendy Chisholm (16 Dec 1999)
EH2: Eric Hansen (28 Dec 1999)

EH1::
>7. WAI should develop one or more specification documents (W3C Notes or
>Recommendations) for:
>
>a. auditory descriptions, including (1) synthesized-speech auditory
>descriptions and (2) prerecorded auditory descriptions (including
>"prerecorded auditory description tracks" and "prerecorded auditory
>description supplement tracks", the latter being explained later in this
>document)
>b. captions
>c. synchronization of collated text transcripts
>d. synchronization of audio clips with their text transcripts
>
>I see document "c" as possibly encompassing "a" and "b". Even better
>perhaps, all four items could be addressed together. (I am not
>sure whether all these are within the charter of SMIL.)
>
>I would not expect the task of retrofitting existing "prerecorded auditory
>descriptions tracks" to the specifications to be difficult. Content and
>data in existing captions could, I expect, be almost entirely reused in
>new captions conforming to the captions specification.

WC:: This sounds like something that would be in the Techniques document, particularly documented in a SMIL-specific section/chapter.

EH1::
>====
>2. Avoid the use of "synchronized alternative equivalents" in WCAG.
>
>The term seems redundant.

WC:: ok

EH1:: ok
>3. Avoid the use of "synchronized equivalents" in both WCAG and UAAG.
>
>This is important because often the components that are presented
>together are not equivalent to each other. The term seems misleading.
>====
>
>4. Use the term "synchronized alternatives".
>
>Implies the idea that it is alternative content, which is essentially
>true. This is my preferred term, I think.
>====

WC:: In some cases it is an alternative, but in others it is an equivalent (for example captions of speech, alt-text of bitmap image). Also, I thought that we had decided to replace "alternative" with "equivalent" if the "alternative" was providing the _functional_ equivalent. I suggest we stick with the term "synchronized equivalent."

EH2:: I have changed my mind and agree with you about using the term "synchronized equivalent". See definition in "Terminology, Etc." memo.

====

EH1::
>5. Use "visual track" and "auditory track"
>
>Use "visual track" and "auditory track" rather than video track and audio
>track when referring to multimedia presentations.

WC:: ok

EH2:: Thanks…

====

EH1::
>
>6. Avoid the term "continuous alternatives".
>
>Not sure that this is a great term. It is probably best just to name the
>specific things.

WC:: I did not see this in WCAG (guidelines nor Techniques), this must be a UAGL issue?

EH2:: It was a UA issue and I think that it was taken care of. We ought not introduce terms that are not necessary.

====

EH1::
>7. Add synchronization to the glossary.
>
>"Synchronization, Synchronize, Synchronization Data, Synchronized
>Alternatives"
>
>"Synchronization refers to sensible time-coordination of two or more
>presentation components, particularly where at least one of the components
>is a multimedia presentation (e.g., movie or animation) or _audio clip_ or
>a portion of the presentation or audio clip."
>
>"For Web content developers, the requirement to synchronize means to
>provide the data that will permit sensible time-coordinated presentation
>by a user agent.
>For example, a Web content developer can ensure that the
>segments of caption text are neither too long nor too short and that they
>are mapped to segments of the visual track that are appropriate in length.

WC:: I can see adding this part to WCAG glossary.

EH2:: Based on recent discussion, I suggest the following words:

"Synchronization, Synchronized Equivalents"

"_Synchronization_ refers to sensible time-coordination of two or more presentation components, particularly where at least one of the components is a multimedia presentation (e.g., movie or animation) or an _audio presentation_."

"For Web content developers, the requirement to synchronize means to provide the data that will permit sensible time-coordinated presentation of content by a user agent. For example, a Web content developer can ensure that the segments of caption text are neither too long nor too short and that they are mapped to segments of the visual track that are appropriate in length."

A _synchronized equivalent_ is an equivalent that is synchronized with some other component, particularly, the visual track or auditory track of a multimedia presentation or an audio-only presentation. The most prominent synchronized equivalents are auditory description and captions. An 'auditory description' is considered a synchronized equivalent because it is synchronized with the auditory and visual tracks of a multimedia presentation. 'Captions' are also considered synchronized equivalents because they are synchronized with the auditory track or audio presentation. A collated text transcript to which synchronization information has been added may be similarly presented in synchronization with the auditory and visual tracks of a multimedia presentation.

======

>"The idea of "sensible time-coordination" of components centers on the
>idea of simultaneity of presentation, but also encompasses strategies for
>handling deviations from simultaneity resulting from a variety of causes.
>
>Consider how certain deviations in simultaneity might be handled in
>auditory descriptions. Auditory descriptions are considered synchronized,
>since each segment of description audio is presented at the same time as a
>segment of the auditory track, e.g., a natural pause in the spoken
>dialogue. Yet a deviation can arise when a segment of the auditory
>description is lengthy enough that it cannot be entirely spoken within the
>natural pause. In this case there must be a strategy for dealing with the
>mismatch between the description and the pause in the auditory track. The
>two major types of auditory descriptions lend themselves to different
>strategies. Prerecorded auditory descriptions usually deal with such
>mismatches by spreading the lengthy auditory description over more than
>one natural pause. When expertly done, this strategy does not ordinarily
>weaken the effectiveness of the overall presentation. On the other hand, a
>synthesized-speech auditory description lends itself to other strategies.
>Since synthesize
{12/17/99 Note by Eric Hansen: I think that this scrambled material was simply not finished in the original. It is a loose end…}
>
>Let us briefly consider how deviations might be handled for captions.
>
>Captions consist of a text equivalent of the auditory track that is
>synchronized with the visual track. Captions are essential for individuals
>who require an alternative way of accessing the meaning of audio, such as
>individuals who are deaf. Typically, a segment of the caption text appears
>visually near the video for several seconds while the person reads the
>text. As the visual track continues, a new segment of the caption text is
>presented.
>
>One problem arises if the caption text is longer than can fit in the
>display space. This can be particularly difficult if due to a visual
>disability, the font size has been enlarged, thus reducing the amount of
>caption text that can be presented.
>The user agent must respond sensibly
>to such problems, such as by ensuring that the user has the opportunity to
>navigate (e.g., scroll down or page down) through the caption segment
>before proceeding with the visual presentation and presenting the next
>segment. Some means must be provided to allow the user to signal that the
>presentation may resume.
>
>=====

WC:: some of this seems appropriate for the Techniques document, other pieces are obviously intended for the User Agent Guidelines glossary. They could be reworked for discussion in WCAG Techniques, or could be linked to from WCAG Techniques.

EH2:: You are welcome to use as you see fit.

>PART 3 -- CHANGES TO WCAG DOCUMENT
>
>1. Add checkpoints 1.3 into checkpoint 1.4 and then break 1.4 into several
>checkpoints.

WC:: I am deleting much of your text and commenting on certain pieces of it. In general, I feel that much of what is being incorporated into checkpoint text is more appropriate in the Techniques document. I propose 1 new checkpoint, and reworking 1.3 and 1.4 to cover the 6 that Eric proposed. 1.3 is discussed here, 1.4 and 1.x are discussed later.

<checkpoint-proposal>
1.3 Provide a synchronized auditory description for each multimedia presentation (e.g., movie or animation). [Priority 1 for important information, Priority 2 otherwise.]
</checkpoint-proposal>

The techniques for satisfying this checkpoint will be discussed in the Techniques document:
1. synchronizing a pre-recorded human auditory track.
2. synchronizing a recorded speech synthesized auditory track.
3. synchronizing a text file on the fly.

I believe your proposed checkpoints 1.4.A and 1.4.B are techniques for checkpoint 1.3.

EH2:: As a reminder, here is the current checkpoint 1.3:

WCAG 1.0 (5 May 1999) checkpoint 1.3:
"1.3 Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation.
[Priority 1] Synchronize the auditory description with the audio track as per checkpoint 1.4. Refer to checkpoint 1.1 for information about textual equivalents for visual information."

And here is my 4 December refinement of it:

New WCAG checkpoint 1.4.A (4 December 1999):
"1.4.A Until user agents can produce synthesized-speech auditory descriptions, provide an auditory description of _important information_ for each multimedia presentation (e.g., movie or animation). [Priority 1]"

And here is your suggestion:

<Wendy's-checkpoint-proposal>
1.3 Provide a synchronized auditory description for each multimedia presentation (e.g., movie or animation). [Priority 1 for important information, Priority 2 otherwise.]
</Wendy's-checkpoint-proposal>

EH2:: A few comments about your proposal.

1. The split priority (Priority 2 for "otherwise") gives the checkpoint a higher overall priority than it currently enjoys. This may be warranted but should be taken to the working group.

2. The absence of the "until user agents" clause makes it a permanent checkpoint, otherwise it would expire. This is warranted.

3. The term "synchronized auditory description" is redundant because synchronization is already part of the definition of auditory description. This needs to be fixed.

4. I have some concern about relegating my checkpoints 1.4.A and 1.4.B to the techniques. I might feel different if I knew how the SMIL capabilities related to these. It seems to me that WAI could do more to define specifications for these different kinds of auditory descriptions. I would like to hear additional opinions on this.

5. In conclusion, I still like my proposal for checkpoint 1.4.A.

====

For background for the reader of this memo, here is my 4 Dec version of 1.4.B:

WCAG checkpoint 1.4.B (4 December 1999) (id: WC-SSAD):
"1.4.B For each multimedia presentation, provide data that will produce a synthesized-speech auditory description.
[Priority 1]"
"Note: This checkpoint becomes effective one year after the release of a W3C specification for synthesized-speech auditory descriptions."

By the way, as far as I know Madeleine's suggestion regarding "synthesized auditory equivalent" is probably better than my term "synthesized-speech auditory equivalent". It is briefer.

Here is the new 28 Dec 1999 version of checkpoint 1.4.B:

"1.4.B For each dynamic audio/visual presentation {or movie or animation}, provide data that will produce a synthesized auditory description. [Priority 1]"
"Note: This checkpoint becomes effective one year after the release of a W3C specification for synthesized-speech auditory descriptions."

I use the term dynamic audio/visual presentation instead of "multimedia presentation" since there has been some discussion of changing the term multimedia to include audio-only presentations. I am not sure what I think about that proposal. I think that we ought to be cautious. I realize that this information may be somewhat dated, but a 1990 book by Bergman and Moore (Managing Interactive Video/Multimedia Projects) says:

"Even the words 'interactive video' and 'multimedia' can cause confusion. For several years, the videodisc was the only source of motion video segments that could be accessed rapidly to support effective interactivity. Hence the term applied to these applications came to be 'interactive videodisc,' or more commonly, 'IVD.' Recently, digital technology has made it possible to provide motion video using other devices, especially the small optical discs called CD-ROM. Another factor has been the development of image-based applications that use graphic pictures and digital audio, and no motion video at all.
The term 'multimedia' has been adapted as a generic reference to all such image-based applications."

Thus, to term audio-only presentations a form of "multimedia" doesn't seem to fit this 1990 definition. I'd like to hear other opinions.

>====
>New WCAG checkpoint 1.4.C (4 December 1999):
>"1.4.C For each multimedia presentation (e.g., movie or animation),
>provide captions and a collated text transcript. [Priority 1]"
>
>Rationale: These two pieces are essential (captions for individuals who
>are deaf; collated text transcript for individuals who are deaf-blind). We
>know that captions are needed and we have technologies that can handle it.
>A collated text transcript is relatively straightforward to supply.

WC:: this is a rewording of 1.4. To make it jibe with my proposed rewording of 1.3 I propose:

<checkpoint-proposal>
1.4 Provide captions and a collated text transcript for each multimedia presentation (e.g., movie or animation). [Priority 1]
</checkpoint-proposal>

EH2:: This looks fine to me, unless you lump audio-only presentations in multimedia, in which case it might become:

1.4 Provide captions and a collated text transcript for each dynamic audio/visual presentation (e.g., movie or animation). [Priority 1]

>====
>New WCAG checkpoint 1.4.D (4 December 1999) (id: WC-ACLIP-TT):
>"1.4.D For each audio clip, provide a text transcript. [Priority 1]"
>
>Rationale: A text transcript is _essential_ for disability access to audio
>clips, whereas a text transcript is not essential for access to auditory
>tracks of multimedia presentations (for example, the collated text
>transcript and caption text includes the information found in the text
>transcript of the auditory track).
>====

WC:: this is covered in the current checkpoint 1.1

<current-checkpoint>
1.1 Provide a text equivalent for every non-text element (e.g., via "alt", "longdesc", or in element content).
This includes: images, graphical representations of text (including symbols), image map regions, animations (e.g., animated GIFs), applets and programmatic objects, ascii art, frames, scripts, images used as list bullets, spacers, graphical buttons, sounds (played with or without user interaction), stand-alone audio files, audio tracks of video, and video. [Priority 1]
</current-checkpoint>

EH2:: OK

>New WCAG checkpoint 1.4.E (4 December 1999) (id: WC-ACLIP-SYNC-TT):
>"1.4.E Synchronize each audio clip with its text transcript. [Priority
>1]" {I prefer the brevity of this version.}
>{or}
>"1.4.E For each audio clip, provide data that will allow user agents to
>synchronize the audio clip with the text transcript. [Priority 1]"
>"Note: This checkpoint becomes effective one year after the release of a
>W3C recommendation addressing the synchronization of audio clips with
>their text transcripts."

WC:: I agree with discussion on the list that "audio" should be included in "multimedia." However, there was consensus that this ought to be a Priority 2. Therefore, I propose:

<checkpoint-proposal>
1.x Provide captions for each stand-alone audio clip or stream, as appropriate. [Priority 2]
Note. For short audio clips, providing a text equivalent as discussed in checkpoint 1.1 is all that is needed. This checkpoint is intended to cover audio clips of speech such as news broadcasts or a lyrical performance.
</checkpoint-proposal>

the "as appropriate" is supposed to signify that it is not necessary to caption all audio clips. for example, we discussed back in May that we do not need to caption an instrumental performance, however it is appropriate to caption a musical performance with singing.

EH2:: I am willing to consider a change that would make "audio-only presentations" part of "multimedia presentations". The earlier decision (spring 1999) was to keep them separate.
Regardless of whether the definitions are combined, there will be different checkpoints for the two, since the priorities are different. I would like to get other views as to whether audio-only presentations are really considered "multimedia." I suppose then, that multimedia presentations would include movies, animations, audio-only presentations, but not short sounds. See earlier discussion in this memo regarding this issue.

I would suggest the following wording:

Eric's 28 December suggestion:
"1.x Provide captions for each word-using audio presentation [Priority 2]"

Rationale: I think that the term "audio presentation" seems better than "audio clip". I think that it is easier for us to make our own definition of audio presentation than of audio clip. I think that the definition can make clear the intended scope of the checkpoint. I have added the new term "word-using" to exclude instrumental performances from the requirement.

This material relates to a thread on "captions for audio". Do you have a URL for notes on the decision not to caption musical performance? Did the decision also address text equivalents of musical performance?

Here is a possible definition of audio presentation.

"Audio Presentation"
"Examples of audio presentations include a musical performance, a radio-style news broadcast, or a book reading. The term "audio presentation" is contrasted with "short sounds". A "word-using audio presentation" is one that uses words, such as a musical performance with lyrics, in contrast to a musical performance that only uses musical instruments."

Assuming that musical scores are not required for instrumental music, WCAG checkpoint 1.1 should contain a note such as the following.

"Note. The requirement of a text equivalent for a musical performance does not include a requirement for musical scores."
=====

EH1::
>New WCAG checkpoint 1.4.F
>"For each multimedia presentation for which a synthesized-speech auditory
>description of _important_ information is likely to be inaccessible,
>provide a prerecorded auditory description of _important_ information."
>"[Priority 3]"
>{or}
>"For each multimedia presentation, provide a prerecorded auditory
>description."
>"[Priority 3]"
>{or}
>"For each multimedia presentation, provide a prerecorded auditory
>description for _important_ information."
>"[Priority 3]"

WC:: If synthesizing auditory descriptions is a technique for 1.3, then this proposed checkpoint is not needed.

EH2:: I have heard the opinion expressed that prerecorded auditory descriptions are preferred in some settings and am of the opinion that this should stay. I would like to hear additional opinions on this. I think that in some settings prerecorded auditory descriptions are felt to be helpful. I think that this deserves discussion on the list.

==========================

SECTION 2 - Eric Hansen's comments on Madeleine Rothberg's 21 Dec 1999 comments on Eric Hansen's comments. (re: Issue #138)

EH1: Eric's earlier comments
EH2: Eric Hansen's 28 December 1999 comments
MR: Madeleine Rothberg's 21 December 1999 comments

MR:: Here are my comments on issue 138
http://cmos-eng.rehab.uiuc.edu/ua-issues/issues-table.html#138

I do not have any strong opinions on the use of the terms "synchronized alternative equivalents", "synchronized equivalents", "synchronized alternatives", "continuous equivalents." Some alternatives are synchronized and some are not, but if we make clear which are which perhaps we can use the same terms for both. I don't see the difference between "alternative" and "equivalent," so I am happy to let the editorial types make the decision on this part of the issue.

I do have comments on other parts of Eric's proposal.
I agree with Wendy's comments to the GL list archived at:
http://lists.w3.org/Archives/Public/w3c-wai-gl/1999OctDec/0218.html

Many of Eric's proposals for the WCAG involved splitting a single checkpoint into several checkpoints. Wendy commented that she felt the material could be incorporated into techniques instead. I think we can take a similar approach for the UAAG, and that much of Eric's analysis would make excellent techniques information. Specifically:

EH:: I think that there are a huge number of ways in which text, video, audio and their equivalents _could be combined_ to make multimedia presentations and audio clips accessible to people with disabilities, but only a much smaller number of ways are really essential or really valuable and it is up to WAI to more specifically identify and describe that smaller number of combinations.

MR:: I agree that certain combinations of multimedia tracks are more likely than others to be useful, but I think that existing UA checkpoints say that all tracks must be able to be turned on and off. This gives the user complete control over which tracks are rendered, making it unnecessary for the UA to understand the combinations. This would include 2.1 "Ensure that the user has access to all content, including alternative equivalents for content. [Priority 1] " and also 2.5 "If more than one alternative equivalent is available for content, allow the user to choose from among the alternatives. This includes the choice of viewing no alternatives. [Priority 1]" as well as checkpoints in Guideline 3 that specify that users be able to turn on and off any track since it might cause distraction or otherwise interfere with use.

I think Eric's excellent description of the uses of different combinations of tracks would be helpful techniques material so that UA developers see the reason to implement the checkpoints listed here.
Eric's analysis includes distinguishing between text transcripts, text of audio description, and collated text transcripts (which are a combination of the other two). The use of a collated text transcript is a neat idea, but it is not yet a part of any specification, so I don't think we can shape our guidelines around it. Similarly, both the WCAG and the UAAG would like to support the idea of synthesized speech for rendering of audio descriptions from a text file, but we do not have a technology that can do that.

Another possible synchronized equivalent that does not have an implementation yet is a sign language track. Though I've argued in the past that sign language is an important equivalent (and I still feel that it is) I acknowledge that unless SMIL or some other specification has a way for authors to indicate that a given track is intended as an equivalent track, we can't require UAs to allow that track to be turned on and off in the same way that we can require for captions and audio descriptions (defined as of the latest public draft of SMIL-Boston).

Overall, what I'm trying to say is, we need to craft some forward looking language, probably in the techniques, to promote new ideas. This would include synthesized audio descriptions, combining captions and audio descriptions into a collated text transcript (which can then replace both tracks) and a way to indicate that a video track is intended as an alternative equivalent, for sign language use. But until then, I think we are best off with the current umbrella checkpoints referring to various kinds of media, with techniques showing the currently recognized ways of implementing them as well as future ideas for improved features.

EH2:: I think that the Techniques document is a good place to discuss the value of providing a movie or animation (such as of a sign language translation) that can be synchronized with any other media type (text, audio presentation, visual track, auditory track, or movie or animation).

MR:: I think this approach matches the spirit of our changes in the December 7 telecon, where we resolved to merge the checkpoints in GL 4 for audio, video, and animation into a single set of checkpoints. Whenever possible, I think we are better off with fewer checkpoints as long as they are clear. The use of examples and Notes helps with that clarity. I don't think we need a series of checkpoints on each different aspect of alternative tracks.

<END OF MEMO>
Received on Tuesday, 28 December 1999 01:30:13 UTC