- From: Geoff Freed <Geoff_Freed@wgbh.org>
- Date: 1 Oct 1998 10:58:31 -0400
- To: allan_jm@tsb1.tsbvi.edu, cindy.king@gallaudet.edu, dick.bulterman@cwi.nl, hoschka@w3.org, jbrewer@w3.org, "Jon Gunderson" <jongund@staff.uiuc.edu>, "Judy Brewer" <jbrewer@w3.org>, kerscher@montana.com, Lloyd.Rutledge@cwi.nl, robla@real.com, w3c-wai-ua@w3.org
Reply to: RE>>SMIL: draft items for UA guidelines Judy Brewer said: >#2. Would be useful to have captioning conventions for how non-speech >sounds and additional audio information such as change-of-language are >represented; what exists? -- Agreed. Geoff, can you help us out there at >all; what exists to date? GF says... There was a study at Gallaudet's Technology Assessment Program (TAP) published in 1996 which addresed just this issue. Info can be found at http://www.gallaudet.edu/~sjking/pubcat/gripub2.html#pub31 I can also provide the group with information based on my work with The Caption Center. Judy also said... >#3. SMIL needs to be extended to be able to handle audio descriptions of >video images (see sample of suggested code below) -- Agreed, and the method >you propose looks good to me, although I would want to be sure that it is a >solution that is easy to implement. Could other people please review and >comment on this - Philipp, George, Geoff? ... GF says... Based on my conversations with Lloyd, and on the samples he's sent me, I think this method is good. I have yet to try it myself to test it, but I hope to do it soon. If anyone else has implemented an example, please let me know. Geoff Freed/NCAM -------------------------------------- Date: 9/30/98 3:22 PM To: Geoff Freed From: Judy Brewer Lloyd, Thanks very much for your comments. I'm bouncing this message to the User Guidelines Working Group, as well as other people cc'd on the original message, for additional comment, preferably by October 6. Here are my brief comments; please excuse (or correct) my very rough paraphrasing of your points. I've included your complete message below. The original message is at http://lists.w3.org/Archives/Public/w3c-wai-ua/1998JulSep/0200.html if any one is looking for it. #1. Reference to SMIL should be to SMIL 1.0 -- Agreed. #2. Would be useful to have captioning conventions for how non-speech sounds and additional audio information such as change-of-language are represented; what exists? -- Agreed. Geoff, can you help us out there at all; what exists to date? #3. SMIL needs to be extended to be able to handle audio descriptions of video images (see sample of suggested code below) -- Agreed, and the method you propose looks good to me, although I would want to be sure that it is a solution that is easy to implement. Could other people please review and comment on this - Philipp, George, Geoff? ...I also think that we need a mechanism for text descriptions of video images, which my original draft didn't deal with adequately. #8. Repositioning of captions could be handled by switch statements... -- Unclear here. Would captions that are overlaid on video media object be controllable in this manner, or is the idea that user preferences should require captions to be displayed elsewhere than on top of a video media object? #9. Providing user control over presentation pace is complex and the following issues need to be considered... -- Agreed. Need to consider to what extent hard synchronization is required among other media objects that have timing relationships with the audio (or perhaps video?) that is being pace-controlled. Comments or suggestions from anyone else on this? Regards, Judy ------ [Lloyd's message follows] To: jbrewer@w3.org cc: hoschka@w3.org, robla@real.com, Dick.Bulterman@cwi.nl, Lynda.Hardman@cwi.nl, Lloyd.Rutledge@cwi.nl Subject: Re: SMIL: draft items for UA guidelines Date: Mon, 21 Sep 1998 18:15:12 +0200 From: Lloyd Rutledge <Lloyd.Rutledge@cwi.nl> Hi Judy, I'm just now back from my travels, which included our meeting at the WOW breakfast meeting. I was glad to see your UA guidelines email and to see that the consideration UA with SMIL is moving forward. Below are some comments I have on the draft items. > GUIDELINES/RATIONALE/TECHNIQUES: > 1. User should be able to identify and switch text captions of audio > objects on & off. [Priority one] > - Rationale: Some users require captions anytime they are available, in > which case users should be able to set a preference in the user preferences > to view captions; other users require captions only in certain > circumstances, in which case they need a mechanism to identify when > captions are available, and to turn them on while a presentation is > playing. Mechanism for identification of captions should also function > non-visually as some users can neither hear audio files nor see captions > but can still access captions through screen-reading software and > refreshable Braille display. > - Technique: Provide user interface to switch display of media objects with > "system-caption" test attribute on and off. This must be possible both in > the static user preferences, and dynamically while the presentation is > playing. This guideline applies to implementors of SMIL 1.0, which contrasts with guidelines that apply to the working group for the next version of SMIL. It might help to distinguish between these guidelines so that the different audiences reading the draft can readily identify what applies most to them. > 2. User should be able to control size, color, and background color of > captions. [Priority one] > - Rationale: Some users require specific font size, color, and contrast > with caption background to be able to view captions. > - Technique: Provide user interface to change size, color, and background > color of media objects with "system-caption" test attribute. This must be > possible both in the static user preferences, and dynamically while the > presentation is playing. This is an issue of style and specifying style for particular media objects. This issue is primarily in the domain of CSS. SMIL includes document portions of other media formats and integrates them into a unified presentation. If CSS or an equivalent does or is made to apply any media type then we should ensure that the SMIL format and the players for it are kept compatible. Is there a captions format with markup/structure that identifies what portions of the captioning are things such as - different speakers - music - various noises (telephone, doorbell, car horn, siren, whistle, etc.) - tones of voice - brief occurrences of different languages not necessarily intended to be understood by the audience Having a caption format that identifies such things abstractly rather than just giving their text description is important because the techniques for identifying them to the user can vary between presentations. Style specs can then say how these abstractions should be presented to the user given the user's preferences. > 3. User should be able to identify and turn on and off audio descriptions > of video objects. [Priority one] > - Rationale: Users who cannot see a video media object need a non-visual > way to identify that an audio description is available. > - Technique: Provide a standard mechanism for notifying third party > assistive technologies (e.g. screen-reading software) of the existence of > an audio description for a video object. Provide a mechanism (which can > function non-visually) for turning on or off the audio description. This applies both to implementors and the SMIL working group. It applies for the SMIL working group because the format currently does not explicitly address audio descriptions. We could introduce a "system-audio-desc" attribute that works just like the system-captions attribute but for audio description rather than captions. This new attribute can be included in the next version of SMIL. In the meantime, it can be included uniformly by implementors and specified as a private extension to SMIL in the manner described in the SMIL appendix. The current SMIL 1.0 time model handles the time issues related to audio description well. After seeing Geoff's demo of audio description, which we both way at the breakfast meeting, I came up with the example SMIL code below <seq> <par> <video id="video" fill="freeze" src="video.mpeg" clip-begin="npt=0s" clip-end="npt=5s"/> <audio src="audio.wav" clip-begin="npt=0s" clip-end="npt=5s"/> <audio begin="id(video)(6s) system-audio-desc="on" src="desc.wav"/> </par> <par> <video src="video.mpeg" clip-begin="npt=6s" clip-end="npt=10s"> <audio src="audio.wav" clip-begin="npt=6s" clip-end="npt=10s"> </par> </seq> This presentation has a video file and an accompanying primary audio file that is the main soundtrack for the video. There is also one two second long audio description file. This describes a visual event that takes place five seconds into the video. There is no natural silent period in the soundtrack into which an audio description could fit. The behavior of this presentation when audio description is on is that the video and sound track play for five seconds, then the sound track stops and the video freezes for five seconds while the audio description plays and then the remaining five seconds of the video and audio play. This scenario enables a sighted viewer and a non-sighted listener to watch the same presentation simultaneously, with the pausing video keeping the viewer engaged. If audio description is turned off, the same presentation should play the full 10 seconds of video and soundtrack audio with no perceivable pauses or gaps. This presentation is represented in the SMIL code above as a sequence of five second sub-presentations. In each, corresponding five second segments of the video and soundtrack audio are played in parallel. The first five second segment also includes a switch statement chooses between playing an empty ref element, effectively doing nothing, and playing the audio description file describing the event that happens five seconds into the video. This switch statement is timed to start 5 seconds into the playing of the video. If the audio description is not played, the switch statement does not do anything, thus ending immediately. This causes the end time of the par element to be the same as the duration of the video clip. In this case, the showing of the first video clip should play seamlessly into the second. However, if audio description is on, the five second description file will play, bringing the duration of the first par element to 10 seconds. Because the video on this par is only 5 seconds long, when the video clip ends the fill attribute value "freeze" takes effect, causes the final frame of the video to stay on the screen until the audio description ends. This same technique can exploit the occurances of silent periods in the soundtrack. If there is a soundtrack silent period that coincides with a visual event, then the audio description should play during that silence. For the duration of this silent period, it is not necessary to freeze the video while the audio description is being played because the audio description would not drown out any important soundtrack sound. However, if the audio description is longer than the silent period, you may still want to freeze the video at the end of the silent period. The technique above can address this by sychronizing not with the very end of the video but with the time in the audio when the silent period begins, and clipping the audio so that the end of the audio of the end of the silent period. > 8. User should be able to reposition captions. [Priority two] > - Rationale: Some multi-media presentations will include positioning > conflicts between captions which can obscure key visual elements of video > media objects. > - Technique: Provide mechanisms to control caption display location > dynamically and through user preferences. This could be handled by switch statements that choose between alternate screen layout specifications whose selection is based on SMIL test attributes that refer to user characteristics, such as system-captions and the system-audio-desc attribute proposed in this message. > 9. User should be able to dynamically control rate for audio media objects. > [Priority two] > - Rationale: Users with aural-processing learning disabilities may require > a slower pace of audio; users who are experienced users of synthesized > speech may be able to tolerate a far faster pace of audio. > - Tecnique: Provide mechanism for dynamic control of presentation pace. If browsers have general presentation rate controls for the user, then the whole presentation could be slowed in a synchronous fashion to the point at which the user can comprehend the audio. However, this may not always be desired. For example, if the presentation includes video or non-speech audio that is not played simultaneously with speech, then the speech processing impaired would have no need for those parts of the presentation to be slowed. Further, such rate control for an entire presentation is very hard to implement. This is another issue that style sheets apply to. Auditory style sheets could be written for a user that say how much to slow down speech audio and at what rate to synthesize speech. Some degree of rate-control can be specified with a switch element choosing between alternative sound files whose choice is based on test attributes. Test attributes that apply here are not explicitly defined in SMIL, but, as described earlier, there are extension techniques described in the document that can be used to add test attributes. Perhaps an appropriate collection of test attributes for this issue can be determined and included in a future version of SMIL. Discussion of audio rate control should also address how to play other media objects that have timing relationships with the audio. If the audio is slowed, what other media object should be slowed, and how? One technique for addressing this with SMIL is to use code similar to the example given for issue 3, above. The audio and video media objects accompanying the affected audio object could be divided into clips, along with the affected audio object. These clips would be established at key event points whose synchronicity should be maintained. This would cause video clips to sometimes be frozen and sound clips to pause at these key points in order to maintain overall temporal consistency. Note that relational timing in SMIL is in terms of the playing time of the references media object, not in terms of ideal playing time or the playing time of the overall presentation. For example, if a video is triggered to start five seconds into an audio clip, and the audio clip is slowed to half speed, the vidoe will actually play after 10 seconds real time. Is more continuous synchronicity needed to properly address issue 9? For example, if an audio file with speech is slowed down, should an accompanying video be slowed down at exactly the same pace so that lip synching is maintained? If this is true, then hard synchronization would need to be implemented in all SMIL players. Hard synchronization is discussed in the specification as an allowable alternative for handling synchronization, but it is not required for use in all cases. A future version of SMIL could provide the author with the ability to specify what parts of a presentation should have hard synchronization and what should not. If hard synchronization were important for using a dynamic control rate for audio media objects, then this need would motivate hard synchronizations inclusion in the next version of SMIL. Regards, Lloyd -- Lloyd Rutledge vox: +31 20 592 41 27 CWI (Centrum voor Wiskunde en Informatica) fax: +31 20 592 41 99 Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands net: lloyd@cwi.nl P.O. Box 94079, NL-1090 GB Amsterdam, The Netherlands http://www.cwi.nl/~lloyd ---------- Judy Brewer jbrewer@w3.org +1.617.258.9741 http://www.w3.org/WAI Director, Web Accessibility Initiative (WAI) International Program Office World Wide Web Consortium (W3C) MIT/LCS Room NE3-355, 545 Technology Square, Cambridge, MA, 02139, USA ------------------ RFC822 Header Follows ------------------ Received: by wgbh.org with ADMIN;30 Sep 1998 15:19:46 -0400 Received: from fishbone (fishbone.w3.org [18.29.0.167]) by www10.w3.org (8.8.5/8.7.3) with SMTP id PAA05290; Wed, 30 Sep 1998 15:14:47 -0400 (EDT) X-Authentication-Warning: www10.w3.org: Host fishbone.w3.org [18.29.0.167] claimed to be fishbone Message-Id: <3.0.5.32.19980930151435.00985cf0@localhost> X-Sender: jbrewer@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32) Date: Wed, 30 Sep 1998 15:14:35 -0400 To: w3c-wai-ua@w3.org, Jon Gunderson <jongund@staff.uiuc.edu>, allan_jm@tsb1.tsbvi.edu, hoschka@w3.org, kerscher@montana.com, cindy.king@gallaudet.edu, robla@real.com, dick.bulterman@cwi.nl, Lloyd.Rutledge@cwi.nl, jbrewer@w3.org, Geoff_Freed@wgbh.org From: Judy Brewer <jbrewer@w3.org> Subject: Re: SMIL: draft items for UA guidelines Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"
Received on Thursday, 1 October 1998 11:09:35 UTC