Re: SMIL: draft items for UA guidelines


Look for GWK:: before my few comments.


At 03:14 PM 9/30/98 -0400, Judy Brewer wrote:
>Thanks very much for your comments. I'm bouncing this message to the User
>Guidelines Working Group, as well as other people cc'd on the original
>message, for additional comment, preferably by October 6.
>Here are my brief comments; please excuse (or correct) my very rough
>paraphrasing of your points. I've included your complete message below. 
>The original message is at
> if any
>one is looking for it.
>#1. Reference to SMIL should be to SMIL 1.0 -- Agreed.
>#2. Would be useful to have captioning conventions for how non-speech
>sounds and additional audio information such as change-of-language are
>represented; what exists? -- Agreed. Geoff, can you help us out there at
>all; what exists to date?
>#3. SMIL needs to be extended to be able to handle audio descriptions of
>video images (see sample of suggested code below) -- Agreed, and the method
>you propose looks good to me, although I would want to be sure that it is a
>solution that is easy to implement. Could other people please review and
>comment on this - Philipp, George, Geoff? ...I also think that we need a
>mechanism for text descriptions of video images, which my original draft
>didn't deal with adequately.
GWK:: It seems that this text information is simply the scripts of the
audio recordings that are suplemental to the video. Normally, these are
prepared first and could be included in the presentation. 

GWK::In terms of the audio and the video running at the same time, I assume
that the video has an audio component of itself. The two items sending out
audio may compete for system resources. I don't know about the
implementations of this.

>#8. Repositioning of captions could be handled by switch statements... --
>Unclear here. Would captions that are overlaid on video media object be
>controllable in this manner, or is the idea that user preferences should
>require captions to be displayed elsewhere than on top of a video media
>#9. Providing user control over presentation pace is complex and the
>following issues need to be considered... -- Agreed. Need to consider to
>what extent hard synchronization is required among other media objects that
>have timing relationships with the audio (or perhaps video?) that is being
>pace-controlled. Comments or suggestions from anyone else on this?
GWK:: the whole presentation should be able to be slowed down (80% of real
time) or speeded up (120% of real time). II do this I believe this would
just be done with the presentation clock. To mess with individual
components would require the SMIL presentation to be reworked.

>[Lloyd's message follows]
>Subject: Re: SMIL: draft items for UA guidelines 
>Date: Mon, 21 Sep 1998 18:15:12 +0200
>From: Lloyd Rutledge <>
>Hi Judy,
>I'm just now back from my travels, which included our meeting at the
>WOW breakfast meeting.  I was glad to see your UA guidelines email and
>to see that the consideration UA with SMIL is moving forward.  Below
>are some comments I have on the draft items.
>> 1. User should be able to identify and switch text captions of audio
>> objects on & off. [Priority one]
>> - Rationale: Some users require captions anytime they are available, in
>> which case users should be able to set a preference in the user preferences
>> to view captions; other users require captions only in certain
>> circumstances, in which case they need a mechanism to identify when
>> captions are available, and to turn them on while a presentation is
>> playing. Mechanism for identification of captions should also function
>> non-visually as some users can neither hear audio files nor see captions
>> but can still access captions through screen-reading software and
>> refreshable Braille display.
>> - Technique: Provide user interface to switch display of media objects with
>> "system-caption" test attribute on and off. This must be possible both in
>> the static user preferences, and dynamically while the presentation is
>> playing.
>This guideline applies to implementors of SMIL 1.0, which contrasts
>with guidelines that apply to the working group for the next version
>of SMIL.  It might help to distinguish between these guidelines so
>that the different audiences reading the draft can readily identify
>what applies most to them.
>> 2. User should be able to control size, color, and background color of
>> captions. [Priority one]
>> - Rationale: Some users require specific font size, color, and contrast
>> with caption background to be able to view captions.
>> - Technique: Provide user interface to change size, color, and background
>> color of media objects with "system-caption" test attribute. This must be
>> possible both in the static user preferences, and dynamically while the
>> presentation is playing.
>This is an issue of style and specifying style for particular media
>objects.  This issue is primarily in the domain of CSS.  SMIL includes
>document portions of other media formats and integrates them into a
>unified presentation.  If CSS or an equivalent does or is made to
>apply any media type then we should ensure that the SMIL format and
>the players for it are kept compatible.
>Is there a captions format with markup/structure that identifies what
>portions of the captioning are things such as
> - different speakers
> - music
> - various noises (telephone, doorbell, car horn, siren, whistle, etc.)
> - tones of voice
> - brief occurrences of different languages not necessarily intended to 
>   be understood by the audience
>Having a caption format that identifies such things abstractly rather
>than just giving their text description is important because the
>techniques for identifying them to the user can vary between
>presentations.  Style specs can then say how these abstractions should
>be presented to the user given the user's preferences.
>> 3. User should be able to identify and turn on and off audio descriptions
>> of video objects. [Priority one]
>> - Rationale: Users who cannot see a video media object need a non-visual
>> way to identify that an audio description is available.
>> - Technique: Provide a standard mechanism for notifying third party
>> assistive technologies (e.g. screen-reading software) of the existence of
>> an audio description for a video object. Provide a mechanism (which can
>> function non-visually) for turning on or off the audio description.
>This applies both to implementors and the SMIL working group.  It
>applies for the SMIL working group because the format currently does
>not explicitly address audio descriptions.
>We could introduce a "system-audio-desc" attribute that works just
>like the system-captions attribute but for audio description rather
>than captions.  This new attribute can be included in the next version
>of SMIL.  In the meantime, it can be included uniformly by
>implementors and specified as a private extension to SMIL in the
>manner described in the SMIL appendix.
>The current SMIL 1.0 time model handles the time issues related to
>audio description well.  After seeing Geoff's demo of audio
>description, which we both way at the breakfast meeting, I came up
>with the example SMIL code below
>  <par>
>    <video id="video" fill="freeze"
>           src="video.mpeg" clip-begin="npt=0s" clip-end="npt=5s"/>
>    <audio src="audio.wav"  clip-begin="npt=0s" clip-end="npt=5s"/>
>    <audio begin="id(video)(6s) system-audio-desc="on" src="desc.wav"/>
>  </par>
>  <par>
>     <video src="video.mpeg" clip-begin="npt=6s" clip-end="npt=10s">
>     <audio src="audio.wav"  clip-begin="npt=6s" clip-end="npt=10s">
>  </par>
>This presentation has a video file and an accompanying primary audio
>file that is the main soundtrack for the video.  There is also one two
>second long audio description file.  This describes a visual event
>that takes place five seconds into the video.  There is no natural
>silent period in the soundtrack into which an audio description could
>fit.  The behavior of this presentation when audio description is on
>is that the video and sound track play for five seconds, then the
>sound track stops and the video freezes for five seconds while the
>audio description plays and then the remaining five seconds of the
>video and audio play.  This scenario enables a sighted viewer and a
>non-sighted listener to watch the same presentation simultaneously,
>with the pausing video keeping the viewer engaged.  If audio
>description is turned off, the same presentation should play the full
>10 seconds of video and soundtrack audio with no perceivable pauses or
>This presentation is represented in the SMIL code above as a sequence
>of five second sub-presentations.  In each, corresponding five second
>segments of the video and soundtrack audio are played in parallel.
>The first five second segment also includes a switch statement chooses
>between playing an empty ref element, effectively doing nothing, and
>playing the audio description file describing the event that happens
>five seconds into the video.  This switch statement is timed to start
>5 seconds into the playing of the video.  If the audio description is
>not played, the switch statement does not do anything, thus ending
>immediately.  This causes the end time of the par element to be the
>same as the duration of the video clip.  In this case, the showing of
>the first video clip should play seamlessly into the second.  However,
>if audio description is on, the five second description file will
>play, bringing the duration of the first par element to 10 seconds.
>Because the video on this par is only 5 seconds long, when the video
>clip ends the fill attribute value "freeze" takes effect, causes the
>final frame of the video to stay on the screen until the audio
>description ends.
>This same technique can exploit the occurances of silent periods in
>the soundtrack.  If there is a soundtrack silent period that coincides
>with a visual event, then the audio description should play during
>that silence.  For the duration of this silent period, it is not
>necessary to freeze the video while the audio description is being
>played because the audio description would not drown out any important
>soundtrack sound.  However, if the audio description is longer than
>the silent period, you may still want to freeze the video at the end
>of the silent period.  The technique above can address this by
>sychronizing not with the very end of the video but with the time in
>the audio when the silent period begins, and clipping the audio so
>that the end of the audio of the end of the silent period.
>> 8. User should be able to reposition captions. [Priority two]
>> - Rationale: Some multi-media presentations will include positioning
>> conflicts between captions which can obscure key visual elements of video
>> media objects.
>> - Technique: Provide mechanisms to control caption display location
>> dynamically and through user preferences.
>This could be handled by switch statements that choose between
>alternate screen layout specifications whose selection is based on
>SMIL test attributes that refer to user characteristics, such as
>system-captions and the system-audio-desc attribute proposed in this
>> 9. User should be able to dynamically control rate for audio media objects.
>> [Priority two]
>> - Rationale: Users with aural-processing learning disabilities may require
>> a slower pace of audio; users who are experienced users of synthesized
>> speech may be able to tolerate a far faster pace of audio.
>> - Tecnique: Provide mechanism for dynamic control of presentation pace.
>If browsers have general presentation rate controls for the user, then
>the whole presentation could be slowed in a synchronous fashion to the
>point at which the user can comprehend the audio.  However, this may
>not always be desired.  For example, if the presentation includes
>video or non-speech audio that is not played simultaneously with
>speech, then the speech processing impaired would have no need for
>those parts of the presentation to be slowed.  Further, such rate
>control for an entire presentation is very hard to implement.
>This is another issue that style sheets apply to.  Auditory style
>sheets could be written for a user that say how much to slow down
>speech audio and at what rate to synthesize speech.
>Some degree of rate-control can be specified with a switch element
>choosing between alternative sound files whose choice is based on test
>attributes.  Test attributes that apply here are not explicitly
>defined in SMIL, but, as described earlier, there are extension
>techniques described in the document that can be used to add test
>attributes.  Perhaps an appropriate collection of test attributes for
>this issue can be determined and included in a future version of SMIL.
>Discussion of audio rate control should also address how to play other
>media objects that have timing relationships with the audio.  If the
>audio is slowed, what other media object should be slowed, and how?
>One technique for addressing this with SMIL is to use code similar to
>the example given for issue 3, above.  The audio and video media
>objects accompanying the affected audio object could be divided into
>clips, along with the affected audio object.  These clips would be
>established at key event points whose synchronicity should be
>maintained.  This would cause video clips to sometimes be frozen and
>sound clips to pause at these key points in order to maintain overall
>temporal consistency.
>Note that relational timing in SMIL is in terms of the playing time of
>the references media object, not in terms of ideal playing time or the
>playing time of the overall presentation.  For example, if a video is
>triggered to start five seconds into an audio clip, and the audio clip
>is slowed to half speed, the vidoe will actually play after 10 seconds
>real time.
>Is more continuous synchronicity needed to properly address issue 9?
>For example, if an audio file with speech is slowed down, should an
>accompanying video be slowed down at exactly the same pace so that lip
>synching is maintained?  If this is true, then hard synchronization
>would need to be implemented in all SMIL players.  Hard
>synchronization is discussed in the specification as an allowable
>alternative for handling synchronization, but it is not required for
>use in all cases.  A future version of SMIL could provide the author
>with the ability to specify what parts of a presentation should have
>hard synchronization and what should not.  If hard synchronization
>were important for using a dynamic control rate for audio media
>objects, then this need would motivate hard synchronizations inclusion
>in the next version of SMIL.
>Lloyd Rutledge                                         vox: +31 20 592 41 27
>CWI (Centrum voor Wiskunde en Informatica)             fax: +31 20 592 41 99
>Kruislaan 413,  NL-1098 SJ Amsterdam, The Netherlands  net:
>P.O. Box 94079, NL-1090 GB Amsterdam, The Netherlands
>Judy Brewer    +1.617.258.9741
>Director, Web Accessibility Initiative (WAI) International Program Office
>World Wide Web Consortium (W3C)
>MIT/LCS Room NE3-355, 545 Technology Square, Cambridge, MA, 02139, USA
George Kerscher, Project Manager
PM to the DAISY Consortium
Recording For the Blind & Dyslexic
Phone: 406/549-4687

Received on Wednesday, 30 September 1998 16:49:37 UTC