W3C home > Mailing lists > Public > www-smil@w3.org > October to December 2000

Re: Synthesized-speech auditory descriptions

From: Brad Botkin <brad_botkin@wgbh.org>
Date: Tue, 07 Nov 2000 07:13:29 -0500
Message-ID: <3A07F1E9.54FA42C7@wgbh.org>
To: "Cohen, Aaron M" <aaron.m.cohen@intel.com>
CC: "Hansen, Eric" <ehansen@ets.org>, "'symm@w3.org'" <symm@w3.org>, geoff freed <geoff_freed@wgbh.org>, www-smil@w3.org, thierry michel <tmichel@w3.org>, www-smil-request@w3.org
Aaron,
I apologize for being such a pest about this.  The problem with

	> > ... an implementation can provide rendering of this
	> > (<text> element) text via voice synthesis

is that in order for player and browser developers to incorporate accessibility features such as
captioning, audio description, tts-audio desc, etc., they need to be *GUARANTEED* that the data
lives in a particular place. That is, it must be unambiguous to the parser that it's picking up the
desired source data.  The access data is very specific metadata, which can't *MAYBE* live in the
<text> element, *MAYBE* live in the <alt> element, *MAYBE* live in the <longdesc> element.  Most
metadata is simply embellishment.  Access metadata *IS* the data, just in another format.  You
supply a videoregion and a src=..., precisely so that the display engine knows what to play.  You
could just as easily say 

	"maybe the media filename can live in the <alt> tag. 
	Sometimes it will, sometimes it won't, good luck."  

Accessibility in SMIL is not about creating spots in SMIL for just another pretty presentation
element. It's about allowing the essence of the presentation to be found and rendered.

I understand that it may be late in the SMIL 2.0 game to be talking about any additional
accessibility-specific markup, but in my opinion the need is immediate: accessibility in SMIL is not
ready for prime time. SMIL 2.0 can go forward without it, but the issue will need to be raised
immediately in the next round, with additional specific markup.  That SMIL is media-agnostic is
necessary but not sufficient for a rational implementation of accessibility.

--Brad
___________
Brad_Botkin@wgbh.org   Director, Technology & Systems Development
(v/f) 617.300.3902               NCAM/WGBH - National Center for 
125 Western Ave Boston MA 02134              Accessible Media
___________


"Cohen, Aaron M" wrote:
> 
> Eric:
> I don't interpret the guidelines the way that you do. It seems that you
> assume that alt and longdesc cannot be rendered by synthesized speech. Also,
> we have a <text> element, and an implementation can provide
> rendering of this text via voice synthesis.
> 
> Where we seem to differ is that it seems that your preference is for
> specialized synthesized speech markup, where I think that much of what we
> already have can be used.
> 
> The exploratory comments that I made were in relation to specialized support
> for synthesized speech, not to say that there is no way to incorporate
> synthetic speech into a smil presentation.
> 
> Here is how I answer these specific questions:
> > 1. Does SMIL 2.0 support the development of synthesized
> > speech auditory
> > descriptions?
> Yes. SMIL 2.0, like SMIL 1.0, is media agnostic. Any type of media can be
> supported in SMIL. It is up to the implementation to provide rendering for
> the supported media types, and alternative rendering methods to enhance
> accessibility.
> 
> > SMIL does not currently support synthesized speech auditory
> > descriptions. It
> > does support prerecorded auditory descriptions.
> This is not so. SMIL has exactly the same support for synthesized speech
> auditory descriptions as it does for pre-recorded auditory descriptions.
> SMIL is a media integration language, and does not define media itself.
> 
> The text that you quote does not call out synthetic speech specifically, but
> it is not excluded.
> 
> > 2. If the answer to question #1 is "Yes", then briefly
> > describe the support
> > that is provided.
> 1. A user agent can render alt/longdesc as synthesized speech.
> 2. A user agent can provide a synthetic speech renderer for <text> media
> elements.
> 3. A user can control the rendered media via system preferences which map to
> system test attributes. This allows the author to set the synthesized speech
> up as captions or overdub or audio descriptions.
> 4. SMIL 2.0 has author defined customTest attributes, to allow turning
> on/off media based on document and user specific criteria.
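Taken together, the four mechanisms above might combine in a single presentation along these lines (a sketch against the SMIL 2.0 drafts; the file names and the customTest id are invented for illustration):

```xml
<!-- Illustrative sketch only: combines the mechanisms listed above.
     File names and the customTest id are invented for the example. -->
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <customAttributes>
      <!-- 4: an author-defined custom test, off by default -->
      <customTest id="directorNotes" defaultState="false"
                  override="visible"/>
    </customAttributes>
  </head>
  <body>
    <par>
      <video src="movie.mpg"/>
      <!-- 1/2: a <text> element a player may render via TTS -->
      <text src="desc.txt" systemAudioDesc="on"/>
      <!-- 3: selected via a system test attribute -->
      <textstream src="captions.rt" systemCaptions="on"/>
      <!-- 4: toggled by the author-defined custom test -->
      <text src="notes.txt" customTest="directorNotes"/>
    </par>
  </body>
</smil>
```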
> 
> -Aaron
> 
> > -----Original Message-----
> > From: Hansen, Eric [mailto:ehansen@ets.org]
> > Sent: Wednesday, November 01, 2000 8:14 AM
> > To: 'Cohen, Aaron M'; 'Brad Botkin'
> > Cc: 'symm@w3.org'; geoff freed; Hansen, Eric; www-smil@w3.org; thierry
> > michel; www-smil-request@w3.org
> > Subject: RE: Synthesized-speech auditory descriptions
> >
> >
> > I have an additional comment and then I will summarize.
> >
> > SOME BASIC REQUIREMENTS FOR MULTIMEDIA PRESENTATIONS
> >
> > From the glossary entry for the term "Equivalent" in the W3C
> > Web Content
> > Accessibility Guidelines (WCAG) 1.0 [3], we see that for
> > multimedia presentations there are three major forms of
> > equivalent: captions, auditory descriptions, and collated
> > text transcripts.
> >
> > "A caption is a text transcript for the audio track of a
> > video presentation
> > that is synchronized with the video and audio tracks.
> > Captions are generally
> > rendered visually by being superimposed over the video, which benefits
> > people who are deaf and hard-of-hearing, and anyone who
> > cannot hear the
> > audio (e.g., when in a crowded room). A collated text
> > transcript combines
> > (collates) captions with text descriptions of video information
> > (descriptions of the actions, body language, graphics, and
> > scene changes of
> > the video track). These text equivalents make presentations
> > accessible to
> > people who are deaf-blind and to people who cannot play
> > movies, animations,
> > etc. It also makes the information available to search engines.
> >
> > "One example of a non-text equivalent is an auditory
> > description of the key
> > visual elements of a presentation. The description is either
> > a prerecorded
> > human voice or a synthesized voice (recorded or generated on
> > the fly). The
> > auditory description is synchronized with the audio track of the
> > presentation, usually during natural pauses in the audio
> > track. Auditory
> > descriptions include information about actions, body
> > language, graphics, and
> > scene changes."
> >
> > It appears that SMIL 2.0 provides support for captions and prerecorded
> > auditory descriptions but not for synthesized speech auditory
> > descriptions
> > or collated text transcripts. I have already pointed out the
> > importance of
> > synthesized speech auditory descriptions (see the WCAG 1.0
> > checkpoints quoted below):
> >
> >
> > 1.1 Provide a text equivalent for every non-text element
> > (e.g., via "alt",
> > "longdesc", or in element content). This includes: images, graphical
> > representations of text (including symbols), image map
> > regions, animations
> > (e.g., animated GIFs), applets and programmatic objects,
> > ascii art, frames,
> > scripts, images used as list bullets, spacers, graphical
> > buttons, sounds
> > (played with or without user interaction), stand-alone audio
> > files, audio
> > tracks of video, and video. [Priority 1]
> > For example, in HTML:
> > Use "alt" for the IMG, INPUT, and APPLET elements, or provide a text
> > equivalent in the content of the OBJECT and APPLET elements.
> > For complex content (e.g., a chart) where the "alt" text does
> > not provide a
> > complete text equivalent, provide an additional description using, for
> > example, "longdesc" with IMG or FRAME, a link inside an
> > OBJECT element, or a
> > description link.
> > For image maps, either use the "alt" attribute with AREA, or
> > use the MAP
> > element with A elements (and other text) as content.
> > Refer also to checkpoint 9.1 and checkpoint 13.10.
> >
> > Techniques for checkpoint 1.1
> > 1.3 Until user agents can automatically read aloud the text
> > equivalent of a
> > visual track, provide an auditory description of the
> > important information
> > of the visual track of a multimedia presentation. [Priority 1]
> > Synchronize the auditory description with the audio track as
> > per checkpoint
> > 1.4. Refer to checkpoint 1.1 for information about textual
> > equivalents for
> > visual information.
> > Techniques for checkpoint 1.3
> > 1.4 For any time-based multimedia presentation (e.g., a movie
> > or animation),
> > synchronize equivalent alternatives (e.g., captions or
> > auditory descriptions
> > of the visual track) with the presentation. [Priority 1]
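For checkpoint 1.1, the HTML techniques quoted above amount to markup of roughly this shape (a minimal XHTML sketch; the file names are invented):

```xml
<!-- Sketch of WCAG 1.0 checkpoint 1.1 in XHTML: "alt" for the short
     text equivalent, "longdesc" for complex content. File names
     are invented for illustration. -->
<img src="sales-chart.png"
     alt="Bar chart of quarterly sales"
     longdesc="sales-chart-description.html"/>
```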
> >
> >
> > I am trying to summarize what has been said to this point on
> > this thread
> > that responds to my earlier questions [1]
> >
> > SUMMARY
> >
> > 1. Does SMIL 2.0 support the development of synthesized
> > speech auditory
> > descriptions?
> >
> > SMIL does not currently support synthesized speech auditory
> > descriptions. It
> > does support prerecorded auditory descriptions.
> >
> > 2. If the answer to question #1 is "Yes", then briefly
> > describe the support
> > that is provided.
> >
> > N/A
> >
> > 3. If the answer to question #1 is "No", then please describe
> > any plans for
> > providing such support in the future.
> >
> > There are currently no plans for including this in SMIL. Aaron Cohen
> > suggests that "Probably what is needed is a general
> > accessible markup that
> > can be used in SMIL, XHTML, SVG, etc. SMIL would just adopt this as a
> > content type. This new content type could be designed to
> > reuse a lot of
> > SMIL content control, and it could have additional
> > indirection mechanisms to
> > enable the kind of structured grouping that you mention. But
> > that's another
> > spec, and for now the vendors are doing their own thing." [2]
> >
> > ====
> >
> > COMMENT
> >
> > It seems to me that if SMIL 2.0 proceeds to Recommendation
> > status, it would
> > be good to have done several things.
> >
> > 1. Affirm W3C's commitment to supporting Web accessibility,
> > particularly the
> > multimedia-related requirements of the Web Content
> > Accessibility Guidelines
> > (WCAG), User Agent Accessibility Guidelines (UAAG), Authoring Tool
> > Accessibility Guidelines (ATAG). Captions, auditory descriptions, and
> > collated text transcripts stand out in my mind in this
> > regard. (See WCAG 1.0
> > [3]).
> >
> > 2. Explain why synthesized speech auditory descriptions are
> > not or cannot be
> > part of the SMIL 2.0 specification.
> >
> > 3. Suggest a plan for supporting synthesized speech auditory
> > descriptions. I
> > personally would like to see some kind of commitment from the
> > W3C to support
> > this, either as part of the next version of SMIL or perhaps
> > as Aaron has
> > suggested, another specification that could be reused by
> > SMIL, XHTML, SVG,
> > etc.
> >
> > 4. Suggest techniques for providing such auditory
> > descriptions and collated
> > text transcripts  until they are fully integrated into W3C
> > specifications.
> >
> > I think that it would be appropriate to have at least a
> > summary of such
> > information as part of the Recommendation. I am concerned
> > that without such
> > information within the document, people may doubt the W3C's
> > commitment to
> > accessible media.
> >
> >
> >
> > [1] http://lists.w3.org/Archives/Public/www-smil/2000OctDec/0050.html
> > [2] http://lists.w3.org/Archives/Public/www-smil/2000OctDec/0062.html
> > [3] http://www.w3.org/TR/WAI-WEBCONTENT/
> >
> > -----Original Message-----
> > From: Cohen, Aaron M [mailto:aaron.m.cohen@intel.com]
> > Sent: Monday, October 30, 2000 12:40 PM
> > To: 'Brad Botkin'
> > Cc: 'symm@w3.org'; geoff freed; Hansen, Eric; www-smil@w3.org; thierry
> > michel; www-smil-request@w3.org
> > Subject: RE: Synthesized-speech auditory descriptions
> >
> >
> > Brad:
> > As far as the systemAudioDesc only taking on/off, that's
> > true, but you can
> > combine it with the other test attributes, such as
> > systemLanguage, and get
> > many, many combinations. Geoff Freed and the WAI people are
> > reviewing those
> > combinations for completeness, so if you think that we are missing a
> > specific use case, please let us know.
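Combining test attributes in the way described above might look like this (a sketch; the file names are invented):

```xml
<!-- Sketch: the same audio description selected by both
     systemAudioDesc and systemLanguage. File names invented. -->
<switch>
  <audio src="desc-fr.wav" systemAudioDesc="on" systemLanguage="fr"/>
  <audio src="desc-en.wav" systemAudioDesc="on" systemLanguage="en"/>
</switch>
```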
> >
> > As far as separate text files for accessibility documents,
> > you are right,
> > that's a thorny issue for SMIL, which has left the definition
> > of media (as
> > opposed to the integration) to the player/content developers.
> >
> > Probably what is needed is a general accessible markup that
> > can be used in
> > SMIL, XHTML, SVG, etc. SMIL would just adopt this as a
> > content type. This
> > new content type could be designed to reuse a lot of SMIL
> > content control,
> > and it could have additional indirection mechanisms to enable
> > the kind of
> > structured grouping that you mention. But that's another
> > spec, and for now
> > the vendors are doing their own thing.
> >
> > -Aaron
> >
> > > -----Original Message-----
> > > From: Brad Botkin [mailto:Brad_Botkin@wgbh.org]
> > > Sent: Sunday, October 29, 2000 4:33 AM
> > > To: Cohen, Aaron M
> > > Cc: 'symm@w3.org'; geoff freed; Hansen, Eric;
> > www-smil@w3.org; thierry
> > > michel; www-smil-request@w3.org
> > > Subject: Re: Synthesized-speech auditory descriptions
> > >
> > >
> > > Aaron,
> > > What seems to be missing from
> > >
> > > > > <par>
> > > > >         <audio src="snippet8043.wav">
> > > > >                 <description xml:lang="en">
> > > > >                         The lady in the pink sweater
> > > > >                         picks up the pearl necklace from
> > > > >                         the table and walks to the door.
> > > > >                 </description>
> > > > >                 <description xml:lang="fr">
> > > > >                         Oui.
> > > > >                 </description>
> > > > >         </audio>
> > > > > </par>
> > >
> > > is a way to uniquely and unambiguously identify the text
> > above as the
> > > audio description (unless the <description> tag is just that, but I
> > > assume "<description xml....>" here is a generic term unrelated to
> > > "audio description" as we're talking about it).
> > >
> > > The <systemAudioDesc> tag is a way to signal a player that some
> > > particular content should be played for some users.  But
> > the specific
> > > rendering device has the job of deciding which media
> > element to play,
> > > the audio (uniquely identified by the "src" attribute) or the
> > > transcription of that element (not yet uniquely identified).
> > >
> > > The point is that there may be more than just one text string
> > > associated
> > > with an audio element, only one of which is the
> > transcription of that
> > > audio.  <systemAudioDesc> *almost* spoke to this need,
> > except that it
> > > only takes an "on/off" value, which seems insufficient to
> > the task of
> > > allowing rendering engines to adequately handle
> > accessibility issues.
> > > Since accessibility is being legislated in the tv and
> > multimedia arena
> > > as we speak, it seems prudent to create a set of extensible
> > > accessibility tags which will allow those industries to
> > easily utilize
> > > SMIL in their workflow.  It's true that these elements would not be
> > > general, reusable ones, and I sympathize with your reticence to
> > > generate more case markup. Nonetheless....
> > >
> > > In another vein, how about the issue of how to manage the
> > grouping of
> > > synched accessibility objects (captions and descriptions,
> > for example)
> > > in separate text files.  I'm sure this is thorny, but the current
> > > existing formats (RealText, SAMI, Quicktime qtText) all
> > offer a way to
> > > group these related elements (for captioning).  Current thoughts?
> > >
> > > --Brad
> > > ___________
> > > Brad_Botkin@wgbh.org   Director, Technology & Systems Development
> > > (v/f) 617.300.3902               NCAM/WGBH - National Center for
> > > 125 Western Ave Boston MA 02134              Accessible Media
> > > ___________
> > >
> > >
> > >
> > > "Cohen, Aaron M" wrote:
> > > >
> > > > Brad:
> > > > That specific use of verbatim text is what systemAudioDesc
> > > is for. It can be
> > > > used on text media elements that can contain the verbatim
> > > text. The pair of
> > > > audio and text elements can be wrapped in a par and given a
> > > specific title,
> > > > and the unit used in a presentation just like an individual
> > > media element.
> > > >
> > > > Why would it be better to have special case markup when the
> > > generalized
> > > > capabilities that we have cover the use cases?
> > > >
> > > > Your example confuses me, since it doesn't seem to give any
> > > more capability
> > > > than we already have with XHTML+SMIL:
> > > >
> > > > <par>
> > > >         <audio src="snippet8043.wav"/>
> > > >         <p systemAudioDesc="on">The lady in the pink
> > > sweater picks up the
> > > > pearl necklace from the table and walks to the door.</p>
> > > > </par>
> > > >
> > > > Even less, since you can't hang an xml:lang off the
> > > attribute, necessitating
> > > > duplication of the media object reference for each language
> > > of the text
> > > > description.
> > > >
> > > > With SMIL 2.0, you have to put the text in alt or another
> > > file, because SMIL
> > > > does not itself define media:
> > > > <par>
> > > >         <audio src="snippet8043.wav"/>
> > > >         <text systemAudioDesc="on" src="lady.txt"/>
> > > > </par>
> > > >
> > > > If you are saying that there should be some general
> > > scalable mechanism to
> > > > make this easier to maintain, I agree with you, with the
> > additional
> > > > stipulation that this is not just a smil issue, but an
> > > issue for all XML
> > > > languages that have non-text content.
> > > >
> > > > For the next version of SMIL, we plan to adopt SVG's
> > > description element,
> > > > which would allow you to do something like this in SMIL:
> > > >
> > > > <par>
> > > >         <audio src="snippet8043.wav">
> > > >                 <description xml:lang="en">
> > > >                         The lady in the pink sweater picks
> > > >                         up the pearl necklace from the
> > > >                         table and walks to the door.
> > > >                 </description>
> > > >                 <description xml:lang="fr">
> > > >                         Oui.
> > > >                 </description>
> > > >         </audio>
> > > > </par>
> > > >
> > > > Having an attribute on elements that are specially meant to
> > > be a literal
> > > > text translation of (possibly long) media does not scale
> > > well. The sub
> > > > elements make more sense.
> > > >
> > > > I think that this is the beginning of discussion about the
> > > need to create a
> > > > set of reusable markup elements that fit the identified
> > > needs. I can
> > > > imagine <description>, <transcription>, and <title> child
> > > elements, all
> > > > enclosing text.
> > > >
> > > > My point is that these are real problems that need
> > > solutions, but the
> > > > solutions need to be general, reusable and thought out in
> > > detail. This will
> > > > require some dedicated people and some time. This is way
> > > too late in the
> > > > SMIL 2.0 process to start integrating this kind of thing
> > > into the language,
> > > > but it is something that should be done for re-use by everyone and
> > > > integrated into SMIL (and XHTML 2.0?, SVG?) in the future.
> > > >
> > > > -Aaron
> > > >
> > > > > -----Original Message-----
> > > > > From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> > > > > Sent: Friday, October 27, 2000 12:30 PM
> > > > > To: Cohen, Aaron M
> > > > > Cc: geoff freed; Hansen, Eric; www-smil@w3.org; thierry michel;
> > > > > www-smil-request@w3.org
> > > > > Subject: Re: Synthesized-speech auditory descriptions
> > > > >
> > > > >
> > > > > Aaron,
> > > > > I think the actual transcription of the audio deserves
> > > its own tag,
> > > > > since it's so specific. For the same reason that you created a
> > > > > systemAudioDesc tag and didn't just use the alt tag.  You
> > > need a place
> > > > > to look that's consistent.  I believe the longdesc is
> > > intended to be
> > > > > used as simply a longer text description of the
> > > underlying graphic or
> > > > > media file. And in the case of audio description snippets,
> > > > > the longdesc
> > > > > could be used to hold timing or other metadata related to
> > > the snippet
> > > > > but not specifically voiced. I think that verbatim text
> > will prove
> > > > > invaluable in the future, for searching, etc., and you
> > > should consider
> > > > > creating a specific tag for this.
> > > > > --Brad
> > > > > __________
> > > > > Brad_Botkin@wgbh.org   Director, Technology & Systems
> > Development
> > > > > 617.300.3902 (v/f)               NCAM/WGBH - National Center for
> > > > > 125 Western Ave Boston MA 02134              Accessible Media
> > > > > __________
> > > > >
> > > > >
> > > > > "Cohen, Aaron M" wrote:
> > > > > >
> > > > > > Brad:
> > > > > >
> > > > > > We also have alt and longdesc, either of which could be
> > > > > used by a player to
> > > > > > provide accessory or alternative text content. These can be
> > > > > combined with
> > > > > > the systemLanguage and other test attributes to provide
> > > > > many combinations of
> > > > > > accessibility and internationalization.
> > > > > > -Aaron
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> > > > > > > Sent: Friday, October 27, 2000 5:41 AM
> > > > > > > To: geoff freed
> > > > > > > Cc: Hansen, Eric; www-smil@w3.org; thierry michel;
> > > > > > > www-smil-request@w3.org
> > > > > > > Subject: Re: Synthesized-speech auditory descriptions
> > > > > > >
> > > > > > >
> > > > > > > Geoff,
> > > > > > > True but incomplete.  It sounds like Eric is asking
> > for a tag
> > > > > > > which identifies text as a transcription of the underlying
> > > > > > > audio.   Something like:
> > > > > > >
> > > > > > > <par>
> > > > > > > .....
> > > > > > >     <audio    systemAudioDesc="on"
> > > > > > >                     AudioDescText="The lady in the pink
> > > > > > > sweater picks up the pearl necklace from the table and
> > > > > walks to the
> > > > > > > door."
> > > > > > >                     src="snippet8043.wav"/>
> > > > > > > .....
> > > > > > > </par>
> > > > > > >
> > > > > > > It's a great idea, since the text is super-thin, making it
> > > > > > > appropriate for transmission in narrow pipes with local
> > > > > > > text-to-speech synthesis for playback.  Note that the volume
> > > > > > > of snippets in a longer piece, like a movie, is huge, just
> > > > > > > like closed captions.  Inclusion of 1000 audio description
> > > > > > > snippets and 2000 closed captions, each in 3 languages, each
> > > > > > > with its own timecode, all in the same SMIL file will make
> > > > > > > for some *very* unfriendly  files.  Better would be
> > > to provide a
> > > > > > > mechanism which allows the SMIL file to gracefully point to
> > > > > > > separate files each containing the timecoded AD
> > snippets (with
> > > > > > > transcriptions per the above) and timecoded captions.  It
> > > > > > > requires the SMIL player to gracefully overlay the external
> > > > > > > timeline onto the intrinsic timeline of the SMIL file.
> > > > > > > Without this, SMIL won't be used for interchange of
> > > caption and
> > > > > > > description data for anything longer than a minute
> > or two.  A
> > > > > > > translation house shouldn't have to unwind a bazillion audio
> > > > > > > descriptions and captions in umpteen other languages to
> > > > > > > insert its French translation.
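The indirection mechanism described above has no syntax in SMIL; a purely hypothetical sketch of the kind of pointer being asked for might look like this (the element names, attributes, and file names are all invented for illustration, not SMIL markup):

```xml
<!-- Hypothetical markup only: no such elements exist in SMIL.
     Sketches the external timecoded caption/description files
     described above; a player would overlay their timelines onto
     the presentation's own timeline. -->
<par>
  <video src="movie.mpg"/>
  <captionFile src="captions-fr.xml"
               systemCaptions="on" systemLanguage="fr"/>
  <audioDescFile src="descriptions-en.xml"
                 systemAudioDesc="on" systemLanguage="en"/>
</par>
```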
> > > > > > >
> > > > > > > Regards,
> > > > > > > --Brad
> > > > > > > ___________
> > > > > > > Brad_Botkin@wgbh.org   Director, Technology & Systems
> > > Development
> > > > > > > (v/f) 617.300.3902               NCAM/WGBH - National
> > > Center for
> > > > > > > 125 Western Ave Boston MA 02134
> > Accessible Media
> > > > > > > ___________
> > > > > > >
> > > > > > >
> > > > > > > geoff freed wrote:
> > > > > > >
> > > > > > > > Hi, Eric:
> > > > > > > >
> > > > > > > > SMIL 2.0 provides support for audio descriptions
> > via a test
> > > > > > > attribute, systemAudioDesc.  The author can record audio
> > > > > > > >  descriptions digitally and synchronize them into a SMIL
> > > > > > > presentation using this attribute, similar to how
> > captions are
> > > > > > > >  synchronized into SMIL presentations using systemCaptions
> > > > > > > (or system-captions, as it is called in SMIL 1.0).
> > > > > > > >
> > > > > > > > Additionally, using SMIL 2.0's <excl> and <priorityClass>
> > > > > > > elements, the author may pause a video track
> > > > > > > >  automatically, play an extended audio description and,
> > > > > > > when the description is finished, resume playing the video
> > > > > > > >  track.  This will be a boon for situations  where the
> > > > > > > natural pauses in the program audio aren't sufficient
> > > for audio
> > > > > > > >  descriptions.
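The pause-and-resume behaviour for extended descriptions could be sketched along these lines (an illustration against the SMIL 2.0 drafts; the file names and timing are invented):

```xml
<!-- Sketch: an extended audio description interrupts the main
     video via <excl>/<priorityClass>. peers="pause" pauses the
     video while the description plays; the video then resumes. -->
<excl>
  <priorityClass peers="pause">
    <video src="movie.mpg" begin="0s"/>
    <audio src="extended-desc.wav" begin="45s"
           systemAudioDesc="on"/>
  </priorityClass>
</excl>
```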
> > > > > > > >
> > > > > > > > Geoff Freed
> > > > > > > > CPB/WGBH National Center for Accessible Media (NCAM)
> > > > > > > > WGBH Educational Foundation
> > > > > > > > geoff_freed@wgbh.org
> > > > > > > >
> > > > > > > > On Wednesday, October 25, 2000, thierry michel
> > > > > > > <tmichel@w3.org> wrote:
> > > > > > > > >
> > > > > > > > >> My questions concern the use of SMIL for developing
> > > > > > > auditory descriptions
> > > > > > > > >> for multimedia presentations.
> > > > > > > > >>
> > > > > > > > >> The Web Content Accessibility Guidelines (WCAG) version
> > > > > > > 1.0 of W3C/WAI
> > > > > > > > >> indicates the possibility of using speech synthesis for
> > > > > > > providing auditory
> > > > > > > > >> descriptions for multimedia presentations.
> > Specifically,
> > > > > > > checkpoint 1.3 of
> > > > > > > > >> WCAG 1.0 reads:
> > > > > > > > >>
> > > > > > > > >> "1.3 Until user agents can automatically read aloud the
> > > > > > > text equivalent of
> > > > > > > > >a
> > > > > > > > >> visual track, provide an auditory description of the
> > > > > > > important information
> > > > > > > > >> of the visual track of a multimedia presentation.
> > > > > [Priority 1]
> > > > > > > > >> Synchronize the auditory description with the audio
> > > > > track as per
> > > > > > > > >checkpoint
> > > > > > > > >> 1.4. Refer to checkpoint 1.1 for information about
> > > > > > > textual equivalents for
> > > > > > > > >> visual information." (WCAG 1.0, checkpoint 1.3).
> > > > > > > > >>
> > > > > > > > >> In the same document in the definition of
> > > > > "Equivalent", we read:
> > > > > > > > >>
> > > > > > > > >> "One example of a non-text equivalent is an auditory
> > > > > > > description of the
> > > > > > > > >key
> > > > > > > > >> visual elements of a presentation. The description is
> > > > > > > either a prerecorded
> > > > > > > > >> human voice or a synthesized voice (recorded or
> > > > > > > generated on the fly). The
> > > > > > > > >> auditory description is synchronized with the audio
> > > > > track of the
> > > > > > > > >> presentation, usually during natural pauses in
> > the audio
> > > > > > > track. Auditory
> > > > > > > > >> descriptions include information about actions, body
> > > > > > > language, graphics,
> > > > > > > > >and
> > > > > > > > >> scene changes."
> > > > > > > > >>
> > > > > > > > >> My questions are as follows:
> > > > > > > > >>
> > > > > > > > >> 1. Does SMIL 2.0 support the development of synthesized
> > > > > > > speech auditory
> > > > > > > > >> descriptions?
> > > > > > > > >>
> > > > > > > > >> 2. If the answer to question #1 is "Yes", then briefly
> > > > > > > describe the
> > > > > > > > >support
> > > > > > > > >> that is provided.
> > > > > > > > >>
> > > > > > > > >> 3. If the answer to question #1 is "No", then please
> > > > > > > describe any plans
> > > > > > > > >for
> > > > > > > > >> providing such support in the future.
> > > > > > > > >>
> > > > > > > > >> Thanks very much for your consideration.
> > > > > > > > >>
> > > > > > > > >> - Eric G. Hansen
> > > > > > > > >> Development Scientist
> > > > > > > > >> Educational Testing Service (ETS)
> > > > > > > > >> Princeton, NJ 08541
> > > > > > > > >> ehansen@ets.org
> > > > > > > > >> Co-Editor, W3C/WAI User Agent Accessibility Guidelines
> > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> >
Received on Tuesday, 7 November 2000 07:14:07 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 7 December 2009 10:53:27 GMT