- From: Brad Botkin <brad_botkin@wgbh.org>
- Date: Tue, 07 Nov 2000 15:12:04 -0500
- To: "Cohen, Aaron M" <aaron.m.cohen@intel.com>
- CC: geoff freed <geoff_freed@wgbh.org>, "Hansen, Eric" <ehansen@ets.org>, "'symm@w3.org'" <symm@w3.org>, www-smil@w3.org, thierry michel <tmichel@w3.org>, www-smil-request@w3.org
Aaron,

Fair enough. Let's table it until the next round's requirements start. I agree that there's simply a hole to fill, and I didn't mean to single out SMIL. But SMIL may be the best place to fill *all* the holes left by other languages.

--Brad
__________
Brad_Botkin@wgbh.org               Director, Technology & Systems Development
617.300.3902 (v/f)                 NCAM/WGBH - National Center for Accessible Media
125 Western Ave, Boston MA 02134
__________

"Cohen, Aaron M" wrote:
>
> Brad:
>
> I'm not trying to keep going round and round about this, and I do understand what you are saying, but I think that you are missing what I am saying. Simply (but at length, please forgive me):
>
> 1. SMIL 2.0 supports the current standard set of accessibility attributes on _all_ elements. These are the attributes recommended by the WAI team and used in other W3C recommendations.
>
> 2. SMIL 2.0, like SMIL 1.0, is otherwise media agnostic. It is an author's responsibility to include the media that they make available to users, and alternatives to that media. This is a different means of creating accessible presentations than fixed attributes, but it is more flexible and integrates well with SMIL.
>
> 3. SMIL 1.0's accessibility was considered an improvement by WAI at the time, and SMIL 2.0 provides even more accessibility. We have responded to, and included, most of the features that WAI requested prior to last call.
>
> 4. The issue that you bring up, that synthesized-speech text equivalents need to live in a specific, guaranteed place, is not specifically a SMIL issue. SMIL does support this, but not in the particular manner that you request. This is where I suggested that some general markup be developed to be reused in multiple languages, much like alt and longdesc are now.
>
> 5. As you imply, requiring specific media support such as suggested in #3 and beyond #1 and #2 is not an appropriate new topic at this stage. It _is_ appropriate during a requirements-gathering phase or as comments when working drafts are released. Certainly, this is a valid issue to consider for the next version.
>
> 6. Granted that synthesized-speech text equivalents are worth considering, it's not absolutely clear to me that these require/deserve their own special-use attributes and elements, as opposed to alt and longdesc. Synthesized speech can be considered just a rendering method. Selecting exactly what content needs to be rendered may best be decided by authors providing the media to a user, and the user agent providing the user with options, such as an option to "render text media with synthesized speech" or "render the alt text with synthesized speech". It's not clear to me why the synthesized-speech text data should be at the same basic level as the alt and longdesc attributes are, but I do think that the topic warrants discussion and some conscientious design, not just a quick fix. (See the illustrative fragment below.)
>
> 7. Saying that SMIL 2.0 accessibility is not ready for primetime using an argument that applies to all W3C languages (XHTML, SVG, etc.) means that the W3C has a hole to fill, not that SMIL is lacking the current standard of support or that the SYMM working group did not take into account design requirements. These other languages are not going to be delayed for synthesized speech, and neither should SMIL.
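For concreteness, here is a minimal sketch of the generalized mechanisms described in points 1, 2 and 6 above, using the standard accessibility attributes and the SMIL 2.0 system test attributes. The file names are placeholders, and whether the text media is spoken by a synthesizer is left to the user agent:

    <par>
      <video src="movie.mpg" alt="Main program video" longdesc="movie-desc.html"/>
      <!-- pre-recorded audio description, selected by the user's systemAudioDesc preference -->
      <audio src="description.wav" systemAudioDesc="on"/>
      <!-- the same description as text media; a player may render it visually
           or via synthesized speech, at the user's option -->
      <text src="description.txt" systemAudioDesc="on"/>
    </par>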
> What I suggest is that the WAI team and other interested parties come to some consensus on what kind of support for synthesized speech is necessary, and SYMM will strongly consider this recommendation for the next version. Other WGs should be informed as well, since the same issues apply to them.
>
> -Aaron
>
> > -----Original Message-----
> > From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> > Sent: Tuesday, November 07, 2000 4:13 AM
> > To: Cohen, Aaron M
> > Cc: Hansen, Eric; 'symm@w3.org'; geoff freed; www-smil@w3.org; thierry michel; www-smil-request@w3.org
> > Subject: Re: Synthesized-speech auditory descriptions
> >
> > Aaron,
> >
> > I apologize for being such a pest about this. The problem with
> >
> >     ... an implementation can provide rendering of this (<text> element) text via voice synthesis
> >
> > is that in order for player and browser developers to incorporate accessibility features such as captioning, audio description, tts-audio desc, etc., they need to be *GUARANTEED* that the data lives in a particular place. That is, it must be unambiguous to the parser that it's picking up the desired source data. The access data is very specific metadata, which can't *MAYBE* live in the <text> element, *MAYBE* live in the <alt> element, *MAYBE* live in the <longdesc> element. Most metadata is simply embellishment. Access metadata *IS* the data, just in another format. You supply a videoregion and a src=..., precisely so that the display engine knows what to play. You could just as easily say
> >
> >     "maybe the media filename can live in the <alt> tag. Sometimes it will, sometimes it won't, good luck."
> >
> > Accessibility in SMIL is not about creating spots in SMIL for just another pretty presentation element. It's about allowing the essence of the presentation to be found and rendered.
> >
> > I understand that it may be late in the SMIL 2.0 game to be talking about any additional accessibility-specific markup, but it's my opinion that the need is immediate, accessibility in SMIL is not ready for primetime, and SMIL 2.0 can go forward without it but it will need to be raised immediately in the next round, with additional specific markup. That SMIL is media-agnostic is necessary but not sufficient for rational implementation of accessibility.
> >
> > --Brad
> > ___________
> > Brad_Botkin@wgbh.org               Director, Technology & Systems Development
> > (v/f) 617.300.3902                 NCAM/WGBH - National Center for Accessible Media
> > 125 Western Ave, Boston MA 02134
> > ___________
> >
> > "Cohen, Aaron M" wrote:
> > >
> > > Eric:
> > >
> > > I don't interpret the guidelines the way that you do. It seems that you assume that alt and longdesc cannot be rendered by synthesized speech. Also, we have a <text> element, and an implementation can provide rendering of this text via voice synthesis.
> > >
> > > Where we seem to differ is that it seems that your preference is for specialized synthesized speech markup, where I think that much of what we already have can be used.
> > >
> > > The exploratory comments that I made were in relation to specialized support for synthesized speech, not to say that there is no way to incorporate synthetic speech into a SMIL presentation.
> > >
> > > Here is how I answer these specific questions:
> > > > 1. Does SMIL 2.0 support the development of synthesized speech auditory descriptions?
> > >
> > > Yes. SMIL 2.0, like SMIL 1.0, is media agnostic. Any type of media can be supported in SMIL. It is up to the implementation to provide rendering for the supported media types, and alternative rendering methods to enhance accessibility.
> > >
> > > > SMIL does not currently support synthesized speech auditory descriptions. It does support prerecorded auditory descriptions.
> > >
> > > This is not so. SMIL has exactly the same support for synthesized speech auditory descriptions as it does for pre-recorded auditory descriptions. SMIL is a media integration language, and does not define media itself.
> > >
> > > The text that you quote does not call out synthetic speech specifically, but it is not excluded.
> > >
> > > > 2. If the answer to question #1 is "Yes", then briefly describe the support that is provided.
> > >
> > > 1. A user agent can render alt/longdesc as synthesized speech.
> > > 2. A user agent can provide a synthetic speech renderer for <text> media elements.
> > > 3. A user can control the rendered media via system preferences which map to system test attributes. This allows the author to set the synthesized speech up as captions or overdub or audio descriptions.
> > > 4. SMIL 2.0 has author-defined customTest attributes, to allow turning on/off media based on document- and user-specific criteria.
> > >
> > > -Aaron
> > >
> > > > -----Original Message-----
> > > > From: Hansen, Eric [mailto:ehansen@ets.org]
> > > > Sent: Wednesday, November 01, 2000 8:14 AM
> > > > To: 'Cohen, Aaron M'; 'Brad Botkin'
> > > > Cc: 'symm@w3.org'; geoff freed; Hansen, Eric; www-smil@w3.org; thierry michel; www-smil-request@w3.org
> > > > Subject: RE: Synthesized-speech auditory descriptions
> > > >
> > > > I have an additional comment and then I will summarize.
> > > >
> > > > SOME BASIC REQUIREMENTS FOR MULTIMEDIA PRESENTATIONS
> > > >
> > > > From the glossary entry for the term "Equivalent" in the W3C Web Content Accessibility Guidelines (WCAG) 1.0 [3], we see that regarding multimedia presentations there are three major forms of equivalent: captions, auditory descriptions, and collated text transcripts.
> > > >
> > > > "A caption is a text transcript for the audio track of a video presentation that is synchronized with the video and audio tracks. Captions are generally rendered visually by being superimposed over the video, which benefits people who are deaf and hard-of-hearing, and anyone who cannot hear the audio (e.g., when in a crowded room). A collated text transcript combines (collates) captions with text descriptions of video information (descriptions of the actions, body language, graphics, and scene changes of the video track). These text equivalents make presentations accessible to people who are deaf-blind and to people who cannot play movies, animations, etc. It also makes the information available to search engines.
> > > >
> > > > "One example of a non-text equivalent is an auditory description of the key visual elements of a presentation.
> > > > The description is either a prerecorded human voice or a synthesized voice (recorded or generated on the fly). The auditory description is synchronized with the audio track of the presentation, usually during natural pauses in the audio track. Auditory descriptions include information about actions, body language, graphics, and scene changes."
> > > >
> > > > See
> > > >
> > > > It appears that SMIL 2.0 provides support for captions and prerecorded auditory descriptions but not for synthesized speech auditory descriptions or collated text transcripts. I have already pointed out the importance of synthesized speech auditory descriptions (see WCAG 1.0 checkpoint
> > > >
> > > > 1.1 Provide a text equivalent for every non-text element (e.g., via "alt", "longdesc", or in element content). This includes: images, graphical representations of text (including symbols), image map regions, animations (e.g., animated GIFs), applets and programmatic objects, ascii art, frames, scripts, images used as list bullets, spacers, graphical buttons, sounds (played with or without user interaction), stand-alone audio files, audio tracks of video, and video. [Priority 1]
> > > > For example, in HTML:
> > > > Use "alt" for the IMG, INPUT, and APPLET elements, or provide a text equivalent in the content of the OBJECT and APPLET elements.
> > > > For complex content (e.g., a chart) where the "alt" text does not provide a complete text equivalent, provide an additional description using, for example, "longdesc" with IMG or FRAME, a link inside an OBJECT element, or a description link.
> > > > For image maps, either use the "alt" attribute with AREA, or use the MAP element with A elements (and other text) as content.
> > > > Refer also to checkpoint 9.1 and checkpoint 13.10.
> > > > Techniques for checkpoint 1.1
> > > >
> > > > 1.3 Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation. [Priority 1]
> > > > Synchronize the auditory description with the audio track as per checkpoint 1.4. Refer to checkpoint 1.1 for information about textual equivalents for visual information.
> > > > Techniques for checkpoint 1.3
> > > >
> > > > 1.4 For any time-based multimedia presentation (e.g., a movie or animation), synchronize equivalent alternatives (e.g., captions or auditory descriptions of the visual track) with the presentation. [Priority 1]
> > > >
> > > > I am trying to summarize what has been said to this point on this thread that responds to my earlier questions [1].
> > > >
> > > > SUMMARY
> > > >
> > > > 1. Does SMIL 2.0 support the development of synthesized speech auditory descriptions?
> > > >
> > > > SMIL does not currently support synthesized speech auditory descriptions. It does support prerecorded auditory descriptions.
> > > >
> > > > 2. If the answer to question #1 is "Yes", then briefly describe the support that is provided.
> > > >
> > > > N/A
> > > >
> > > > 3. If the answer to question #1 is "No", then please describe any plans for providing such support in the future.
> > > >
> > > > There are currently no plans for including this in SMIL. Aaron Cohen suggests that "Probably what is needed is a general accessible markup that can be used in SMIL, XHTML, SVG, etc. SMIL would just adopt this as a content type. This new content type could be designed to reuse a lot of SMIL content control, and it could have additional indirection mechanisms to enable the kind of structured grouping that you mention. But that's another spec, and for now the vendors are doing their own thing." [2]
> > > >
> > > > ====
> > > >
> > > > COMMENT
> > > >
> > > > It seems to me that if SMIL 2.0 proceeds to Recommendation status, it would be good to have done several things.
> > > >
> > > > 1. Affirm W3C's commitment to supporting Web accessibility, particularly the multimedia-related requirements of the Web Content Accessibility Guidelines (WCAG), User Agent Accessibility Guidelines (UAAG), and Authoring Tool Accessibility Guidelines (ATAG). Captions, auditory descriptions, and collated text transcripts stand out in my mind in this regard. (See WCAG 1.0 [3].)
> > > >
> > > > 2. Explain why synthesized speech auditory descriptions are not or cannot be part of the SMIL 2.0 specification.
> > > >
> > > > 3. Suggest a plan for supporting synthesized speech auditory descriptions. I personally would like to see some kind of commitment from the W3C to support this, either as part of the next version of SMIL or perhaps, as Aaron has suggested, another specification that could be reused by SMIL, XHTML, SVG, etc.
> > > >
> > > > 4. Suggest techniques for providing such auditory descriptions and collated text transcripts until they are fully integrated into W3C specifications.
> > > >
> > > > I think that it would be appropriate to have at least a summary of such information as part of the Recommendation. I am concerned that without such information within the document, people may doubt the W3C's commitment to accessible media.
> > > >
> > > > [1] http://lists.w3.org/Archives/Public/www-smil/2000OctDec/0050.html
> > > > [2] http://lists.w3.org/Archives/Public/www-smil/2000OctDec/0062.html
> > > > [3] http://www.w3.org/TR/WAI-WEBCONTENT/
> > > >
> > > > -----Original Message-----
> > > > From: Cohen, Aaron M [mailto:aaron.m.cohen@intel.com]
> > > > Sent: Monday, October 30, 2000 12:40 PM
> > > > To: 'Brad Botkin'
> > > > Cc: 'symm@w3.org'; geoff freed; Hansen, Eric; www-smil@w3.org; thierry michel; www-smil-request@w3.org
> > > > Subject: RE: Synthesized-speech auditory descriptions
> > > >
> > > > Brad:
> > > >
> > > > As far as the systemAudioDesc only taking on/off, that's true, but you can combine it with the other test attributes, such as systemLanguage, and get many, many combinations. Geoff Freed and the WAI people are reviewing those combinations for completeness, so if you think that we are missing a specific use case, please let us know.
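To illustrate the kind of combination Aaron mentions (a sketch only; the file names are placeholders), a switch can gate alternative description audio on both the systemAudioDesc and systemLanguage settings; a player picks the first child whose test attributes all evaluate to true:

    <switch>
      <!-- chosen when the user wants audio descriptions and prefers English -->
      <audio src="ad-en.wav" systemAudioDesc="on" systemLanguage="en"/>
      <!-- chosen when the user wants audio descriptions and prefers French -->
      <audio src="ad-fr.wav" systemAudioDesc="on" systemLanguage="fr"/>
    </switch>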
> > > > As far as separate text files for accessibility documents, you are right, that's a thorny issue for SMIL, which has left the definition of media (as opposed to the integration) to the player/content developers.
> > > >
> > > > Probably what is needed is a general accessible markup that can be used in SMIL, XHTML, SVG, etc. SMIL would just adopt this as a content type. This new content type could be designed to reuse a lot of SMIL content control, and it could have additional indirection mechanisms to enable the kind of structured grouping that you mention. But that's another spec, and for now the vendors are doing their own thing.
> > > >
> > > > -Aaron
> > > >
> > > > > -----Original Message-----
> > > > > From: Brad Botkin [mailto:Brad_Botkin@wgbh.org]
> > > > > Sent: Sunday, October 29, 2000 4:33 AM
> > > > > To: Cohen, Aaron M
> > > > > Cc: 'symm@w3.org'; geoff freed; Hansen, Eric; www-smil@w3.org; thierry michel; www-smil-request@w3.org
> > > > > Subject: Re: Synthesized-speech auditory descriptions
> > > > >
> > > > > Aaron,
> > > > >
> > > > > What seems to be missing from
> > > > >
> > > > > >   <par>
> > > > > >     <audio src="snippet8043.wav">
> > > > > >       <description xml:lang="en">
> > > > > >         The lady in the pink sweater picks up the pearl necklace from the table and walks to the door.
> > > > > >       </description>
> > > > > >       <description xml:lang="fr">
> > > > > >         Oui.
> > > > > >       </description>
> > > > > >     </audio>
> > > > > >   </par>
> > > > >
> > > > > is a way to uniquely and unambiguously identify the text above as the audio description (unless the <description> tag is just that, but I assume "<description xml....>" here is a generic term unrelated to "audio description" as we're talking about it).
> > > > >
> > > > > The <systemAudioDesc> tag is a way to signal a player that some particular content should be played for some users. But the specific rendering device has the job of deciding which media element to play, the audio (uniquely identified by the "src" attribute) or the transcription of that element (not yet uniquely identified).
> > > > >
> > > > > The point is that there may be more than just one text string associated with an audio element, only one of which is the transcription of that audio. <systemAudioDesc> *almost* spoke to this need, except that it only takes an "on/off" value, which seems insufficient to the task of allowing rendering engines to adequately handle accessibility issues. Since accessibility is being legislated in the TV and multimedia arena as we speak, it seems prudent to create a set of extensible accessibility tags which will allow those industries to easily utilize SMIL in their workflow. It's true that these elements would not be general, reusable ones, and I sympathize with your reticence to generate more case markup. Nonetheless....
> > > > >
> > > > > In another vein, how about the issue of how to manage the grouping of synched accessibility objects (captions and descriptions, for example) in separate text files.
> > > > > I'm sure this is thorny, but the current existing formats (RealText, SAMI, Quicktime qtText) all offer a way to group these related elements (for captioning). Current thoughts?
> > > > >
> > > > > --Brad
> > > > > ___________
> > > > > Brad_Botkin@wgbh.org               Director, Technology & Systems Development
> > > > > (v/f) 617.300.3902                 NCAM/WGBH - National Center for Accessible Media
> > > > > 125 Western Ave, Boston MA 02134
> > > > > ___________
> > > > >
> > > > > "Cohen, Aaron M" wrote:
> > > > > >
> > > > > > Brad:
> > > > > >
> > > > > > That specific use of verbatim text is what systemAudioDesc is for. It can be used on text media elements that can contain the verbatim text. The pair of audio and text elements can be wrapped in a par and given a specific title, and the unit used in a presentation just like an individual media element.
> > > > > >
> > > > > > Why would it be better to have special-case markup when the generalized capabilities that we have cover the use cases?
> > > > > >
> > > > > > Your example confuses me, since it doesn't seem to give any more capability than we already have with XHTML+SMIL:
> > > > > >
> > > > > >   <par>
> > > > > >     <audio src="snippet8043.wav"/>
> > > > > >     <p systemAudioDesc="on">The lady in the pink sweater picks up the pearl necklace from the table and walks to the door.</p>
> > > > > >   </par>
> > > > > >
> > > > > > Even less, since you can't hang an xml:lang off the attribute, necessitating duplication of the media object reference for each language of the text description.
> > > > > >
> > > > > > With SMIL 2.0, you have to put the text in alt or another file, because SMIL does not itself define media:
> > > > > >
> > > > > >   <par>
> > > > > >     <audio src="snippet8043.wav"/>
> > > > > >     <text systemAudioDesc="on" src="lady.txt"/>
> > > > > >   </par>
> > > > > >
> > > > > > If you are saying that there should be some general scalable mechanism to make this easier to maintain, I agree with you, with the additional stipulation that this is not just a SMIL issue, but an issue for all XML languages that have non-text content.
> > > > > >
> > > > > > For the next version of SMIL, we plan to adopt SVG's description element, which would allow you to do something like this in SMIL:
> > > > > >
> > > > > >   <par>
> > > > > >     <audio src="snippet8043.wav">
> > > > > >       <description xml:lang="en">
> > > > > >         The lady in the pink sweater picks up the pearl necklace from the table and walks to the door.
> > > > > >       </description>
> > > > > >       <description xml:lang="fr">
> > > > > >         Oui.
> > > > > >       </description>
> > > > > >     </audio>
> > > > > >   </par>
> > > > > >
> > > > > > Having an attribute on elements that are specially meant to be a literal text translation of (possibly long) media does not scale well. The sub-elements make more sense.
> > > > > >
> > > > > > I think that this is the beginning of a discussion about the need to create a set of reusable markup elements that fit the identified needs. I can imagine <description>, <transcription>, and <title> child elements, all enclosing text.
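To make the idea concrete, a purely hypothetical fragment using such child elements; none of these elements are defined in SMIL 2.0, and the title text is only a placeholder, so this sketches the kind of reusable markup being imagined rather than anything in the specification:

    <audio src="snippet8043.wav">
      <!-- hypothetical child elements, not part of SMIL 2.0 -->
      <title xml:lang="en">Necklace scene</title>
      <transcription xml:lang="en">
        The lady in the pink sweater picks up the pearl necklace
        from the table and walks to the door.
      </transcription>
      <description xml:lang="fr">
        Oui.
      </description>
    </audio>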
> > > > > > My point is that these are real problems that need solutions, but the solutions need to be general, reusable and thought out in detail. This will require some dedicated people and some time. This is way too late in the SMIL 2.0 process to start integrating this kind of thing into the language, but it is something that should be done for re-use by everyone and integrated into SMIL (and XHTML 2.0?, SVG?) in the future.
> > > > > >
> > > > > > -Aaron
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> > > > > > > Sent: Friday, October 27, 2000 12:30 PM
> > > > > > > To: Cohen, Aaron M
> > > > > > > Cc: geoff freed; Hansen, Eric; www-smil@w3.org; thierry michel; www-smil-request@w3.org
> > > > > > > Subject: Re: Synthesized-speech auditory descriptions
> > > > > > >
> > > > > > > Aaron,
> > > > > > >
> > > > > > > I think the actual transcription of the audio deserves its own tag, since it's so specific. For the same reason that you created a systemAudioDesc tag and didn't just use the alt tag. You need a place to look that's consistent. I believe the longdesc is intended to be used as simply a longer text description of the underlying graphic or media file. And in the case of audio description snippets, the longdesc could be used to hold timing or other metadata related to the snippet but not specifically voiced. I think that verbatim text will prove invaluable in the future, for searching, etc., and you should consider creating a specific tag for this.
> > > > > > >
> > > > > > > --Brad
> > > > > > > __________
> > > > > > > Brad_Botkin@wgbh.org               Director, Technology & Systems Development
> > > > > > > 617.300.3902 (v/f)                 NCAM/WGBH - National Center for Accessible Media
> > > > > > > 125 Western Ave, Boston MA 02134
> > > > > > > __________
> > > > > > >
> > > > > > > "Cohen, Aaron M" wrote:
> > > > > > > >
> > > > > > > > Brad:
> > > > > > > >
> > > > > > > > We also have alt and longdesc, either of which could be used by a player to provide accessory or alternative text content. These can be combined with the systemLanguage and other test attributes to provide many combinations of accessibility and internationalization.
> > > > > > > >
> > > > > > > > -Aaron
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> > > > > > > > > Sent: Friday, October 27, 2000 5:41 AM
> > > > > > > > > To: geoff freed
> > > > > > > > > Cc: Hansen, Eric; www-smil@w3.org; thierry michel; www-smil-request@w3.org
> > > > > > > > > Subject: Re: Synthesized-speech auditory descriptions
> > > > > > > > >
> > > > > > > > > Geoff,
> > > > > > > > >
> > > > > > > > > True but incomplete. It sounds like Eric is asking for a tag which identifies text as a transcription of the underlying audio. Something like:
> > > > > > > > >
> > > > > > > > >   <par>
> > > > > > > > >     .....
> > > > > > > > >     <audio systemAudioDesc="on"
> > > > > > > > >            AudioDescText="The lady in the pink sweater picks up the pearl necklace from the table and walks to the door."
> > > > > > > > >            src="snippet8043.wav"/>
> > > > > > > > >     .....
> > > > > > > > >   </par>
> > > > > > > > >
> > > > > > > > > It's a great idea, since the text is super-thin, making it appropriate for transmission in narrow pipes with local text-to-speech synthesis for playback. Note that the volume of snippets in a longer piece, like a movie, is huge, just like closed captions. Inclusion of 1000 audio description snippets and 2000 closed captions, each in 3 languages, each with its own timecode, all in the same SMIL file will make for some *very* unfriendly files. Better would be to provide a mechanism which allows the SMIL file to gracefully point to separate files each containing the timecoded AD snippets (with transcriptions per the above) and timecoded captions. It requires the SMIL player to gracefully overlay the external timeline onto the intrinsic timeline of the SMIL file. Without this, SMIL won't be used for interchange of caption and description data for anything longer than a minute or two. A translation house shouldn't have to unwind a bazillion audio descriptions and captions in umpteen other languages to insert its French translation.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > --Brad
> > > > > > > > > ___________
> > > > > > > > > Brad_Botkin@wgbh.org               Director, Technology & Systems Development
> > > > > > > > > (v/f) 617.300.3902                 NCAM/WGBH - National Center for Accessible Media
> > > > > > > > > 125 Western Ave, Boston MA 02134
> > > > > > > > > ___________
> > > > > > > > >
> > > > > > > > > geoff freed wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Eric:
> > > > > > > > > >
> > > > > > > > > > SMIL 2.0 provides support for audio descriptions via a test attribute, systemAudioDesc. The author can record audio descriptions digitally and synchronize them into a SMIL presentation using this attribute, similar to how captions are synchronized into SMIL presentations using systemCaptions (or system-captions, as it is called in SMIL 1.0).
> > > > > > > > > >
> > > > > > > > > > Additionally, using SMIL 2.0's <excl> and <priorityClass> elements, the author may pause a video track automatically, play an extended audio description and, when the description is finished, resume playing the video track. This will be a boon for situations where the natural pauses in the program audio aren't sufficient for audio descriptions.
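A rough sketch of the pause-and-resume pattern Geoff describes, assuming the <excl>/<priorityClass> behavior in the SMIL 2.0 drafts; the file names and the 45s begin time are placeholders, and the exact attribute usage should be checked against the timing module of the specification:

    <excl>
      <priorityClass peers="pause">
        <video src="movie.mpg" begin="0s"/>
        <!-- when this extended description begins, it pauses its peer (the video);
             the video resumes from where it stopped once the description ends -->
        <audio src="extended-desc.wav" begin="45s" systemAudioDesc="on"/>
      </priorityClass>
    </excl>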
> > > > > > > > > >
> > > > > > > > > > Geoff Freed
> > > > > > > > > > CPB/WGBH National Center for Accessible Media (NCAM)
> > > > > > > > > > WGBH Educational Foundation
> > > > > > > > > > geoff_freed@wgbh.org
> > > > > > > > > >
> > > > > > > > > > On Wednesday, October 25, 2000, thierry michel <tmichel@w3.org> wrote:
> > > > > > > > > >
> > > > > > > > > > >> My questions concern the use of SMIL for developing auditory descriptions for multimedia presentations.
> > > > > > > > > > >>
> > > > > > > > > > >> The Web Content Accessibility Guidelines (WCAG) version 1.0 of W3C/WAI indicates the possibility of using speech synthesis for providing auditory descriptions for multimedia presentations. Specifically, checkpoint 1.3 of WCAG 1.0 reads:
> > > > > > > > > > >>
> > > > > > > > > > >> "1.3 Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation. [Priority 1] Synchronize the auditory description with the audio track as per checkpoint 1.4. Refer to checkpoint 1.1 for information about textual equivalents for visual information." (WCAG 1.0, checkpoint 1.3).
> > > > > > > > > > >>
> > > > > > > > > > >> In the same document, in the definition of "Equivalent", we read:
> > > > > > > > > > >>
> > > > > > > > > > >> "One example of a non-text equivalent is an auditory description of the key visual elements of a presentation. The description is either a prerecorded human voice or a synthesized voice (recorded or generated on the fly). The auditory description is synchronized with the audio track of the presentation, usually during natural pauses in the audio track. Auditory descriptions include information about actions, body language, graphics, and scene changes."
> > > > > > > > > > >>
> > > > > > > > > > >> My questions are as follows:
> > > > > > > > > > >>
> > > > > > > > > > >> 1. Does SMIL 2.0 support the development of synthesized speech auditory descriptions?
> > > > > > > > > > >>
> > > > > > > > > > >> 2. If the answer to question #1 is "Yes", then briefly describe the support that is provided.
> > > > > > > > > > >>
> > > > > > > > > > >> 3. If the answer to question #1 is "No", then please describe any plans for providing such support in the future.
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks very much for your consideration.
> > > > > > > > > > >>
> > > > > > > > > > >> - Eric G. Hansen
> > > > > > > > > > >> Development Scientist
> > > > > > > > > > >> Educational Testing Service (ETS)
> > > > > > > > > > >> Princeton, NJ 08541
> > > > > > > > > > >> ehansen@ets.org
> > > > > > > > > > >> Co-Editor, W3C/WAI User Agent Accessibility Guidelines
Received on Tuesday, 7 November 2000 15:13:52 UTC