[CSS21] WAI Issue 1: Relegation of Aural CSS to an informative appendix & the Deprecation of the aural media type [DRAFT]

Gregory, here is some "historical" perspective on the
speech/aural split -- this is mostly from memory.

Sometime in the 2003 timeframe, Dave Raggett and I were looking
to synchronize SSML and Aural CSS in the following sense:

Rendering rules expressed via Aural CSS when applied to XML
markup should be able to produce SSML that delivers the desired
aural presentation.
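
By way of illustration (a hypothetical sketch, not an example
taken from that exercise), a rule such as:

  h1 { voice-family: male; pitch: low; pause-after: 500ms }

applied to <h1>Overview</h1> should be expressible as SSML along
the lines of:

  <voice gender="male">
    <prosody pitch="low">Overview</prosody>
  </voice>
  <break time="500ms"/>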

In going through that exercise, we hit a number of
discrepancies, most of which came down to "SSML is mostly about
speech," whereas Aural CSS dealt with much more than speech.

Also, given the lack of implementation of Aural CSS within
browsers, and given that to an extent Aural CSS had been
dismissed by mainstream browsers as "that's for speech output,
we don't do that," we felt it was worthwhile splitting Aural CSS
into two modules, speech and aural, where @media speech should
be aligned fully with SSML.
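
The rough intent was that a stylesheet could then address the two
concerns separately; something like the following (illustrative
only, as the modules were still being drafted):

  @media speech {
    /* speech rendering, to be aligned with SSML */
    h1 { voice-family: male; speech-rate: slow }
  }
  @media aural {
    /* non-speech audio: earcons, mixing, spatialization */
    a:link { cue-before: url("link.wav") }
  }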

To what extent the current drafts reflect that desire is
something I've not had the time to check.

Gregory J. Rosmaita writes:
 > 
 > [Reviewer's Note: this post refers to the Candidate Recommendation draft 
 > of CSS 2.1,
 > http://www.w3.org/TR/2007/CR-CSS21-20070719
 > comments upon which are due by 20 December 2007]
 > 
 > Given the following use case:
 > 
 > Aural rendering is used to provide supplemental contextual and semantic
 > markers for an individual with limited vision or a limited view-port,
 > such as that obtained by using a screen-magnifier application, which
 > displays strings of text in isolated viewports, with earcons (purely
 > aural cues) set to "on", but without speech output.  Such a user relies
 > on aural cues, provided by such extant mechanisms as:
 > 
 > http://www.w3.org/TR/CSS21/aural.html#cue-props
 > http://www.w3.org/TR/CSS21/aural.html#mixing-props
 > http://www.w3.org/TR/CSS21/aural.html#spatial-props
 > 
 > to supplement that user's constrained point of view.  Note that this
 > use case includes those who fall under the purview of such
 > organizations as Recording for the Blind and Dyslexic
 > (http://www.rfbd.org).
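 > 
 > For example (file names purely illustrative), such a user might rely
 > on rules like:
 > 
 >   h2     { cue-before: url("section.au") }
 >   a:link { cue-before: url("link.au") }
 >   .note  { play-during: url("ambient.wav") mix }
 >   .aside { azimuth: far-right }
 > 
 > which paint orientation onto the aural canvas without producing any
 > speech.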
 > 
 > Note that some users will benefit from viewing portions of the screen
 > using a screen-magnifier together with aural cues; but there are also
 > those who not only need isolated portions of the visual canvas rendered
 > for them, but whose understanding of, and ability to interact with, the
 > document benefit greatly from supplemental synthesized speech.
 > 
 > How, then, can speech be separated from audio?  The Style WG should be
 > wary of separating speech from pure aural rendering rules, as there is
 > one modality being addressed: the aural canvas, whether that includes
 > speech synthesis or purely earconic sounds.
 > 
 > The question, therefore, is this: what is the point of changing the
 > media type from aural to speech?  Speech synthesizers are aural
 > renderers, but they rely on a third-party application (optimally, a
 > DOM-aware user agent) in order to obtain the content, flow, etc. of
 > the speech output.  If a user agent supports speech, as does FireVox,
 > it also needs to support the purely aural (earconic) portions of the
 > media rule; speech synthesizers are not user agents, being more akin
 > to browser helper objects (BHOs) than to user agents per se.
 > 
 > 
 > SUMMATION:
 > 
 > The deprecation of the aural media type in favor of the speech
 > media type is unacceptable, as there are valid use cases in which an
 > individual benefits from supplemental earcons that sound while
 > viewing the visual canvas through a screen-magnifier type view-port,
 > without speech output, but with support for a pure audio
 > (non-speech) overlay; likewise, there is the use case of an
 > individual who benefits from supplemental speech, as well as a
 > limited viewport and aural orientational and contextual cues.
 > 
 > Why is it necessary for Aural CSS 2.1 to remain normative?  The
 > aural cascade will enable an author to offer visitors a choice
 > among "verbose", "terse", and "earconic" overlays.  SSML may be
 > where the money and resources are currently devoted, but Aural CSS
 > is far superior for speech-output-dependent computer users (that
 > is, the average end user) because things aren't hard-coded, but
 > are subject to user overrides.  It is obviously a lot easier to
 > wizardize a "modify this site's aural styling" facility, which
 > would allow the end user the final say over what is spoken and
 > how, than to edit an SSML document's source.
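 > 
 > For instance (a sketch of such an override, not an existing tool),
 > a generated user stylesheet could silence an author's verbose cues
 > and substitute the user's own terse earcons:
 > 
 >   * { cue-before: none !important; cue-after: none !important }
 >   h1, h2 { cue-before: url("heading.au") !important }
 > 
 > No end user can do the equivalent to a hard-coded SSML document.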
 > 
 > An added benefit of retaining the purely aural portions of ACSS
 > is that, if both speech and purely aural styling are addressed
 > in the same stylesheet, it reduces the burden on the author,
 > allows for end-user override, and increases the probability
 > that both forms of painting to the aural canvas will be
 > implemented.
 > 
 > 
 > PROPOSED RESOLUTION:
 > 
 > 1. The PF WG requests that the editors and Working Group de-deprecate the 
 >    "aural" media type and deprecate the "speech" media type
 > 
 > 2. The PF WG requests that Appendix A be renamed Chapter/Section 19
 >    and made normative
 > 
 > 
 > 
 > ----------------------------------------------------------------
 > CONSERVATIVE, n.  A statesman who is enamored of existing evils,
 > as distinguished from the Liberal, who wishes to replace them 
 > with others.         -- Ambrose Bierce, _The Devil's Dictionary_
 > ----------------------------------------------------------------
 >              Gregory J. Rosmaita, oedipus@hicom.net
 >   Camera Obscura: http://www.hicom.net/~oedipus/index.html
 > ----------------------------------------------------------------
 > 

-- 
Best Regards,
--raman

Title:  Research Scientist      
Email:  raman@google.com
WWW:    http://emacspeak.sf.net/raman/
Google: tv+raman 
GTalk:  raman@google.com, tv.raman.tv@gmail.com
PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc

Received on Tuesday, 11 December 2007 17:37:02 UTC