Re: Comments from PFWG on CSS3 Speech Module

I am myself a sighted user, but I am particularly sensitive to the needs of screen reader users, as I work for an organisation that serves the blind, visually impaired and otherwise print-disabled. However, the CSS Speech Module caters for a variety of use-cases, and provides authors with the means to define any particular speech synthesis experience (while, obviously, conforming to the CSS "cascade" rules, which allow users to override the author's intent). For example, there are children's talking books that not only speak the words slowly, but also intentionally pause between words to allow some extra reading time. This applies not only to special-needs users: let's keep in mind the wide spectrum of user abilities!

CSS 3 Media Queries currently relies on a limited set of media features (mostly hardware-related), including the rather broad 'speech' media type (formerly 'aural'). Hardware features are easily broken down into a small set of well-identified functions, but the combinatory nature of subtle human features means that the user context currently definable by CSS Media Types is minimal (and perhaps sub-optimal in some cases). Just look at all the possible types of colour blindness, the various needs for screen magnification, the degrees of reading skill across the autism spectrum, the specificities of dyslexia, etc. Although I like the idea of defining particular speech contexts, I think it represents a whole new body of work that should be tackled collaboratively at a later stage, with the Media Queries Working Group and others (min/max-words-per-minute seems like an obvious starting point).

I am reluctant to attempt to define "acceptable" time ranges for the 'rest' and 'pause' properties. Once again, authors can already produce some really silly content, for example by displaying pointless and distracting visual animations, or by using needlessly long auditory cues. Defining good or bad authoring practices is out of scope for this specification; it is the subject of a separate activity, perhaps under the WAI-CAG umbrella, or maybe kick-started via a W3C Community Group?

The keywords representing the strength of the prosodic break in the speech output are modelled after the SSML 'break' feature. The actual corresponding length of time depends on the underlying speech processor, and may even vary from one voice to another (from a linguistic perspective, boundaries in a given prose text can be accompanied by varying pause durations). Although the speech output of one TTS engine is not necessarily consistent with that of another, I expect the words-per-minute rates and default prosodic breaks to be relatively similar across platforms (i.e. not "drastically different").

As discussed before, I agree that CSS-Speech values could be added to the informative definition of the HTML (4/5) default user-agent stylesheet. As you rightly pointed out, however, this is out of scope. Markup agnosticism doesn't mean that we can't add an extra appendix to CSS-Speech, but I would rather see this work completed collaboratively with other groups, such as the HTML5 folks. Ultimately, such a document could be incorporated back into the CSS Speech Module (e.g. in a future revision, Level 4+).

As for your last point: authors may indeed have different intents depending on the listening / speech context they target (see our Media Queries discussion above). Using CSS pseudo-class selectors, authors (as well as user agents and users, whose styles can override the author's) are able to control pauses and auditory cues specifically for when an element is focused or activated. I am therefore not convinced that we need to "unequivocally declare" a specific behaviour for the "screen reader context". This is just standard CSS authoring practice.

Thank you very much for all your insightful comments!
I hope this response is satisfactory; please let us know.
Kind regards, Daniel

On 11 Oct 2011, at 20:25, Janina Sajka wrote:
> 4.)	pause-before:/pause-after:
> These properties are of concern because they represent another way for the page author to hijack a screen reader user's experience. We are also concerned that end users will interpret correct implementation of these properties as a severe performance lag. For example, if a user were forced to wait 2 seconds between each heading, the experience would be tedious for TTS users comfortable with machine speech at rates pushing 400 words per minute.
> If you plan to keep this property, we suggest the following:
> 1. Consider defining a few variants of the @media values defining the particular speech context. A long pause may provide slightly more value for the "save to audio file" or "read all" context than it would to a general screen reader user in the process of navigating a document quickly. We think it's unlikely that many screen reader users would want this feature affecting their TTS speed and responsiveness.
> 2. Define a maximum range for pause-before <time>, preferably less than 2s for screen readers, and issue validation warnings for times over the maximum.
> 3. Define millisecond values or WPM-relative time values for tokens, preferably all less than 1s. The document states that this is implementation-dependent. W3C history has shown this will result in drastically different values, and inconsistent implementation will be frustrating for authors and users alike.
> 4. In a separate document (perhaps HTML5), define default mappings of elements to their expected pause values, e.g. a table mapping pause before/after columns with each HTML element as a row.
> 5. Unequivocally declare that implementors should ignore pause-before values when navigating to an element in the screen reader context, so as to not create the perception of performance lag. e.g., If a screen reader user presses the command to "jump to next heading," speak it immediately. Ignore pause-before immediately after a focus change.

Received on Monday, 17 October 2011 09:02:35 UTC