Re: CSS Speech: Updated StyleSheet Specification

Raman T. V. writes:

 > Here is a revised version of the cascaded speech stylesheet based
 > on feedback from the net community.

As far as I can tell, the choice of properties is OK. I still have
some questions about the possible values. I definitely want this in
CSS at some point (probably CSS2, since CSS1 is frozen as the simple
language it is now).

 > Thus, when rendering a well-written document that uses the emphasis tag
 > to mark emphasized phrases, such an aural browser would use the speech
 > properties specified for emphasis in the speech stylesheet.
 > However, if a document uses layout specific tags such as <IT> 
Italic is <I>.

 > <H3>Speech Properties</H3>
 > Speech properties specify the voice characteristics to be used when rendering
 > specific document elements.

Is there a reason why you put `:' in front of all properties and some

 > <DL>
 > <DT> :volume
 > <DD> level [0 1 2 3 4 5 6 7 8 9 10] or (nnndb) (specified in decibels)
 >      or [soft | medium | loud]
 >      The volume of the speaker. Specified as a numeric level, in decibels, or
 >      using the keywords soft, medium or loud.
 >      If specified as a level, the volume is mapped by the UA implementation
 >      to an appropriate device setting, with a setting of 5 interpreted as "medium".

For font-size we allowed relative values as well as absolute ones. We
could do the same here: -1 means one step softer, +1 (or just 1) means
one step louder.

Is it enough to have just three keyword levels? Analogously to
font-size, we could add x-soft, xx-soft, x-loud, xx-loud.

If numbers are going to be interpreted relatively, there should also
be a `silent' keyword.
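
To make the relative interpretation concrete, here is a minimal sketch in
Python. It assumes that every number is a step relative to the inherited
level, that levels are clamped to the 0 to 10 range of the draft, and that
keywords are absolute. The function name resolve_volume and the
keyword-to-level table are hypothetical; only "medium" = 5 comes from the
draft.

```python
# Hypothetical keyword table: only "medium" = 5 is taken from the draft,
# the other levels are assumed for illustration.
KEYWORD_LEVELS = {
    "silent": 0,
    "xx-soft": 1,
    "x-soft": 2,
    "soft": 3,
    "medium": 5,
    "loud": 7,
    "x-loud": 8,
    "xx-loud": 9,
}


def resolve_volume(value: str, inherited: int) -> int:
    """Resolve a volume value against the inherited level (0 to 10)."""
    if value in KEYWORD_LEVELS:
        # Keywords are absolute and ignore the inherited level.
        return KEYWORD_LEVELS[value]
    # Numbers are relative steps: "-1", "+1" and "1" are all accepted.
    step = int(value)
    return max(0, min(10, inherited + step))
```

With every number relative, the keywords are the only absolute anchors,
which is why `silent' has to exist as a keyword.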

 > <DT> :voice-family
 > <DD> string<P>
 >      Analogous to the :font-family property.
 >      This specifies the kind of voice to be used, and can be something generic
 >      such as <em>male</em>  or something more specific such as
 >      <em>comedian</em>
 >      or something very specific such as <em>paul</em>.
 >      We recommend the same approach as used in the case of :font-family --the
 >      style sheet provides a list of possible values ranging from most to least
 >      specific and allows the browser to pick the most specific voice that it
 >      can find on the output device in use.

Are there well-known names (like `paul') that are more or less agreed
on? Is it possible to specify a URL for the voice-family instead, so
that the URL describes the voice?
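
For reference, the font-family-style fallback quoted above could be
sketched as follows in Python. The function name pick_voice and the set of
installed voices are hypothetical; the voice names are the ones used in the
draft.

```python
def pick_voice(families, available):
    """Return the first listed voice the device offers, or None.

    `families` is ordered from most to least specific, as the draft
    recommends; `available` is the (assumed) set of installed voices.
    """
    for name in families:
        if name in available:
            return name
    return None


# Usage: a device that only knows generic voices falls back to "male".
chosen = pick_voice(["paul", "comedian", "male"], {"male", "female"})
```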

 > <DT> :speech-rate
 > <DD>Level [ -- 10] or  Number (NNNwpm)   (words per minute)
 >       or [slow | medium | fast]<P>
 >      Specifies the speaking rate. 
 >      If specified as a level, 5 is interpreted as medium.

Same comment as for volume: maybe we can interpret the numbers
relative to the inherited values, and add a few more keywords:
xx-slow, x-slow, etc.

 > <DT> :average-pitch

In this case relative values probably make less sense.

 > <DT>  :stress 
 > <DD> number (0--100)<P>
 >      Specifies the level of stress (assertiveness or emphasis) of the speaking
 >      voice.  English is a <strong>stressed</strong> language, and different
 >      parts of a sentence are assigned primary, secondary or tertiary
 >      stress. The value of property :stress controls amount of inflection that
 >      results from these stress markers.  Different speech devices may require
 >      the setting of one or more device-specific parameters to achieve this
 >      effect.  <P>

Does this refer to the `speech-other' property further down? Or is it
just a reminder to implementers that common software and hardware may
not have a single `knob' that corresponds exactly to this property? (I
would expect that they already know that.)

 >      <DT> :pause-around

A trick we used for `margin' was to have a single property as a
shorthand for four others. You could do the same here:

- `pause' with one value means the same pause before and after
- `pause' with two values means the 1st value before and the 2nd after.

Adding an explicit unit (ms) to the number is more elegant, I think,
and leaves room for adding other values later.
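
A minimal sketch of how such a shorthand could be expanded, in Python. The
longhand names pause-before and pause-after and the function name
expand_pause are hypothetical, and the sketch assumes that every value
carries an explicit ms unit.

```python
def expand_pause(value: str) -> dict:
    """Expand a `pause' shorthand into before/after pauses in milliseconds."""
    parts = value.split()
    if not 1 <= len(parts) <= 2:
        raise ValueError("pause takes one or two values")

    def to_ms(token: str) -> int:
        # Only the ms unit is handled here; other units could be added later.
        if not token.endswith("ms"):
            raise ValueError("missing unit in " + repr(token))
        return int(token[:-2])

    times = [to_ms(p) for p in parts]
    # One value: same pause before and after. Two values: before, then after.
    return {"pause-before": times[0], "pause-after": times[-1]}
```

For example, expand_pause("20ms") gives 20 for both pauses, while
expand_pause("20ms 40ms") gives 20 before and 40 after.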

 > <DT> :pronunciation-mode
 > <DD> string<P>

I guess it's a keyword, rather than a string (i.e., no quotes). I know
it is difficult to define a list of values without further study, but
maybe it would be good to include a very short initial list.

A tree-like fallback structure for all values is desirable:

  speak-military-time ____
  speak-am-pm ____________|___ speak-time __
  speak-all-punctuation ____________________|___ speak-default
  speak-some-punctuation ___________________|
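
The fallback in the tree above can be sketched in Python as a walk towards
the root. The parent table mirrors the diagram; the function name
resolve_mode and the idea of a per-device set of supported modes are
assumptions made for illustration.

```python
# Parent of each mode, read off the diagram; speak-default is the root.
FALLBACK = {
    "speak-military-time": "speak-time",
    "speak-am-pm": "speak-time",
    "speak-time": "speak-default",
    "speak-all-punctuation": "speak-default",
    "speak-some-punctuation": "speak-default",
}


def resolve_mode(requested: str, supported: set) -> str:
    """Return the nearest supported mode, walking up the fallback tree."""
    mode = requested
    while mode not in supported:
        if mode not in FALLBACK:
            # Unknown mode or the root itself: settle on the default.
            return "speak-default"
        mode = FALLBACK[mode]
    return mode
```

A UA that only knows speak-time and speak-default would then render
speak-am-pm as speak-time rather than dropping the declaration.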

 >      <LI>  Speak only some punctuations.
 >           In this case, the rule for handling punctuation marks is 
 >           specified  by providing  a value for property
 >           :punctuation-marks-to-skip or :punctuation-marks-to-speak.

Having only one of the two would be simpler to handle. Which one is
preferred? The value would be one or more strings. Strings are
required rather than keywords, since they must be internationalized.

 > <DT> :language

In the HTML Internationalization draft, language, country and dialect
were combined into a single value (en-us, en-cockney, etc.).

  Bert Bos                                ( W 3 C ) http://www.w3.org/
  bert@w3.org                                  INRIA project RODEO/W3C
  http://www.w3.org/pub/WWW/People/Bos/   2004 Rt des Lucioles / BP 93
  +33 93 65 77 71                 06902 Sophia Antipolis Cedex, France
