Re: T.E.O.'s Draft--Cascading Speech Style Sheets (txt) from lilley on 1996-02-28 (www-style@w3.org from February 1996)

From: lilley <lilley@afs.mcc.ac.uk>
Date: Wed, 28 Feb 1996 20:13:01 +0000 (GMT)
To: JuanJo.Miguez@esat.kuleuven.ac.be (JuanJo Miguez)
Cc: www-style@w3.org
Message-Id: <6734.9602282013@afs.mcc.ac.uk>
JuanJo Miguez writes:

> [...] very interesting to include Speech in the CSS but we don't want 
> to make it very complicated. Many people doesn't even know decibels,

I agree that decibels are not an optimal means to specify volume.

> most current speech synthesizers are mono

I would love to see some justification for this statement. My experience is
precisely the opposite: most audio output devices are at least stereo. Panning,
say, an H1 to the far right and a P to the center is trivially simple.
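Here is a minimal Python sketch showing how little is involved. It assumes a constant-power pan law (a common mixing convention; the thread does not name one) and a hypothetical pan position running from -1.0 (far left) to +1.0 (far right). `constant_power_pan` and `PAN_BY_ELEMENT` are illustrative names, not part of any proposal.

```python
import math


def constant_power_pan(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is far left, 0.0 is centre, +1.0 is far right. A constant-power
    law keeps left**2 + right**2 == 1, so perceived loudness stays roughly
    the same wherever the voice is placed.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("pan position must be between -1.0 and 1.0")
    # Map [-1, 1] onto an angle in [0, pi/2].
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


# Hypothetical per-element pan positions, following the H1/P example.
PAN_BY_ELEMENT = {"H1": 1.0, "P": 0.0}

for element, position in PAN_BY_ELEMENT.items():
    left, right = constant_power_pan(position)
    print(f"{element}: left={left:.3f} right={right:.3f}")
```

A constant-power law is chosen so that a centred voice does not sound quieter than one panned hard to one side, which is what a simple linear crossfade would do.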


>     Volume
>     	Value: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
> 	Initial: 0
> 	Applies to: All elements
> 	Example: volume: 5
> 
> 	The default value is 0 because normally there will be no
> 	sound, but if another value is specified the speech
>         synthesizer will start working.

This seems strongly counter-intuitive. The default is that there is no sound?
Perhaps a stylesheet for visual presentation could specify that the default 
is black text on a black background, so the screen is entirely dark?

I think it would be very difficult to provide both gross volume adjustment 
(to suit individual preference and to cope with differing projection 
requirements) and also provide any subtlety of volume change between 
elements with such a restricted range.
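One way to meet both needs is to keep the listener's overall level separate from small relative offsets supplied by the style sheet. The thread does not spell out such a design, so treat this as a sketch under that assumption. Decibels are used only as an internal unit here, because level changes in decibels simply add. `element_gain`, `MASTER_DB` and `OFFSETS_DB` are hypothetical names.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def element_gain(master_db: float, offset_db: float) -> float:
    """Combine the listener's overall volume with a per-element offset.

    Decibel values add, so the style sheet's relative offset is applied
    on top of whatever overall level the listener has chosen.
    """
    return db_to_amplitude(master_db + offset_db)


# Hypothetical values: the listener turns everything down by 6 dB
# (say, for a quiet room), while the style sheet makes headings
# 3 dB louder than body text.
MASTER_DB = -6.0
OFFSETS_DB = {"H1": 3.0, "P": 0.0}

for element, offset in OFFSETS_DB.items():
    print(f"{element}: amplitude x{element_gain(MASTER_DB, offset):.3f}")
```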

>     Speed
>         Value: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
>         Initial: UA specific
>         Applies to: All elements
>         Example: speed: 6
> 
> 	Some users (especially blind people) prefer very high-speed
> 	speech because they have very good hearing, so they can read
> 	web pages very fast. That is the reason why we chose this
>         big range. Of course "speed: 0" is not allowed because you could 
> 	not hear anything. 

I agree that overall speed should be adjustable, because different
people listen at different speeds, just as they read at different
speeds.  Given a suitable overall speed setting, I would imagine that
there would be a fairly narrow range of adjustability to apply different
style to different elements of the document, while still retaining
intelligibility.
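Speed can be split the same way: the listener chooses the overall rate, and the style sheet supplies only a relative factor held within a narrow band. The words-per-minute unit and the 0.8 to 1.25 bounds are assumptions for illustration, not values from either draft. `element_rate` is a hypothetical name.

```python
# Hypothetical bounds on how far a style sheet may speed up or slow down
# an element relative to the listener's chosen rate.
MIN_FACTOR = 0.8
MAX_FACTOR = 1.25


def element_rate(listener_wpm: float, style_factor: float) -> float:
    """Return the speaking rate (words per minute) for one element.

    The listener sets the overall rate; the style sheet supplies only a
    relative factor, clamped so that every element stays intelligible
    at whatever speed the listener has chosen.
    """
    factor = min(max(style_factor, MIN_FACTOR), MAX_FACTOR)
    return listener_wpm * factor


print(element_rate(180, 0.9))  # a heading read slightly slower
print(element_rate(400, 2.0))  # requested factor is clamped to 1.25
```

A fast listener keeps a fast overall rate, and the style sheet can still distinguish elements without pushing any of them past the point of intelligibility.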
	
>     Voice-type
>         Value: | child1 | child2 | male1 | male2 | female1 | female2 |
>         Initial: UA specific
>         Applies to: All elements
>         Example: voice-type: female1
> 
> 	This is the way to set the physical features of the articulating
> 	voice. For example, the voices of a boy, a woman and a man sound
> 	different, and that is the reason for this property.

This seems arbitrarily limited. If a synthesiser can produce different voices,
why should it be limited to only two different adult voices of each sex?
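An open-ended list of voices need not burden the UA. A minimal sketch, assuming an ordered fallback list in the spirit of the CSS1 font-family property: the author names as many voices as desired, and the UA uses the first one its synthesiser offers, else its own default. The voice names and `choose_voice` are hypothetical.

```python
def choose_voice(preferred: list[str], available: set[str], default: str) -> str:
    """Pick the first preferred voice the synthesiser actually offers.

    Works like a CSS1 font-family list: the author names voices in order
    of preference, and the UA falls back to its own default voice.
    """
    for name in preferred:
        if name in available:
            return name
    return default


# Hypothetical voices offered by one particular synthesiser.
AVAILABLE = {"male1", "female1", "female2", "female3", "child1"}

print(choose_voice(["female3", "female1"], AVAILABLE, "male1"))  # female3
```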
	
>     Pitch
>         Value: | 1 | 2 | 3 | 4 | 5 | 6 |
>         Initial: UA specific
>         Applies to: All elements
>         Example: pitch: 4
>     
>         This is a small range for the mean fundamental frequency (F0).
> 	The same person (the same voice type) can speak, on average, with
> 	a lower or higher pitch,

Why is the range of values asymmetric (three below the median, but two above) 
and why so little resolution?


>     Prosidy
>         Value: | on | off |
>         Initial: on
>         Applies to: All elements
>         Example: prosidy: off
> 	
> 	With prosidy activated, the synthesizer produces the intonation
> 	(the evolution of F0 over time), which can sound hard, soft, angry,
> 	questioning... If you have "prosidy:off", the result will be
> 	like the voice of a robot 

This seems to use up one extra property just to give an additional voice type.
It could equally well be expressed as a single extra value, "robot", on the
proposed voice-type property.

> (blind people prefer this kind of voice 

Indeed. I will leave it to any blind subscribers to the group to confirm 
or refute this assertion.

>     Language
>         Value: defined in the ISO 639 (Codes for the representation of 
> 	the names of languages)
>         Initial: en
>         Applies to: All elements
>         Example: language: fr
> 
> 	You can specify any language because the way to pronounce the same
> 	message differs between countries (e.g. fr, nl, es, en...).

I agree that this property is very important, although one would hope that
the Accept-Language header of HTTP would be used to establish the language
of the majority of the document and that the HTML LANG attribute would be used
to identify elements which were in other languages.

It is not currently possible in CSS1 to select on any attributes other than
CLASS and ID. I suggest that allowing selectors to use the LANG attribute is
a requirement for aural presentation.
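Resolving an element's language from LANG could work much like inheritance elsewhere in CSS: the nearest ancestor carrying a LANG attribute wins, then the document-level language (for example, as negotiated over HTTP), then a UA default. This is a minimal sketch; the `Element` class is a hypothetical stand-in, not a real parser or DOM API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical document node carrying an optional LANG attribute."""
    tag: str
    lang: Optional[str] = None  # value of the HTML LANG attribute, if present
    # Excluded from repr/compare to avoid parent <-> child recursion.
    parent: Optional[Element] = field(default=None, repr=False, compare=False)
    children: list[Element] = field(default_factory=list)

    def add(self, child: Element) -> Element:
        child.parent = self
        self.children.append(child)
        return child


def effective_language(element: Element, document_lang: Optional[str],
                       ua_default: str) -> str:
    """Walk up the tree to the nearest LANG; else document, else UA default."""
    node: Optional[Element] = element
    while node is not None:
        if node.lang:
            return node.lang
        node = node.parent
    return document_lang or ua_default


body = Element("BODY")
para = body.add(Element("P"))
quote = para.add(Element("Q", lang="fr"))

print(effective_language(quote, "en", "en"))  # fr
print(effective_language(para, "nl", "en"))   # nl
```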

>       The 
> 	default value is English because it's the most used language on 
> 	the web,

At present, largely because internationalisation is only now being addressed.
I suggest that a better initial value would be UA specific.


> 	We try to produce understandable speech, but we think it is
> 	difficult to make a speech synthesizer speak all the dialects
> 	of all the world's countries, as Raman suggests in his draft. It
> 	could be possible,

It could indeed be possible, but your style sheet proposal would not allow 
such capabilities to be used in a style sheet. It is far preferable to allow 
such things to be specified. For example, I have come across synthesisers 
that could switch between three named American regional accents.

> but not many people could afford it. 

Many people cannot afford good colour screens, either. That does not mean 
that visual style sheets should be restricted to greyscale presentation.

You can always discard information, but you cannot get it back if it 
wasn't there to start with.

Thanks for your comments - there were some good points made there, though 
on the whole I prefer Raman's draft to this one as it seems much more 
expressive.

-- 
Chris Lilley, Technical Author and JISC representative to W3C 
+-------------------------------------------------------------------+
|  Manchester and North Training & Education Centre   ( MAN T&EC )  |
+-------------------------------------------------------------------+
| Computer Graphics Unit,             Email: Chris.Lilley@mcc.ac.uk |
| Manchester Computing Centre,        Voice: +44 161 275 6045       |
| Oxford Road, Manchester, UK.          Fax: +44 161 275 6040       |
| M13 9PL                            BioMOO: ChrisL                 |
| Timezone: UTC        URI: http://info.mcc.ac.uk/CGU/staff/lilley/ | 
+-------------------------------------------------------------------+
Received on Wednesday, 28 February 1996 15:13:37 UTC