T.E.O.'s Draft--Cascading Speech Style Sheets

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 plus SQ/ICADD Tables//EN" 
"html.dtd">
<HTML><HEAD><TITLE>T.E.O.'s Draft--Cascading Speech Style 
Sheets</TITLE></HEAD>
<BODY><center><H1>T.E.O.'s Draft--Cascading Speech Style Sheets</H1>
<H3>K.U. Leuven</H3></center> 
<UL>
<LI>Ing. to be Juan Jose Miguez Iglesias
<A 
HREF="mailto:Juanjo.Miguez@kuleuven.ac.be">Juanjo.Miguez@KULeuven.ac.be</A>
<LI>in. Filip Evenepoel
<A 
HREF="mailto:Filip.Evenepoel@kuleuven.ac.be">Filip.Evenepoel@KULeuven.ac.be</A>
<LI>in. Bart Bauwens
<A HREF="mailto:Bart.Bauwens@kuleuven.ac.be">Bart.Bauwens@KULeuven.ac.be</A>
<LI>Prof.dr.in Jan Engelen
<A HREF="mailto:Jan.Engelen@kuleuven.ac.be">Jan.Engelen@KULeuven.ac.be</A>
<LI>Prof.ing Antonio S. Pena from the E.T.S.I.Telecomunication of Vigo 
(Spain)</UL>
<HR><H2>A simple definition</H2>
<P>The T.E.O. group at the Katholieke Universiteit Leuven in Belgium 
believes that the best way to include speech in CSS is to keep it simple 
and general, so that it is easy to use. We agree with the 
<A 
HREF="http://www.eit.com/msgid/199602130050.QAA10031@labrador.mv.us.adobe.com">
Raman T.V. Initial Draft</A> that it is very interesting to include speech 
in CSS, but we do not want to make it very complicated. Many people are 
not familiar with decibels, most current speech synthesizers are mono, and 
it is easier to give values to some features as plain numbers. These 
theoretical values are then mapped to the real values of each 
synthesizer.</P>
<P>We have defined the set of properties for Cascading Speech Style 
Sheets in the same way as the CSS1 working draft:</P>
<P><H2>Speech</H2>
<UL><LI><B>Volume</B>
    	<BR>Value: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
	<BR>Initial: 0
	<BR>Applies to: All elements
	<BR>Example: volume: 5

	<P>The default value is 0 because normally there will be no sound; 
	when any other value is specified, the speech synthesizer starts 
	working. The real range of volume values (and of every other 
	property) depends on the speech synthesizer in use, so these 
	theoretical values will be mapped to the real values used by the 
	synthesizer.<P>
<LI><B>Speed</B>
        <BR>Value: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
        <BR>Initial: UA specific
        <BR>Applies to: All elements
        <BR>Example: speed: 6

	<P>Some users (especially blind people) prefer very fast speech 
	because they have very good hearing, so they can read web pages 
	very quickly. That is why we chose this wide range. Of course 
	"speed: 0" is not allowed, because nothing would be spoken. <P>
<LI><B>Voice-type</B>
        <BR>Value: | child1 | child2 | male1 | male2 | female1 | female2 |
        <BR>Initial: UA specific
        <BR>Applies to: All elements
        <BR>Example: voice-type: female1

	<P>This sets the physical features of the articulating voice. For 
	example, the voices of a boy, a woman, a man and a Terminator all 
	sound different, which is why this property exists.<P>
<LI><B>Pitch</B>
        <BR>Value: | 1 | 2 | 3 | 4 | 5 | 6 |
        <BR>Initial: UA specific
        <BR>Applies to: All elements
        <BR>Example: pitch: 4
    
        <P>This is a small range for the mean fundamental frequency (F0). 
	The same person (the same voice type) can speak with a lower or 
	higher average pitch, which gives the impression of a different 
	voice. Combining "pitch" and "voice-type" gives, for example: 
	<BR>if voice-type=child1, pitch=1 (low voice): real mean frequency 150 Hz 
	<BR>if voice-type=child1, pitch=6 (high voice): real mean frequency 350 Hz 
	<BR>if voice-type=male2, pitch=1 (low voice): real mean frequency 50 Hz 
	<BR>if voice-type=male2, pitch=6 (high voice): real mean frequency 150 Hz 
	
	<P>All these voices sound different. We get a wide range of 
	voices because the pitch value is mapped to different real 
	frequencies depending on the voice-type. That is why 6 possible 
	values of pitch are enough for a simple definition.<P>
<LI><B>Prosody</B>
        <BR>Value: | on | off |
        <BR>Initial: on
        <BR>Applies to: All elements
        <BR>Example: prosody: off
	
	<P>With prosody activated, the synthesizer produces intonation 
	(the evolution of F0 over time), which can make the speech sound 
	hard, soft, angry, questioning and so on. With "prosody: off" the 
	result sounds like the voice of a robot (blind people prefer this 
	kind of voice, and also very fast speech). <P>
<LI><B>Language</B>
        <BR>Value: as defined in ISO 639 (Codes for the representation 
        of names of languages)
        <BR>Initial: en
        <BR>Applies to: All elements
        <BR>Example: language: fr

	<P>You can specify any language, because the same message is 
	pronounced differently in each language (e.g. fr, nl, es, en). 
	For example, the Apollo II (a multilingual speech synthesizer) 
	supports 7 languages (Russian, English, French, Spanish...). The 
	default value is English because it is the most widely used 
	language on the web. Although many languages are not supported 
	now, and perhaps never will be, it is better to include all of 
	them than only a small subset.<P>
</UL>	
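<P>As a rough illustration of the property table above, the following 
Python sketch checks declarations such as "volume: 5" against the listed 
values and fills in the stated initial values. It is not part of the draft: 
the names parse_declaration and computed_style are hypothetical, "prosody" 
is the spelling assumed here for the intonation switch, and the language 
check only tests for the two-letter shape of an ISO 639 code rather than 
consulting the real code list.</P>
<PRE>
```python
# Allowed values, copied from the property table in the draft.
ALLOWED = {
    "volume": {str(n) for n in range(0, 11)},
    "speed": {str(n) for n in range(1, 11)},
    "voice-type": {"child1", "child2", "male1", "male2",
                   "female1", "female2"},
    "pitch": {str(n) for n in range(1, 7)},
    "prosody": {"on", "off"},  # assumed spelling of the on/off switch
}

# Initial values stated in the draft; None stands for "UA specific".
INITIAL = {"volume": "0", "speed": None, "voice-type": None,
           "pitch": None, "prosody": "on", "language": "en"}


def parse_declaration(text):
    """Split 'name: value' into a validated (name, value) pair."""
    name, sep, value = text.partition(":")
    name, value = name.strip().lower(), value.strip().lower()
    if not sep or name not in INITIAL:
        raise ValueError("unknown or malformed declaration: %r" % text)
    if name == "language":
        # Assumption: only the two-letter shape is checked, not ISO 639.
        ok = len(value) == 2 and value.isascii() and value.isalpha()
    else:
        ok = value in ALLOWED[name]
    if not ok:
        raise ValueError("value %r is not allowed for %s" % (value, name))
    return name, value


def computed_style(declarations):
    """Start from the initial values and apply declarations in order."""
    style = dict(INITIAL)
    for text in declarations:
        name, value = parse_declaration(text)
        style[name] = value
    return style
```
</PRE>
<P>For instance, computed_style(["voice-type: female1", "language: fr"]) 
keeps volume at its initial value 0, while parse_declaration("speed: 0") 
raises an error because 0 is outside the speed range.</P>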
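<P>The mapping from theoretical values to real synthesizer values could 
look roughly like the Python sketch below. The two frequency ranges come 
from the pitch examples above; everything else is an assumption made for 
illustration: the draft does not say that the intermediate steps are 
linear, the function names are hypothetical, and device_max stands for 
whatever maximum volume a given synthesizer happens to accept.</P>
<PRE>
```python
# Mean F0 in Hz at pitch 1 and pitch 6, taken from the draft's examples.
# Only child1 and male2 are given there; other voice types are left out.
F0_ENDPOINTS_HZ = {"child1": (150.0, 350.0), "male2": (50.0, 150.0)}


def scale(step, first, last, low, high):
    """Map an integer step in first..last linearly onto low..high.

    Linear spacing is an assumption; the draft only fixes the endpoints.
    """
    if step not in range(first, last + 1):
        raise ValueError("step %r outside %d..%d" % (step, first, last))
    return low + (step - first) * (high - low) / (last - first)


def pitch_to_hz(voice_type, pitch):
    """Real mean frequency for a pitch value 1..6 of a given voice type."""
    low, high = F0_ENDPOINTS_HZ[voice_type]
    return scale(pitch, 1, 6, low, high)


def volume_to_device(volume, device_max):
    """Map volume 0..10 onto a hypothetical device range 0..device_max."""
    return scale(volume, 0, 10, 0.0, float(device_max))
```
</PRE>
<P>Under these assumptions "pitch: 4" gives 270 Hz for child1 and 110 Hz 
for male2, so the same pitch value still produces clearly different 
voices.</P>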
	


<P>This is a DRAFT. We have discussed it, and now it is your turn to say 
whether you like it as it is or would like to discuss some features. I hope 
you will tell us what you think about it. Thank you!


<P></P><P></P><HR><ADDRESS>Kath. Universiteit Leuven--Dept.
Electrotechniek (ESAT), T.E.O.<A HREF="mailto:Juanjo.Miguez@kuleuven.ac.be">
Juanjo.Miguez@KULeuven.ac.be</A></ADDRESS></BODY></HTML>


----------------------------------------------------------------
Juan Jose Miguez Iglesias

Kath. Universiteit Leuven            | Phone : +32 16 32 18 66
Dept. Electrotechniek (ESAT), T.E.O. | 
Kard. Mercierlaan 94                 | Fax   : +32 16 32 19 86 
B-3001 LEUVEN - HEVERLEE  

Address: Groenveldlaan 1, 8/107 ; Heverlee (Leuven) B-3001
Phone: 206185 or 235201

E-mail:Juanjo.Miguez@esat.kuleuven.ac.be
       jmiguez@ait.uvigo.es	
----------------------------------------------------------------

Received on Wednesday, 21 February 1996 10:38:13 UTC