- From: Raman T. V. <raman@mv.us.adobe.com>
- Date: Wed, 28 Feb 1996 08:58:40 -0800
- To: JuanJo Miguez <JuanJo.Miguez@esat.kuleuven.ac.be>
- Cc: www-style@w3.org, raman@mv.us.adobe.com
I'm sorry, but this proposal from Europe is a *joke*!
(apologies if I sound rude -- that is not the intent)
We're talking of a style sheet specification -- not a speech synthesizer.
I'm completely bemused by the assertion at the end that states
"not many people can afford expensive devices so we are making a simple one".

JuanJo Miguez writes:
> T.E.O.'s Draft--Cascading Speech Style Sheets
> K.U. Leuven
>
>
> Ing. to be Juan Jose Miguez Iglesias mailto:Juanjo.Miguez@KULeuven.ac.be
> ir. Filip Evenepoel mailto:Filip.Evenepoel@KULeuven.ac.be
> ir. Bart Bauwens mailto:Bart.Bauwens@KULeuven.ac.be
> Prof.dr.ir Jan Engelen mailto:Jan.Engelen@KULeuven.ac.be
> Prof.ing Antonio S. Pena from the E.T.S.I.Telecomunication of Vigo (Spain)
>
>
> A SIMPLE DEFINITION
> -------------------
>
> The T.E.O. group at the Katholieke Universiteit Leuven in Belgium
> believes that the best way to include Speech within the CSS is to make it
> simple and general, so that it's easy to use. We agree with the Raman T.V.
> Initial Draft:
>
> (http://www.eit.com/msgid/199602130050.QAA10031@labrador.mv.us.adobe.com)
>
> that it is very interesting to include Speech in the CSS, but we don't want
> to make it very complicated. Many people don't even know what decibels are,
> most current speech synthesizers are mono, and it's easier to give values to
> some features with numbers (in a more theoretical way; these values
> will then be mapped to the real values for each synthesizer).
> You can see this page with your browser in HTML at the URL:
>
> http://www.esat.kuleuven.ac.be/~juanjo/csss1.html
>
> We have defined the set of properties for Cascading Speech Style Sheets
> as in the CSS1 Working Draft:
>
> Speech
> ------
> Volume
> Value: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
> Initial: 0
> Applies to: All elements
> Example: volume: 5
>
> The reason why the default value is 0 is that normally there
> will be no sound, but if another value is specified
> the speech synthesizer will start working. There are many sets of
> values in the volume range (and for all the other properties)
> depending on which speech synthesizer you use, so these theoretical
> values will be mapped into the real values used by the synthesizer.
>
> We think this way is easier than Raman's, where the user
> must know what decibels are in order to write his own style sheet. In
> fact very few people know about this (engineers, physicists and so on).
> To make it easy we let people choose from a set of ten values
> that will be mapped by experts to the real values in the
> synthesizer.
>
> Speed
> Value: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
> Initial: UA specific
> Applies to: All elements
> Example: speed: 6
>
> Some users (especially blind people) prefer very high-speed
> speech because they have very good hearing, so they can
> read web pages very fast. That is the reason why we chose this
> big range. Of course "speed: 0" is not allowed because you could
> not hear anything.
>
> Voice-type
> Value: | child1 | child2 | male1 | male2 | female1 | female2 |
> Initial: UA specific
> Applies to: All elements
> Example: voice-type: female1
>
> This is the way to set the physical features of the articulating
> voice. For example, the voices of a boy, a woman and a man sound
> different, and that is the reason.
>
> Pitch
> Value: | 1 | 2 | 3 | 4 | 5 | 6 |
> Initial: UA specific
> Applies to: All elements
> Example: pitch: 4
>
> This is a small range for the average frequency (F0). The same
> person (the same voice type) can talk (on average) in a lower or
> higher voice, which gives the appearance of a different voice. If we
> try to combine "Pitch" and "voice-type", for example:
>
> if voice-type=child1, F0=1 (low voice)  --> real average frequency: 150 Hz
> if voice-type=child1, F0=6 (high voice) --> real average frequency: 350 Hz
> if voice-type=male2,  F0=1 (low voice)  --> real average frequency:  50 Hz
> if voice-type=male2,  F0=6 (high voice) --> real average frequency: 150 Hz
>
> All these voices sound different. We have a big range of different
> voices because F0 (pitch frequency) is mapped to different values
> of real frequency depending on the voice-type. That's why 6
> possible values of pitch are enough to make a simple definition with
> 36 different voices.
>
> When a user wants to write his personal CSSS, he can try any of the
> available values, and it will work because they will be mapped to real
> and typical values. With Raman's specification someone could try
> an average-pitch of 5 Hertz, but it will sound bad. We prefer to let
> people choose a relative number rather than an exact and perhaps wrong
> number for the average pitch.
>
> Prosidy
> Value: | on | off |
> Initial: on
> Applies to: All elements
> Example: prosidy: off
>
> With prosidy activated the synthesizer gives the intonation (the
> evolution of F0 over time), which will sound hard, soft, angry,
> questioning.....
> If you have "prosidy:off" the result will be
> like the voice of a robot (blind people prefer this kind of voice,
> and also listening to very fast speech).
>
> Language
> Value: defined in ISO 639 (Codes for the representation of
> the names of languages)
> Initial: en
> Applies to: All elements
> Example: language: fr
>
> You can specify any language because the way to pronounce the same
> message differs between countries (e.g. fr, nl, es, en....).
> For example the Apollo II (a multilingual speech synthesizer)
> supports 7 languages (Russian, English, French, Spanish...). The
> default value is English because it's the most used language on
> the web, and although many languages are not supported, and
> perhaps will not be in the future, it's better to include all of
> them than a small part.
>
> We try to make understandable speech, but we think that it's
> difficult to make a speech synthesizer speak in all the dialects
> of all the world's countries, as Raman suggests in his draft. It
> could be possible, but not many people could afford it. We are just
> trying to make it easy for the final user with the devices that
> are now most widely used, so that this could be working soon, because
> there are many people who need it very much, as soon as possible (blind
> or impaired people).
>
> This is a DRAFT; we have discussed it, and now it is your turn to say
> whether you like it as it is, or would like to talk about some features.
> I hope you will tell us what you think about it. Thank you!
>
>
>
> Kath. Universiteit Leuven--Dept.Electrotechniek (ESAT), T.E.O.
> mailto:Juanjo.Miguez@KULeuven.ac.be
> ----------------------------------------------------------------

--
Best Regards,
____________________________________________________________________________
--raman

Adobe Systems                   Tel: 1 (415) 962 3945 (B-1 115)
Advanced Technology Group       Fax: 1 (415) 962 6063
1585 Charleston Road            Email: raman@adobe.com
Mountain View, CA 94039-7900           raman@cs.cornell.edu
http://www-atg/People/Raman.html (Internal To Adobe)
http://www.cs.cornell.edu/Info/People/raman/raman.html (Cornell)
Disclaimer: The opinions expressed are my own and in no way should be taken
as representative of my employer, Adobe Systems Inc.
____________________________________________________________________________
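
The quoted draft says that the same pitch number is mapped to a different real frequency depending on the voice-type, but gives only four data points (child1 and male2 at pitch 1 and 6). The following is a minimal Python sketch of how such a mapping might work. The linear interpolation between the endpoints, the function name average_f0_hz and the table name F0_RANGE_HZ are assumptions made for illustration; the draft does not say how intermediate values are computed, and it gives no ranges for the other voice types, so they are left out.

```python
# Hypothetical sketch: only the four endpoint frequencies below come from
# the quoted draft. The linear interpolation is an assumption.

# voice-type -> (average F0 in Hz at pitch 1, average F0 in Hz at pitch 6)
F0_RANGE_HZ = {
    "child1": (150.0, 350.0),
    "male2": (50.0, 150.0),
}

PITCH_MIN = 1
PITCH_MAX = 6


def average_f0_hz(voice_type, pitch):
    """Map an abstract pitch value (1..6) to an average F0 in Hz."""
    if voice_type not in F0_RANGE_HZ:
        raise ValueError("no frequency range known for voice-type %r" % voice_type)
    if not isinstance(pitch, int) or not PITCH_MIN <= pitch <= PITCH_MAX:
        raise ValueError("pitch must be an integer from 1 to 6")
    low, high = F0_RANGE_HZ[voice_type]
    # Position of the pitch value inside its range, from 0.0 to 1.0.
    fraction = (pitch - PITCH_MIN) / (PITCH_MAX - PITCH_MIN)
    return low + fraction * (high - low)


if __name__ == "__main__":
    for voice in ("child1", "male2"):
        for pitch in (1, 6):
            print(voice, pitch, average_f0_hz(voice, pitch))
```

A synthesizer-specific table of this kind is what the draft means by values being "mapped to real and typical values"; a real user agent would need ranges for all six voice types.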
Received on Wednesday, 28 February 1996 11:58:43 UTC