- From: Walker, Mark R <mark.r.walker@intel.com>
- Date: Tue, 23 Jan 2001 11:37:54 -0800
- To: "'Alex.Monaghan@Aculab.com'" <Alex.Monaghan@Aculab.com>
- Cc: www-voice@w3.org, "Walker, Mark R" <mark.r.walker@intel.com>
The 'emphasis' element is unquestionably the least described element in the specification, and I frankly don't see a near-term solution to this.

Emphasis of an isolated word in human speech is mostly the result of an increase in the word duration, usually accompanied by an increase in the duration of the pause intervals surrounding the word. Sometimes a small increase in volume and an alteration of the intonation contour are also included. There may also be a relationship with vocal 'effort', which may (or may not) correspond to short-term spectral tilt. The analytic underpinnings of phrase emphasis are even less understood.

So, while a generally accepted model of emphasis does not currently exist, I believe it is still true that a knowledgeable, motivated synthetic speech engine development team would be able to generate an 'emphasis' function for the engine that basically accomplished the goal of perceptually 'highlighting' a word or a phrase. The specification really does not (and could not) require more.

I think, however, that you raise an important issue. By not providing more implementation guidance, we may be placing smaller companies interested in adopting SSML at too great a disadvantage. The insurmountable problem, as I see it, is that in the absence of a general consensus on 'best known methods' for TTS within the industry (formant synthesis vs concatenation, etc), any implementation guidance would likely be highly biased in favor of one specific approach, and thus would not be very useful. I remain open to proposals in this area, however.

-Mark

-----Original Message-----
From: Alex.Monaghan@Aculab.com [mailto:Alex.Monaghan@Aculab.com]
Sent: Tuesday, January 23, 2001 5:50 AM
To: mark.r.walker@intel.com
Cc: www-voice@w3.org
Subject: RE: mark's and richard's comments on SSML

mark,

thanks for your replies. i agree that the <break> element is quite clearly defined.

i have looked at section 5.3, and the phrase "correctly understand and apply" seems to be the crucial one. how, in your view, should a system "correctly understand and apply" an <emphasis> element?

alex.
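
As a rough illustration of the acoustic correlates listed in the reply above (longer word duration, longer surrounding pauses, a small volume increase), an engine-side 'emphasis' function might look something like the sketch below. Everything in it is an assumption made for illustration: the `WordProsody` type, the `apply_emphasis` function, and the scale factors are hypothetical and come neither from the SSML specification nor from any real engine. Intonation contour and spectral tilt are deliberately left out, since the message gives no concrete model for them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WordProsody:
    """Hypothetical per-word prosody targets; not part of SSML."""
    text: str
    duration_ms: float       # duration of the word itself
    pause_before_ms: float   # silence preceding the word
    pause_after_ms: float    # silence following the word
    gain_db: float = 0.0     # volume relative to the surrounding speech


def apply_emphasis(word, duration_scale=1.25, pause_scale=1.5, gain_boost_db=2.0):
    """Return a copy of `word` with emphasis applied.

    The default factors are illustrative guesses, not recommended values.
    """
    return replace(
        word,
        duration_ms=word.duration_ms * duration_scale,        # lengthen the word
        pause_before_ms=word.pause_before_ms * pause_scale,   # lengthen the pause before it
        pause_after_ms=word.pause_after_ms * pause_scale,     # lengthen the pause after it
        gain_db=word.gain_db + gain_boost_db,                 # small volume increase
    )


plain = WordProsody("now", duration_ms=200.0, pause_before_ms=40.0, pause_after_ms=40.0)
emphasized = apply_emphasis(plain)
```

A concatenative engine and a formant engine would realize these targets quite differently, which is the difficulty with implementation guidance that the reply points out; the sketch only captures the perceptual 'highlighting' goal at the level of prosody targets.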
Received on Tuesday, 23 January 2001 14:38:07 UTC