RE: mark's and richard's comments on SSML

The 'emphasis' element is unquestionably the least described element in the
specification, and I frankly don't see a near-term solution to this.  

Emphasis of an isolated word in human speech is mostly the result of an
increase in the word duration, usually accompanied by an increase in the
duration of the pause intervals surrounding the word.  Sometimes a small
increase in volume and alteration of the intonation contour are also
included. There may also be a relationship with vocal 'effort', which may
(or may not) correspond to short-term spectral tilt.  The analytic
underpinnings of phrase emphasis are even less well understood.
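
As a rough illustration only, the cues listed above (longer word duration,
longer surrounding pauses, slightly higher volume, altered intonation) could
be approximated with explicit markup. The sketch below is not how any engine
implements 'emphasis'. The function name approximate_emphasis and its default
values are hypothetical, and the attribute values ("slow", "loud", "high",
"150ms") assume the syntax of the later SSML 1.0 Recommendation rather than
the draft under discussion.

```python
from xml.sax.saxutils import escape


def approximate_emphasis(word, pause_ms=150, rate="slow",
                         volume="loud", pitch="high"):
    """Return an SSML fragment that crudely imitates emphasis on `word`.

    Hypothetical helper: every default here is an illustrative guess,
    not a value taken from the specification or from any engine.
    """
    # Lengthen the pause intervals on both sides of the word.
    pause = '<break time="%dms"/>' % pause_ms
    # A slower rate stretches the word; volume and pitch changes stand in
    # for the smaller loudness and intonation effects.
    return (
        pause
        + '<prosody rate="%s" volume="%s" pitch="%s">' % (rate, volume, pitch)
        + escape(word)  # keep the fragment well-formed XML
        + '</prosody>'
        + pause
    )


if __name__ == "__main__":
    print("I " + approximate_emphasis("never") + " said that.")
```

A real engine would presumably adjust these parameters jointly and in a
context-dependent way, which is exactly the part no accepted model describes.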

So, while a generally accepted model of emphasis does not currently exist, I
believe it is still true that a knowledgeable, motivated synthetic speech
engine development team would be able to generate an 'emphasis' function for
the engine that basically accomplished the goal of perceptually
'highlighting' a word or a phrase. The specification really does not (and
could not) require more.

I think, however, that you raise an important issue.  By not providing more
implementation guidance, we may be placing smaller companies interested in
adopting SSML at too great a disadvantage.  The insurmountable problem, as I
see it, is that in the absence of a general consensus on 'best known
methods' for TTS within the industry (formant synthesis vs. concatenation,
etc.), any implementation guidance would likely be highly biased in favor of
etc), any implementation guidance would likely be highly biased in favor of
one specific approach, and thus would not be very useful.  I remain open to
proposals in this area, however.

-Mark






-----Original Message-----
From: Alex.Monaghan@Aculab.com [mailto:Alex.Monaghan@Aculab.com]
Sent: Tuesday, January 23, 2001 5:50 AM
To: mark.r.walker@intel.com
Cc: www-voice@w3.org
Subject: RE: mark's and richard's comments on SSML


mark,
thanks for your replies. i agree that the <break> element is quite clearly
defined. i have looked at section 5.3, and the phrase "correctly understand
and apply" seems to be the crucial one.
how, in your view, should a system "correctly understand and apply" an
<emphasis> element?
	
alex.

Received on Tuesday, 23 January 2001 14:38:07 UTC