RE: mark's and richard's comments on SSML from Walker, Mark R on 2001-01-22 (www-voice@w3.org from January to March 2001)

From: Walker, Mark R <mark.r.walker@intel.com>
Date: Mon, 22 Jan 2001 12:58:32 -0800
To: "'Alex.Monaghan@Aculab.com'" <Alex.Monaghan@Aculab.com>
Cc: www-voice@w3.org
Message-ID: <638EC1B28663D211AC3E00A0C96B78A8042B39E8@orsmsx40.jf.intel.com>

Alex -

On the specific question of how to specify the conforming behavior of the
'break' element that you originally cited, I was attempting to distinguish
between the specification-conformant local behavior and the potentially
variable prosodic context behavior.

Specifying something like <break time="250ms"/> has a required perceptual
result on the local interval, and I am certain that result is described in
fairly unambiguous terms in section 2.8.  What is not specified is the impact
on the larger
prosodic context.  In your original example, if a markup author (unwisely)
chose to insert a break in a region surrounded by un-marked breaks, the
rendering synthesizer might elect to optimize the perceptual quality by
'balancing' the effects of the markup on the other within-context break
intervals, according to some internal model.  Another synthesizer might
elect to render the markup 'as is', even if a potential break in quality was
internally flagged.  Both behaviors emerge from a context larger than that
explicitly controlled by the markup, and both would be conformant.
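
The two behaviors can be sketched roughly as follows. This is only an
illustration under stated assumptions: the Pause type, both function names,
the pause durations, and the 450 ms budget are hypothetical and do not come
from the specification or from any real synthesizer. The point is that the
marked 250 ms break is rendered identically by both strategies, and only the
surrounding un-marked breaks differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pause:
    ms: int        # pause duration in milliseconds
    marked: bool   # True if the author wrote <break time="..."/> here


def render_as_is(pauses: List[Pause]) -> List[int]:
    """Hypothetical strategy A: leave every pause untouched."""
    return [p.ms for p in pauses]


def render_balanced(pauses: List[Pause], budget_ms: int) -> List[int]:
    """Hypothetical strategy B: keep marked pauses exact and shrink the
    un-marked ones so the total fits within budget_ms where possible."""
    marked_total = sum(p.ms for p in pauses if p.marked)
    unmarked_total = sum(p.ms for p in pauses if not p.marked)
    if unmarked_total == 0:
        return [p.ms for p in pauses]
    remaining = max(budget_ms - marked_total, 0)
    scale = min(1.0, remaining / unmarked_total)
    return [p.ms if p.marked else round(p.ms * scale) for p in pauses]


# An authored 250 ms break surrounded by two un-marked 200 ms breaks.
pauses = [Pause(200, False), Pause(250, True), Pause(200, False)]
as_is = render_as_is(pauses)                     # [200, 250, 200]
balanced = render_balanced(pauses, budget_ms=450)  # [100, 250, 100]
```

In both results the middle value stays at 250, which is the local behavior
the markup controls; the outer values are the larger prosodic context that
it does not.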

The strategy for maintaining similar rendering performance across disparate
systems would therefore fall to the markup author.  Text sections where
rendering performance might be expected to vary could, for example, be
replaced by a string of low-level elements that largely resolve the
ambiguity.
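
As a minimal sketch of that authoring strategy, the helper below joins
phrases with explicit break elements so that no pause between them is left
to the synthesizer's own model. The function name, the sample phrases, and
the 250 ms values are assumptions made up for illustration; only the
<break time="..."/> form is taken from the discussion above.

```python
from typing import List
from xml.sax.saxutils import escape


def with_explicit_breaks(phrases: List[str], pause_ms: List[int]) -> str:
    """Join phrases with an explicit <break time="..."/> between each pair.

    pause_ms must hold exactly one duration per gap between phrases.
    """
    if len(pause_ms) != len(phrases) - 1:
        raise ValueError("need exactly one pause per gap between phrases")
    parts = [escape(phrases[0])]
    for ms, phrase in zip(pause_ms, phrases[1:]):
        parts.append(f'<break time="{ms}ms"/>')
        parts.append(escape(phrase))  # escape &, < and > in the text
    return " ".join(parts)


fragment = with_explicit_breaks(
    ["Press one for sales,", "two for support,", "or hold."],
    [250, 250],
)
```

The resulting fragment carries every pause in the markup itself, at the cost
of the author taking over work that the synthesizer would otherwise do.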

The question of how a given developer engineers an SSML-conformant speech
synthesis engine then hinges on the clarity of the written descriptions of
the prescribed, local perceptual impact of each of the markup elements
contained in the specification.  It is on this question I am most anxious to
receive feedback from potential users.  It may be that some of the
descriptions are not sufficiently clear, but frankly, based on our prior
communications, I don't see the <break/> element description as being one of
those. 


-Regards,

Mark

-----Original Message-----
From: Alex.Monaghan@Aculab.com [mailto:Alex.Monaghan@Aculab.com]
Sent: Monday, January 22, 2001 5:32 AM
To: www-voice@w3.org
Subject: RE: mark's and richard's comments on SSML


i know i was only taking one possible interpretation of what richard wrote,
but it certainly seems as though the SSML spec will not be satisfied by most
current synthesisers if the requirement for appropriate output is part of the
definition of compliance.

in other words, either the goal of cross-platform consistency is sacrificed
or the goal of implementation using current technology is abandoned.

richard appears to attach more importance to cross-platform consistency, as
do i - what's the point of having a mark-up standard if the results
(synthesiser outputs) are not standardised? it would be analogous to having
a standard for fuel which stated that you had to be able to pour it into a
fuel tank, but said nothing about what happened after that.

so how will compliance be assessed?
					alex.

> -----Original Message-----
> From:	Richard Sproat [SMTP:rws@research.att.com]
> Sent:	22 January 2001 13:22
> To:	Alex.Monaghan@Aculab.com; www-voice@w3.org
> Subject:	Re: mark's and richard's comments on SSML
> 
> 
> Alex:
> 
>   Richard: "in the current situation what you have is a
>   system that will not necessarily be able to implement what you want to
>   hear."
> 
> Here I'm describing the situation that one is likely to have with
> certain classes of synthesizers. I am not claiming that this would
> constitute an acceptable notion of compliance. Quite the opposite: I
> think the situation is perfectly unacceptable. I had thought that was
> clear, but maybe I should have spelled this out explicitly.
> 
> --R

Received on Monday, 22 January 2001 15:59:02 UTC