- From: Walker, Mark R <mark.r.walker@intel.com>
- Date: Mon, 22 Jan 2001 12:58:32 -0800
- To: "'Alex.Monaghan@Aculab.com'" <Alex.Monaghan@Aculab.com>
- Cc: www-voice@w3.org
Alex - On the specific question of how to specify the conforming behavior of the 'break' element that you originally cited, I was attempting to distinguish between the specification-conformant local behavior and the potentially variable behavior of the surrounding prosodic context. I am certain that the perceptual result required on the local interval by something like <break time="250ms"/> is described in fairly unambiguous terms in section 2.8. What is not specified is the impact on the larger prosodic context.

In your original example, if a markup author (unwisely) chose to insert a break in a region surrounded by unmarked breaks, the rendering synthesizer might elect to optimize the perceptual quality by 'balancing' the effects of the markup on the other break intervals within that context, according to some internal model. Another synthesizer might elect to render the markup 'as is', even if a potential break in quality was internally flagged. Both behaviors emerge from a context larger than that explicitly controlled by the markup, and both would be conformant.

The strategy for maintaining similar rendering performance across disparate systems would therefore fall to the markup author. Text sections where rendering performance might be expected to vary could, for example, be replaced by a string of low-level elements that largely resolved the ambiguity.

The question of how a given developer engineers an SSML-conformant speech synthesis engine then hinges on the clarity of the written descriptions of the prescribed, local perceptual impact of each of the markup elements contained in the specification. It is on this question that I am most anxious to receive feedback from potential users. It may be that some of the descriptions are not sufficiently clear, but frankly, based on our prior communications, I don't see the <break/> element description as being one of those.
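To make the point about low-level elements concrete, here is a rough sketch of the two authoring styles. Only <break time="250ms"/> comes from the discussion above; the sentence, the 150ms value, and the enclosing <speak> element are assumptions chosen for illustration, and the two fragments are meant to be read separately rather than as one document.

```xml
<!-- Fragment 1 (illustrative): one explicit break among unmarked ones.
     A conformant synthesizer may render the 250ms break as is, or may
     rebalance the unmarked break after "hall," around it. -->
<speak>
  The meeting is on Monday, <break time="250ms"/> in the main hall, at noon.
</speak>

<!-- Fragment 2 (illustrative): every break in the region is marked, so less
     is left to the synthesizer's internal prosodic model. The 150ms value
     is an assumed example, not a recommended setting. -->
<speak>
  The meeting is on Monday, <break time="250ms"/> in the main hall,
  <break time="150ms"/> at noon.
</speak>
```

In the second fragment the author, not the engine, takes responsibility for the whole region, which is the trade-off described above.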
-Regards, Mark

-----Original Message-----
From: Alex.Monaghan@Aculab.com [mailto:Alex.Monaghan@Aculab.com]
Sent: Monday, January 22, 2001 5:32 AM
To: www-voice@w3.org
Subject: RE: mark's and richard's comments on SSML

i know i was only taking one possible interpretation of what richard wrote, but it certainly seems as though the SSML spec will not be satisfied by most current synthesisers if the requirement for appropriate output is part of the definition of compliance. in other words, either the goal of cross-platform consistency is sacrificed or the goal of implementation using current technology is abandoned.

richard appears to attach more importance to cross-platform consistency, as do i - what's the point of having a mark-up standard if the results (synthesiser outputs) are not standardised? it would be analogous to having a standard for fuel which stated that you had to be able to pour it into a fuel tank, but said nothing about what happened after that.

so how will compliance be assessed?

alex.

> -----Original Message-----
> From: Richard Sproat [SMTP:rws@research.att.com]
> Sent: 22 January 2001 13:22
> To: Alex.Monaghan@Aculab.com; www-voice@w3.org
> Subject: Re: mark's and richard's comments on SSML
>
>
> Alex:
>
> Richard: "in the current situation what you have is a
> system that will not necessarily be able to implement what you want to
> hear."
>
> Here I'm describing the situation that one is likely to have with
> certain classes of synthesizers. I am not claiming that this would
> constitute an acceptable notion of compliance. Quite the opposite: I
> think the situation is perfectly unacceptable. I had thought that was
> clear, but maybe I should have spelled this out explicitly.
>
> --R
Received on Monday, 22 January 2001 15:59:02 UTC