
Re: interpretation please, ssml

From: Andrew Thompson <lordpixel@mac.com>
Date: Sat, 31 Jul 2004 11:19:13 -0400
Message-Id: <FAB8FB82-E304-11D8-9D1D-000A27D7D9DC@mac.com>
Cc: www-voice@w3.org
To: David.Pawson@rnib.org.uk

Hi David,

> Summary?
>   No, don't expect good pronunciation for 'non-normal'
> words such as café?
>
> That seems to be the case, but I can't find it in the WD.

It wouldn't really be practical to have requirements on what a French 
word in the middle of otherwise English text should sound like (to use 
your café example), because that will be specific to each synthesizer vendor.

Realistically, for French words that have been adopted into English 
(résumé, café, etc.), I'd expect most synthesizers to handle these 
simple cases. In particular, I could see an English synth knowing how 
to handle acute accents.

I'd be a bit more surprised if your average English synthesizer could 
handle "In Japanese, 'ありがとう' means 'thank you'" dropped into the 
middle of a sentence without any markup to indicate that the bit in the 
middle is xml:lang="ja".

It all depends. Perhaps the next generation of synthesizers will 
understand Unicode and ship with voices that can pronounce multiple 
languages. Even that, in and of itself, isn't enough to guarantee 
correct output. After all, 本 could be Japanese or Chinese; there's no 
way to tell without context, and the same applies to European 
languages. Markup indicating the language will always be necessary 
unless computers can one day actually understand the meaning of what's 
being said.
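For illustration, here is a minimal sketch of that kind of language markup, generated with Python's standard `xml.etree.ElementTree`. It assumes SSML 1.0, where the voice element accepts `xml:lang` ("ja" is the ISO 639-1 code for Japanese). The function name `mark_foreign_phrase` is hypothetical, and whether the synthesizer actually has a Japanese voice to switch to is still vendor-specific.

```python
import xml.etree.ElementTree as ET

# SSML 1.0 namespace, and ElementTree's spelling of the xml:lang attribute.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mark_foreign_phrase(before: str, phrase: str, phrase_lang: str,
                        after: str, doc_lang: str = "en-US") -> str:
    """Hypothetical helper: build an SSML document in doc_lang, with
    `phrase` wrapped in a <voice> element that switches to phrase_lang."""
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS,
                                 XML_LANG: doc_lang})
    speak.text = before
    # Assumption: the processor honours xml:lang on <voice> (SSML 1.0);
    # having a voice for phrase_lang at all is up to the vendor.
    voice = ET.SubElement(speak, "voice", {XML_LANG: phrase_lang})
    voice.text = phrase
    # ElementTree stores the text that follows an element in its .tail.
    voice.tail = after
    return ET.tostring(speak, encoding="unicode")


print(mark_foreign_phrase("In Japanese, ", "ありがとう", "ja",
                          " means thank you."))
```

Without the voice element, the same sentence would simply be read with the en-US rules of the container language.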

Actually, I think the spec does answer your question:
http://www.w3.org/TR/speech-synthesis/#AppF

The third example says:
"It is often the case that an author wishes to include a bit of foreign 
text (say, a movie title) in an application without having to switch 
languages (for example via the voice element). A simple way to do this 
is shown here. In this example the synthesis processor would render the 
movie name using the pronunciation rules of the container language 
("en-US" in this case), similar to how a reader who doesn't know the 
foreign language might try to read (and pronounce) it."

It then goes on to show how to improve the pronunciation with an 
external lexicon.
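A rough sketch of the lexicon approach, again built with ElementTree as a stand-in for hand-written markup: the text stays in the container language and the document just points the processor at an external lexicon. The lexicon URL and the function name `speak_with_lexicon` are hypothetical, and the lexicon file's own contents are not shown.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def speak_with_lexicon(text: str, lexicon_uri: str,
                       doc_lang: str = "en-US") -> str:
    """Hypothetical helper: keep all of `text` in the container language,
    but reference an external pronunciation lexicon."""
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS,
                                 XML_LANG: doc_lang})
    # Place the lexicon reference first, as a direct child of <speak>.
    lexicon = ET.SubElement(speak, "lexicon", {"uri": lexicon_uri})
    # The spoken text follows the empty <lexicon/> element, so it is its tail.
    lexicon.tail = text
    return ET.tostring(speak, encoding="unicode")


# Hypothetical URL; how the processor uses the lexicon is vendor-specific.
doc = speak_with_lexicon("Meet me at the café.",
                         "http://www.example.com/lexicon.file")
print(doc)
```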

As I said, it'll ultimately be vendor-specific. I wouldn't be surprised 
if some English synths can handle simple French and the like, but if you 
don't want to explicitly mark up the French as French or use a lexicon, 
you'll have to test on the synth you're actually targeting.

On Jul 30, 2004, at 9:09 AM, David.Pawson@rnib.org.uk wrote:

AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside

         (see you later space cowboy ...)
Received on Saturday, 31 July 2004 11:19:33 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 5 February 2014 07:14:26 UTC