W3C home > Mailing lists > Public > www-voice@w3.org > July to September 2004

RE: interpretation please, ssml

From: <David.Pawson@rnib.org.uk>
Date: Mon, 2 Aug 2004 08:13:24 +0100
Message-ID: <9B66BBD37D5DD411B8CE00508B69700F06FFE11E@pborolocal.rnib.org.uk>
To: lordpixel@mac.com
Cc: www-voice@w3.org

 

    -----Original Message-----
    From: Andrew Thompson
    > Summary?
    >   No, don't expect good pronunciation for 'non-normal'
    > words such as café?
    >
    > That seems to be the case, but I can't find it in the WD.
    
    It wouldn't really be practical to have requirements on 
    what a French word in the middle of otherwise English text 
    would sound like (to use your café example) because it'll 
    be synthesizer vendor specific.

And it doesn't say that either? I.e. neither a user nor an
implementor has any guidance on what to expect.
In this case I'd consider it an English word, but that's just my
opinion.

    
    Reasonably speaking, for French words that have been 
    adopted into English (résumé, café etc.) I'd expect 
    most synthesizers to handle these simple cases. In 
    particular I could see an English synth knowing how to 
    handle acute accents.

But that's an opinion, not a statement from the WD?

    
    I'd be a bit more surprised if your average English 
    synthesizer could handle "In Japanese, 'ありがとう' means 'thank 
    you'" randomly put in the middle of a sentence without any 
    markup to indicate the bit in the middle is lang="jp".

Likewise. I did state the contextual xml:lang was en.
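For illustration, the kind of markup being discussed might look like
the sketch below. This is only a sketch against the SSML 1.0 draft;
note that xml:lang takes ISO 639 language codes, so Japanese is "ja":

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en">
  In Japanese,
  <!-- mark the embedded phrase so the synthesizer knows it is Japanese -->
  <voice xml:lang="ja">ありがとう</voice>
  means thank you.
</speak>
```

Without the inner xml:lang, the document-level "en" applies to the
whole sentence and the synthesizer has nothing to go on.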

    
    It all depends... perhaps the next generation of 
    synthesizers will understand Unicode and have voices 
    capable of pronouncing multiple languages enabled. Even 
    this in and of itself isn't enough to guarantee correct 
    output. After all, 本 could be Japanese or Chinese, there's 
    no particular way to tell without context, and this applies 
    to European languages too. Markup indicating the language 
    will always be necessary unless one day computers can 
    actually understand the meaning of what's being said.
    
    Actually I think the spec does answer your question:
    http://www.w3.org/TR/speech-synthesis/#AppF
    
Which is informative?

   
    And then it goes on to define how to improve the 
    pronunciation with an external lexicon.

Yes, that's a reasonable solution.
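A sketch of that lexicon approach, per the draft's lexicon element
(the URI here is hypothetical, and the lexicon document format was
itself still being specified at the time):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en">
  <!-- hypothetical external pronunciation lexicon for loanwords -->
  <lexicon uri="http://example.com/french-loanwords.pls"/>
  We stopped at the café for coffee.
</speak>
```

The lexicon supplies pronunciations for words like "café" that the
synthesizer's built-in dictionary might not cover, which is exactly
the fallback the WD points authors toward.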

    
    As I said, it'll ultimately be vendor specific.

No, it's undefined in the WD, which I now believe to be a weakness
easily addressed by the WG.

regards DaveP.

** snip here **

-- 
DISCLAIMER: 

NOTICE: The information contained in this email and any attachments is 
confidential and may be privileged. If you are not the intended 
recipient you should not use, disclose, distribute or copy any of the 
content of it or of any attachment; you are requested to notify the 
sender immediately of your receipt of the email and then to delete it 
and any attachments from your system. 

RNIB endeavours to ensure that emails and any attachments generated by 
its staff are free from viruses or other contaminants. However, it 
cannot accept any responsibility for any  such which are transmitted.
We therefore recommend you scan all attachments. 

Please note that the statements and views expressed in this email and 
any attachments are those of the author and do not necessarily represent 
those of RNIB. 

RNIB Registered Charity Number: 226227 

Website: http://www.rnib.org.uk 
Received on Monday, 2 August 2004 03:14:35 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 5 February 2014 07:14:26 UTC