
Re: [pls] Example multiple lexemes with same grapheme content

From: Peter Moffatt <peter.moffatt@nortel.com>
Date: Thu, 24 Mar 2005 16:22:54 +0100
Message-ID: <8F20221FB47FD51190AD00508BCF36BA054718F1@znsgy0k3.europe.nortel.com>
To: "'www-voice@w3.org'" <www-voice@w3.org>
Hi,

Apologies if I've missed something, but this WG discussion is what I was
getting at in my recent post.

PLS needs to account for the ability of TTS engines to disambiguate
heterosyntactic homographs (same spelling, different part of speech) using
part-of-speech information, either derived from the text during the TTS
process or specified in mark-up. The least disruptive way to achieve this
would be to permit part-of-speech designations on lexeme elements.

Simple example (using SAPI-style POS, SAMPA phonemes):

<lexeme pos="noun">
   <grapheme>record</grapheme>
   <phoneme>rekO:d</phoneme>
</lexeme>
<lexeme pos="verb">
   <grapheme>record</grapheme>
   <phoneme>r@kO:d</phoneme>
</lexeme>

Complex example (using Penn Treebank-style POS):

<lexeme poslist="vb nn nnp vbp">
   <grapheme>read</grapheme>
   <phoneme>ri:d</phoneme>
</lexeme>
<lexeme poslist="vbn vbd">
   <grapheme>read</grapheme>
   <phoneme>red</phoneme>
</lexeme>
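
To illustrate how an engine might consume entries like the two examples above, here is a minimal Python sketch. It assumes the proposed pos and poslist attributes (neither is part of PLS today), a wrapping lexicon element added only so the fragment parses, and a hypothetical helper named build_index. It is one possible reading of the proposal, not a reference implementation.

```python
import xml.etree.ElementTree as ET

# The wrapping <lexicon> element is an assumption, added so the fragment is
# well-formed. The pos/poslist attributes are the ones proposed in this post.
LEXICON = """<lexicon>
  <lexeme pos="noun">
    <grapheme>record</grapheme>
    <phoneme>rekO:d</phoneme>
  </lexeme>
  <lexeme pos="verb">
    <grapheme>record</grapheme>
    <phoneme>r@kO:d</phoneme>
  </lexeme>
  <lexeme poslist="vb nn nnp vbp">
    <grapheme>read</grapheme>
    <phoneme>ri:d</phoneme>
  </lexeme>
  <lexeme poslist="vbn vbd">
    <grapheme>read</grapheme>
    <phoneme>red</phoneme>
  </lexeme>
</lexicon>"""


def build_index(xml_text):
    """Map (grapheme, POS tag) to a phoneme string (hypothetical helper)."""
    index = {}
    for lexeme in ET.fromstring(xml_text).iter("lexeme"):
        # Treat pos as a one-item poslist; both are whitespace-separated here.
        tags = (lexeme.get("poslist") or lexeme.get("pos") or "").split()
        grapheme = lexeme.findtext("grapheme")
        phoneme = lexeme.findtext("phoneme")
        for tag in tags:
            # setdefault keeps the first entry seen for a given key.
            index.setdefault((grapheme, tag), phoneme)
    return index


index = build_index(LEXICON)
print(index[("read", "vbd")])
print(index[("record", "verb")])
```

A TTS front end that has already tagged a token as past tense could then look up ("read", "vbd") directly instead of guessing between the two pronunciations.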

Some kind of priority mechanism is still required for homosyntactic
homographs; to follow the VXML precedent, document order would be the
obvious way to do that.
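
A rough Python sketch of document-order priority follows. The "bass" entries are my own illustrative homosyntactic pair (they do not come from any existing lexicon), the names Lexeme and pronounce are hypothetical, and the fallback to the first entry when no POS matches is an assumption rather than something PLS defines.

```python
from typing import NamedTuple, Optional


class Lexeme(NamedTuple):
    grapheme: str
    pos: frozenset
    phoneme: str


# List order stands in for document order in the lexicon file.
LEXEMES = [
    Lexeme("record", frozenset({"noun"}), "rekO:d"),
    Lexeme("record", frozenset({"verb"}), "r@kO:d"),
    # Illustrative homosyntactic homographs: both entries are nouns.
    Lexeme("bass", frozenset({"noun"}), "beIs"),
    Lexeme("bass", frozenset({"noun"}), "b{s"),
]


def pronounce(grapheme: str, pos: Optional[str] = None) -> Optional[str]:
    """Return the first matching phoneme string in document order."""
    candidates = [lx for lx in LEXEMES if lx.grapheme == grapheme]
    if pos is not None:
        matching = [lx for lx in candidates if pos in lx.pos]
        # Assumption: if no entry carries this POS, ignore POS entirely.
        if matching:
            candidates = matching
    return candidates[0].phoneme if candidates else None


print(pronounce("record", "verb"))
print(pronounce("bass", "noun"))
```

With this rule, POS narrows the candidates first and document order only breaks the remaining ties, so the lexicon author controls the default simply by ordering the entries.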

Regards,
Peter Moffatt
Received on Thursday, 24 March 2005 15:24:06 GMT
