Re: [pls] Example multiple lexemes with same grapheme content from Peter Moffatt on 2005-03-24 (www-voice@w3.org from January to March 2005)

From: Peter Moffatt <peter.moffatt@nortel.com>
Date: Thu, 24 Mar 2005 16:22:54 +0100
To: "'www-voice@w3.org'" <www-voice@w3.org>
Message-ID: <8F20221FB47FD51190AD00508BCF36BA054718F1@znsgy0k3.europe.nortel.com>

Hi,

Apologies if I've missed something, but this WG discussion is what I was
getting at in my recent post.

PLS needs to account for the ability of TTS engines to disambiguate
heterosyntactic homographs (words spelled alike whose pronunciation depends
on their part of speech) using part-of-speech information, either derived
from the text during TTS processing or specified in mark-up. The least
disruptive way to achieve this would be to permit part-of-speech
designations on lexemes.

Simple example (using SAPI-style POS values and SAMPA phonemes):

<lexeme pos="noun">
   <grapheme>record</grapheme>
   <phoneme>rekO:d</phoneme>
</lexeme>
<lexeme pos="verb">
   <grapheme>record</grapheme>
   <phoneme>r@kO:d</phoneme>
</lexeme>

Complex example (using Penn Treebank-style POS):

<lexeme poslist="vb nn nnp vbp">
   <grapheme>read</grapheme>
   <phoneme>ri:d</phoneme>
</lexeme>
<lexeme poslist="vbn vbd">
   <grapheme>read</grapheme>
   <phoneme>red</phoneme>
</lexeme>
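To illustrate how an engine might consume these designations, here is a minimal sketch under stated assumptions. It assumes the lexemes above are wrapped in a hypothetical `<lexicon>` root with no namespace. It also assumes `pos` holds a single tag and `poslist` a whitespace-separated list, and that the POS for each token has already been determined by the engine's tagger or by mark-up. A lexeme with no POS designation is treated as matching any POS. `pos_tags` and `candidates` are hypothetical helper names, not part of PLS.

```python
import xml.etree.ElementTree as ET

# The examples above wrapped in a hypothetical <lexicon> root (no namespace).
# The pos/poslist attributes are the proposal under discussion, not existing PLS.
LEXICON = """<lexicon>
  <lexeme pos="noun"><grapheme>record</grapheme><phoneme>rekO:d</phoneme></lexeme>
  <lexeme pos="verb"><grapheme>record</grapheme><phoneme>r@kO:d</phoneme></lexeme>
  <lexeme poslist="vb nn nnp vbp"><grapheme>read</grapheme><phoneme>ri:d</phoneme></lexeme>
  <lexeme poslist="vbn vbd"><grapheme>read</grapheme><phoneme>red</phoneme></lexeme>
</lexicon>"""

def pos_tags(lexeme):
    """Collect the POS tags a lexeme is designated for (empty set = any POS)."""
    tags = set(lexeme.get("poslist", "").split())
    if lexeme.get("pos"):
        tags.add(lexeme.get("pos"))
    return tags

def candidates(root, word, pos):
    """Return, in document order, the phonemes of lexemes matching word and pos."""
    result = []
    for lexeme in root.iter("lexeme"):
        if lexeme.findtext("grapheme") != word:
            continue
        tags = pos_tags(lexeme)
        if not tags or pos in tags:
            result.append(lexeme.findtext("phoneme"))
    return result

root = ET.fromstring(LEXICON)
for word, pos in [("record", "noun"), ("record", "verb"), ("read", "vbd"), ("read", "vbp")]:
    print(word, pos, candidates(root, word, pos))
```

For heterosyntactic homographs such as "record" and "read", each (word, POS) pair narrows the choice to a single pronunciation, which is the point of the proposal.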

Some kind of priority mechanism is still required for homosyntactic
homographs (same spelling and same part of speech, but different
pronunciations); following the VXML precedent, document order would be the
obvious way to do that.
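A rough sketch of document-order priority, assuming an in-memory list of lexemes kept in the order they appear in the lexicon document. The "bass" entries are a hypothetical illustration not taken from the discussion. Both are nouns (the musical sense, SAMPA `beIs`, and the fish sense, `b{s`), so POS cannot separate them, and the earlier entry wins. `Lexeme` and `resolve` are hypothetical names.

```python
from typing import NamedTuple

class Lexeme(NamedTuple):
    grapheme: str
    pos: frozenset  # POS tags this lexeme applies to; empty = any POS
    phoneme: str

# Hypothetical entries, listed in document order. Both "bass" lexemes are
# nouns (a homosyntactic homograph), so POS alone cannot choose between them.
LEXICON = [
    Lexeme("bass", frozenset({"noun"}), "beIs"),  # musical sense, listed first
    Lexeme("bass", frozenset({"noun"}), "b{s"),   # fish sense
]

def resolve(lexicon, word, pos=None):
    """Return the first matching pronunciation in document order, or None."""
    matches = [
        lx for lx in lexicon
        if lx.grapheme == word and (pos is None or not lx.pos or pos in lx.pos)
    ]
    # Document order breaks the tie: the earliest matching lexeme wins.
    return matches[0].phoneme if matches else None

print(resolve(LEXICON, "bass", "noun"))
```

An author who wants the fish reading by default would simply list that lexeme first. This mirrors the "first one wins" behaviour suggested above.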

Regards,
Peter Moffatt

Received on Thursday, 24 March 2005 15:24:06 UTC