Re: [pls] Example multiple lexemes with same grapheme content

Hi,

Apologies if I've missed something, but this WG discussion is what I was
getting at in my recent post.

PLS needs to account for the ability of TTS engines to disambiguate
heterosyntactic homographs using part-of-speech information, whether that
information is derived from the text during the TTS process or specified in
mark-up. The least disruptive way to achieve this would be to permit
part-of-speech designations on lexemes.

Simple example (using SAPI-style POS, SAMPA phonemes):

<lexeme pos="noun">
   <grapheme>record</grapheme>
   <phoneme>rekO:d</phoneme>
</lexeme>
<lexeme pos="verb">
   <grapheme>record</grapheme>
   <phoneme>r@kO:d</phoneme>
</lexeme>

Complex example (using Penn Treebank-style POS):

<lexeme poslist="vb nn nnp vbp">
   <grapheme>read</grapheme>
   <phoneme>ri:d</phoneme>
</lexeme>
<lexeme poslist="vbn vbd">
   <grapheme>read</grapheme>
   <phoneme>red</phoneme>
</lexeme>

Some kind of priority mechanism is still required for homosyntactic
homographs (same spelling and same part of speech, different pronunciation);
to follow the VXML precedent, document order would be the obvious way to do
that.
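
To make the intended lookup concrete, here is a rough sketch in Python of how
an engine might select a pronunciation from the examples above. It is only an
illustration under stated assumptions: the pos and poslist attributes are the
ones proposed in this message, not part of any published PLS draft, and the
wrapping <lexicon> element, the pronounce() helper and the fall-back to the
first lexeme when no part of speech matches are all hypothetical choices of
mine.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon fragment using the proposed pos/poslist attributes.
# The wrapping <lexicon> element is added only so the fragment parses.
LEXICON = """
<lexicon>
  <lexeme pos="noun">
    <grapheme>record</grapheme>
    <phoneme>rekO:d</phoneme>
  </lexeme>
  <lexeme pos="verb">
    <grapheme>record</grapheme>
    <phoneme>r@kO:d</phoneme>
  </lexeme>
  <lexeme poslist="vb nn nnp vbp">
    <grapheme>read</grapheme>
    <phoneme>ri:d</phoneme>
  </lexeme>
  <lexeme poslist="vbn vbd">
    <grapheme>read</grapheme>
    <phoneme>red</phoneme>
  </lexeme>
</lexicon>
"""


def pronounce(root, word, pos=None):
    """Return the phoneme string for word, or None if it has no lexeme.

    Lexemes are visited in document order. The first lexeme whose pos or
    poslist contains the requested tag wins; if no tag is given or none
    matches, the first lexeme for the grapheme is used (an assumed policy).
    """
    first_match = None
    for lexeme in root.iter("lexeme"):  # iter() yields document order
        if lexeme.findtext("grapheme") != word:
            continue
        phoneme = lexeme.findtext("phoneme")
        if first_match is None:
            first_match = phoneme
        # Treat pos as a one-item list so both attributes are handled alike.
        tags = (lexeme.get("pos", "") + " " + lexeme.get("poslist", "")).split()
        if pos is not None and pos in tags:
            return phoneme
    return first_match


root = ET.fromstring(LEXICON)
print(pronounce(root, "record", "verb"))
print(pronounce(root, "read", "vbd"))
print(pronounce(root, "read"))
```

Because the first matching lexeme wins, two lexemes with the same grapheme and
the same part of speech would resolve to whichever appears first, which is the
document-order priority suggested above.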

Regards,
Peter Moffatt

Received on Thursday, 24 March 2005 15:24:06 UTC