comment for PLS Last Call

[Commenting on version:]

1. Provide better discrimination in determining pronunciation preference.

The specification provides for one static 'preferred' pronunciation
[2] for a lexeme. A lexeme may have multiple graphemes associated with
it, but none of them is at all aware of markup in the SRGS or SSML
documents that are being processed.


1.1 problems with this situation

This limitation, which means that homographs cannot be given any sort
of pronunciation selectivity, should not be accepted.


a. It defeats the use of the Pronunciation Lexicon Specification in
the production of talking books in audio media [4]. This is an
important use case for access to information by people with
disabilities, print disabilities in this case.


b. It defeats the intended interoperability of lexicons between ASR
and TTS functions [5]. Lexicons will serve ASR best with many
pronunciations, and TTS best with few, unless the many pronunciations
can be marked up as to when to use which.


c. It fails to interoperate with the intelligence already in SSML in
the say-as element [6].


While many functional limitations have been incorporated in the Voice
Browser specifications in order to reach a platform of well-supported
common markup, it does not seem to make sense to have say-as
capability in SSML, with QName ability to indicate terms defined
outside the corpus of Voice Browser specifications, and then not to use
this information in determining which pronunciation is preferred when.

1.2 opportunities to do better

As suggested above, there would seem to be a ready path to resolving
homographs and other preferred-pronunciation subtleties: use the
say-as element and its interpret-as attribute in SSML to distinguish
the cases where one pronunciation or another is preferred.

1.2.1 Allow markup in <grapheme>

One way to do this would be to allow <say-as> markup inside the
<grapheme> element wrapping the plain text of the token being

1.2.2 XPath selectors

A second, probably better, way would be to use XPath selectors to
distinguish the cases where one pronunciation is preferred as opposed
to another. This markup would closely resemble the use of XPath
selectors in DISelect [7].
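
To make the idea concrete, here is a rough sketch in Python of how a
processor might evaluate such guard expressions. Everything specific
in it is an assumption made for illustration: the "guard" attribute on
<lexeme> is not part of PLS, the "ex:fish" and "ex:music" values are
hypothetical interpret-as keys, namespaces are left out for brevity,
and the helper name preferred_pronunciation is invented. It relies
only on the limited XPath subset in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon. The "guard" attribute is an assumed extension,
# NOT part of PLS: it holds an XPath expression that is evaluated
# against the SSML markup wrapping the token.
LEXICON = """<lexicon>
  <lexeme guard="say-as[@interpret-as='ex:fish']">
    <grapheme>bass</grapheme>
    <phoneme>bæs</phoneme>
  </lexeme>
  <lexeme guard="say-as[@interpret-as='ex:music']">
    <grapheme>bass</grapheme>
    <phoneme>beɪs</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>bass</grapheme>
    <phoneme>beɪs</phoneme>
  </lexeme>
</lexicon>"""


def preferred_pronunciation(lexicon, token_elem):
    """Return the phoneme string preferred for the text of token_elem.

    A guarded lexeme wins if its guard selects the token's markup;
    otherwise the first unguarded lexeme for the grapheme is used.
    """
    token = (token_elem.text or "").strip()
    # Wrap the token's element so that a guard can name it as a child.
    context = ET.Element("context")
    context.append(token_elem)

    fallback = None
    for lexeme in lexicon.iter("lexeme"):
        if lexeme.findtext("grapheme") != token:
            continue
        guard = lexeme.get("guard")
        if guard is None:
            if fallback is None:
                fallback = lexeme.findtext("phoneme")
        elif context.find(guard) is not None:
            return lexeme.findtext("phoneme")
    return fallback


lexicon = ET.fromstring(LEXICON)
fish = ET.fromstring("<say-as interpret-as='ex:fish'>bass</say-as>")
music = ET.fromstring("<say-as interpret-as='ex:music'>bass</say-as>")
plain = ET.fromstring("<s>bass</s>")

fish_result = preferred_pronunciation(lexicon, fish)
music_result = preferred_pronunciation(lexicon, music)
plain_result = preferred_pronunciation(lexicon, plain)
```

The unguarded lexeme keeps today's behaviour as the fallback, so a
lexicon with no guards at all would be processed exactly as the
current draft describes.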


In either case, the value of ssml:say-as.interpret-as could be used
as a discriminant in choosing preferred pronunciations. This value in
turn can, as a best practice, be reliably tied to semantic
information which is precise enough to assure a single appropriate

There are more complicated approaches that could be integrated using
SPARQL queries of the <metadata> contents, but a little XPath
processing of guard expressions is so readily achievable that it is
hard to believe something should not be done to afford this

The QName value of this attribute allows plenty of extension room to
create unique keys for the proper names of individual people, along
with the ability to refer to WordNet nodes or dictionary entries for
pronunciation variants of homographs.


Received on Wednesday, 15 March 2006 18:57:53 UTC