- From: Al Gilman <Alfred.S.Gilman@IEEE.org>
- Date: Wed, 15 Mar 2006 13:57:36 -0500
- To: www-voice@w3.org
- Cc: jbrewer@w3.org
[Commenting on version: http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/]

1. Provide better discrimination in determining pronunciation preference.

The specification provides for one static 'preferred' pronunciation [2] per lexeme. A lexeme may have multiple graphemes associated with it, but none of them is in any way aware of markup in the SRGS or SSML documents being processed.

[2] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S4.6

1.1 Problems with this situation

This limitation, which means that homographs [3] cannot be given any sort of selectivity among pronunciations, should not be accepted.

[3] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S5.5

a. It defeats the use of the Pronunciation Lexicon Specification in the production of talking books in audio media [4]. This is an important use case for access to information by people with disabilities, in this case print disabilities.

[4] http://lists.w3.org/Archives/Public/www-voice/2001JanMar/0020.html

b. It defeats the intended interoperability of lexicons between ASR and TTS functions [5]. Lexicons will serve ASR best with many pronunciations, and TTS best with few, unless the many pronunciations can be marked up as to when to use which.

[5] http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20060131/#S1.3

c. It fails to interoperate with the intelligence already in SSML's say-as element [6].

[6] http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/

While many functional limitations have been incorporated into the Voice Browser specifications in order to reach a platform of well-supported common markup, it does not make sense to give say-as in SSML the QName ability to indicate terms defined outside the corpus of Voice Browser specifications, and then not use this information in determining which pronunciation is preferred when.
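For concreteness, a minimal lexicon for the homograph "bass" might look like the following. This is only a sketch, not text from the draft; it assumes the prefer attribute as the draft's single static preference mechanism [2], and uses the PLS namespace from the working draft.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "bass" has two valid pronunciations, but the one
     static preference (a prefer attribute here) cannot express
     WHEN each pronunciation applies. -->
<lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>bass</grapheme>
    <phoneme prefer="true">beɪs</phoneme>  <!-- the musical sense -->
    <phoneme>bæs</phoneme>                 <!-- the fish -->
  </lexeme>
</lexicon>
```

Whichever entry carries the preference, the other sense is mispronounced in every document that uses the lexicon.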
1.2 Opportunities to do better

As suggested above, there would seem to be a ready path to resolving homographs and other preferred-pronunciation subtleties: use the say-as element and its interpret-as attribute in SSML to distinguish the cases where the preferred pronunciation is one way or another.

1.2.1 Allow markup in <grapheme>

One way to do this would be to allow <say-as> markup inside the <grapheme> element wrapping the plain text of the token being pronounced.

1.2.2 XPath selectors

A second, probably better, way would be to use XPath selectors to distinguish the cases where one pronunciation is preferred over another. This markup would closely resemble the use of XPath selectors in DISelect [7].

[7] http://www.w3.org/2001/di/Group/di-selection/

In either case, the value of ssml:say-as.interpret-as could be used as a discriminant in choosing the preferred pronunciation. That value can in turn, as a best practice, be reliably tied to semantic information precise enough to assure a single appropriate pronunciation. More complicated approaches could be integrated using SPARQL queries over the <metadata> contents, but a little XPath processing of guard expressions is so readily achievable that it is hard to believe something should not be done to afford this capability. The QName value of this attribute leaves plenty of extension room to create unique keys for the proper names of individual people, along with the ability to refer to WordNet nodes or dictionary entries for the pronunciation variants of homographs.

Al
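To make the two options in 1.2 concrete, here is one hypothetical rendering. Neither the embedded <ssml:say-as> in <grapheme> nor the when guard attribute exists in the draft; they are invented here purely to illustrate the proposal, with the guard expression loosely modeled on DISelect-style XPath selection.

```xml
<!-- Hypothetical markup, NOT part of the draft -->
<lexicon xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         xmlns:ssml="http://www.w3.org/2001/10/synthesis">

  <!-- 1.2.1: <say-as> markup permitted inside <grapheme> -->
  <lexeme>
    <grapheme><ssml:say-as interpret-as="music">bass</ssml:say-as></grapheme>
    <phoneme>beɪs</phoneme>
  </lexeme>

  <!-- 1.2.2: an invented "when" guard holding an XPath expression
       evaluated against the SSML context of the token; the entry
       applies only where the guard is true -->
  <lexeme>
    <grapheme>bass</grapheme>
    <phoneme when="ancestor::ssml:say-as/@interpret-as = 'fish'">bæs</phoneme>
  </lexeme>
</lexicon>
```

Under either style, an SSML document that wraps a token in <say-as interpret-as="fish"> would select the matching entry, and entries with no guard would keep today's behavior.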
Received on Wednesday, 15 March 2006 18:57:53 UTC