- From: Nick Levinson <nick_levinson@yahoo.com>
- Date: Mon, 5 Nov 2018 22:47:07 +0000 (UTC)
- To: "www-voice@w3.org" <www-voice@w3.org>
- Message-ID: <922125655.1347233.1541458027325@mail.yahoo.com>
Grapheme elements for the Pronunciation Lexicon Specification are probably written more often for strings that are surrounded by white space in the original source code than for strings embedded, without spaces, in longer strings (other kinds of original strings being less common). This issue of embedding is addressed in appendix C, but only tentatively ("'don't' can be explicitly tokenized as 'do' and 'n't' in order to match a <grapheme> element with content 'n't'", "a lexeme for 'do' should not match the beginning of 'done'", and "a lexeme for 'they'll' should have precedence over a lexeme for 'they' given the input 'they'll'").

While recommendations are helpful, the fragmentation of the TTS market and, in my experience, the lesser reliability of older TTS products (I don't know about ASR or newer TTS) mean we have to program for a lower common denominator. Thus, a crisper standard is needed.

Example: Suppose "fuse" appears in the page source code and *.pls specifies only that "use" is pronounced /yousse/ and not /youze/. TTS probably should not, because of *.pls, pronounce "fuse" as /fyousse/.

I propose adding as a paragraph into section 4.5 between the present second and third paragraphs: "The content of the <grapheme> element must match only a string that is, at each end, immediately adjacent to white space, a digit, punctuation, or the beginning or end of the line being matched unless the <grapheme> element has an attribute named "fragment" with a value of "yes" ("no" or any other value being equivalent to omission of the attribute)."

-- Nick
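
To illustrate how the proposed rule might behave, here is a minimal sketch in Python. It is not taken from the specification or from any TTS product. The function names (is_boundary, find_matches) are hypothetical, and treating "punctuation" as any character in a Unicode "P" category is my assumption, since the proposed wording does not define the term.

```python
import unicodedata


def is_boundary(ch):
    """Return True if ch counts as a boundary under the proposed rule.

    ch is None at the beginning or end of the line being matched.
    Assumption: "punctuation" means any Unicode category starting with "P".
    """
    if ch is None:
        return True
    return ch.isspace() or ch.isdigit() or unicodedata.category(ch).startswith("P")


def find_matches(grapheme, line, fragment="no"):
    """Return the start offsets at which grapheme matches within line.

    With fragment="yes" the grapheme may match anywhere; any other value
    behaves as if the attribute were omitted, so both ends of the match
    must touch a boundary.
    """
    allow_embedded = (fragment == "yes")
    offsets = []
    start = line.find(grapheme)
    while start != -1:
        end = start + len(grapheme)
        before = line[start - 1] if start > 0 else None
        after = line[end] if end < len(line) else None
        if allow_embedded or (is_boundary(before) and is_boundary(after)):
            offsets.append(start)
        start = line.find(grapheme, start + 1)
    return offsets
```

With this sketch, find_matches("use", "fuse") finds nothing, while find_matches("use", "fuse", fragment="yes") finds the embedded match. Under this reading an apostrophe counts as punctuation, so a lexeme for "they" would still match at the start of "they'll"; precedence between lexemes is not handled here.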
Received on Monday, 5 November 2018 22:47:32 UTC