- From: Daniel C. Burnett <Daniel.Burnett@nuance.com>
- Date: Tue, 3 Jul 2007 12:53:34 -0400
- To: "Addison Phillips" <addison@yahoo-inc.com>
- Cc: "Richard Ishida" <ishida@w3.org>, <shuangzw@cn.ibm.com>, "Kazuyuki Ashimura" <ashimura@w3.org>, <public-i18n-core@w3.org>
Thanks Addison. I appreciate the quick comments -- we'll look over them in our meeting this week.

-- dan

-----Original Message-----
From: Addison Phillips [mailto:addison@yahoo-inc.com]
Sent: Tuesday, July 03, 2007 12:50 PM
To: Daniel C. Burnett
Cc: Richard Ishida; shuangzw@cn.ibm.com; Kazuyuki Ashimura; public-i18n-core@w3.org
Subject: Re: [ssml11] Second WD of SSML 1.1 and updated Requirements doc are published

A few comments on Richard's. Note that these are personal comments and not I18N Core WG comments.

> ============
> 3.1.2 xml:lang attribute
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.2
>
> I suggest: s/to indicate the natural language of the content of the
> element/to indicate the natural language of the written content of the
> element/

Language identifiers are not limited to written content (although these elements will contain written content, no?)

> I'm thinking it would be useful to say, specifically, that values must
> conform to BCP 47. Rather than the, to me, slightly weak sounding "BCP 47
> can help in understanding how to use this attribute".

+1

> ================
> 3.1.8.2 w element
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.8.2
> ...
>
> I suggest: s/that do not use white-space as a boundary identifier/that do
> not use white-space as a token boundary identifier/
>
> Note also that Thai does use space as a boundary identifier, but those
> boundaries are phrasal rather than token level.

That is, "words" (tokens) are not necessarily separated by spaces.

> Chinese is a little unusual wrt language tags.
> ...
>
> Of course the examples that follow seem to indicate that this would actually
> need to be Shanghaiese, for which the subtag is zh-wuu. Unfortunately,
> there is no provision at the moment for zh-wuu-Hans, although that is coming
> in the next version of BCP 47. Due Real Soon Now.
If you need a non-Mandarin example, Cantonese (which is the dialect spoken in e.g. Hong Kong) would probably be a better choice (the subtag for Cantonese is 'yue', i.e. "zh-yue-Hant", etc.).

Almost certainly you will want to distinguish written and spoken forms. The written forms for the various Chinese languages/dialects are (nearly) indistinguishable. The variation is between the Traditional and Simplified scripts (Hant vs. Hans script subtags). When rendering written Chinese into a spoken form, however, you need to know which dialect is being used (it makes a major difference!!). Hence the need for additional subtags.

A word of caution. While there are some grandfathered tags such as "zh-cmn-Hans" currently extant, there is also some debate about whether this will ultimately be the form used for the Chinese dialects. It is possible that some or all of the Chinese dialects will end up being represented by their (naked) language codes. Thus you might see "cmn-Hans", "yue-Hant", and "wuu-Hans" as valid tags. (This is an open issue and currently opinion is running the other way, towards preserving the "zh-" as a prefix to each of these.)

I guess what I'm suggesting is that you be cautious with your Chinese examples (give them as examples using extant grandfathered tags, to be sure, but avoid trying to give normative guidance for now).

> If we have <voice languages="fr:zh"> and there is no voice that supports
> French with a Chinese accent, then presumably a voice that supports French
> will be a suitable fallback? If so, you should probably say that in the
> onvoicefailure section.

I would add: you should probably specify the matching algorithm used. See RFC 4647 (part of BCP 47). For this type of matching, the Lookup algorithm is often a good choice to specify. The current text is too vague, hence the remainder of Richard's comment (mostly omitted here).

> The example on purple background says <voice gender="female"
> languages="en-US" ...
> rather than <voice gender="female"
> languages="en:en-US" ...
>
> Is this a mistake, or does it mean that accent should be specified with a
> single language tag where possible, and that the colon separator is only
> needed for accents that are not expressible in that way, eg. en:zh?

... or does this mean that the "languages" attribute is a "language priority list" (see RFC 4647)??

Best Regards,

Addison

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG
Co-Editor -- IETF BCP 47 [RFC 4646, RFC 4647]

Internationalization is an architecture.
It is not a feature.
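
For readers unfamiliar with the Lookup scheme from RFC 4647 (section 3.4) referred to in the message, here is a minimal Python sketch of the idea: each range in a language priority list is truncated from the end, one subtag at a time, until it equals an available tag. The names `lookup`, `priority_list`, `available_tags` and `default` are hypothetical and chosen for illustration; the sketch skips wildcard ranges as a simplification, and it is not a statement of how the SSML "languages" attribute is or should be defined.

```python
def lookup(priority_list, available_tags, default=None):
    """Return the best available tag for a language priority list, or default.

    Illustrative sketch of RFC 4647 "Lookup"; all names are hypothetical.
    """
    # Language tags compare case-insensitively, so index by lowercase form
    # while remembering the original spelling of each available tag.
    available = {tag.lower(): tag for tag in available_tags}

    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        if "*" in subtags:
            continue  # simplification (assumption): wildcard ranges are skipped

        while subtags:
            candidate = "-".join(subtags)
            if candidate in available:
                return available[candidate]
            subtags.pop()  # truncate the range from the end
            # A single-character subtag (such as "x" or an extension
            # singleton) is removed together with the subtag after it.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()

    return default
```

Under these assumptions, a request for "fr-CA" against voices tagged "en-US" and "fr" would fall back to "fr", and a request for "zh-yue-Hant" against "zh" and "en" would fall back to "zh".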
Received on Tuesday, 3 July 2007 16:53:43 UTC