RE: [SSML11] i18n comment 4: zh-CN-HK

Hi Dan,

I have the action item. There is some dispute right now about how to tag Chinese languages--especially in an audio context--so I'm hesitant to give a Chinese example. However, once the dust settles, it would be good to provide one, in your document in particular.

That said, I've just looked at the text in question, which reads:

--
For example, a languages value of "en:zh fr:de" can legally be matched by any voice that can both read English (speaking it with a Chinese accent) and read French (speaking it with a German accent). Thus, a voice that only supports "en-US" with a "zh-CN-HK" accent and "fr-CA" with a "de-SU" accent would match. As another example, if we have <voice languages="fr:zh"> and there is no voice that supports French with a Chinese accent, then a voice selection failure will occur. Note that if no accent indication is given for a language, then any voice that speaks the language is acceptable, regardless of accent. Also, note that author control over language support during voice selection is independent of any value of xml:lang in the text.
--

To make it both correct and non-controversial requires only a minor change. I would suggest changing it thus:

--
For example, a languages value of "en:zh fr:de" can legally be matched by any voice that can both read English (speaking it with a Chinese accent) and read French (speaking it with a German accent). Thus, a voice that only supports "en-US" with a "zh-HK" accent and "fr-CA" with a "de-AT" accent would match. As another example, if we have <voice languages="fr:zh"> and there is no voice that supports French with a Chinese accent, then a voice selection failure will occur. Note that if no accent indication is given for a language, then any voice that speaks the language is acceptable, regardless of accent. Also, note that author control over language support during voice selection is independent of any value of xml:lang in the text.
--

That is: s/zh-CN-HK/zh-HK/ and s/de-SU/de-AT/
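
To make the matching rule in that paragraph concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: I am assuming that a requested tag matches a supported tag when the two are equal or when the requested tag is a subtag-wise prefix of the supported one, and the names tag_matches and voice_matches, as well as the representation of a voice as (language, accent) pairs, are hypothetical and not taken from the SSML draft.

```python
def tag_matches(requested: str, supported: str) -> bool:
    """True if `supported` equals `requested` or extends it with more subtags.

    Assumption: simple case-insensitive prefix matching on whole subtags.
    """
    req = requested.lower().split("-")
    sup = supported.lower().split("-")
    return sup[:len(req)] == req


def voice_matches(languages: str, voice_pairs) -> bool:
    """Check a `languages` attribute value against one voice.

    `languages` is a space-separated list of "lang" or "lang:accent" items.
    `voice_pairs` is a hypothetical list of (language, accent) pairs that
    the voice supports. Every requested item must be satisfied.
    """
    for item in languages.split():
        lang, _, accent = item.partition(":")
        satisfied = any(
            tag_matches(lang, v_lang)
            # No accent requested means any accent is acceptable.
            and (not accent or tag_matches(accent, v_accent))
            for v_lang, v_accent in voice_pairs
        )
        if not satisfied:
            return False  # corresponds to a voice selection failure
    return True


# The voice from the corrected example text.
voice = [("en-US", "zh-HK"), ("fr-CA", "de-AT")]

print(voice_matches("en:zh fr:de", voice))  # True
print(voice_matches("fr:zh", voice))        # False
print(voice_matches("fr", voice))           # True
```

The last call shows the "no accent indication" case: any voice that speaks French is accepted, regardless of accent.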

The tag "zh-CN-HK" is, as noted, illegal. The tag "zh-HK" means "Chinese as used in Hong Kong SAR" (note that this suggests but does not specify a "Cantonese" accent). The tag "de-SU" would be "German as used in the former Soviet Union", which is possible (see: Kaliningrad), but also extremely unlikely. The 'AT' subtag represents Austria.
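
As a rough illustration of why "zh-CN-HK" fails while the others pass, here is a simplified Python sketch of a well-formedness check. It is an approximation, not a full implementation of the language tag grammar: it only handles language, optional extended language subtags, script, region, and variants, and it ignores extensions, private use, and grandfathered tags. The name is_well_formed is hypothetical. Note that it checks syntax only; it does not consult the subtag registry, so "de-SU" passes even though it is an unlikely choice.

```python
import re

# Simplified shape: language[-extlang]*[-script][-region][-variant]*
# Assumption: extensions, private use, and grandfathered tags are ignored.
_TAG = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-[a-z]{3}){0,3}"                       # extended language subtags
    r"(?:-(?P<script>[a-z]{4}))?"               # e.g. Hant
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"      # at most ONE region
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*",  # variants
    re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """True if the whole tag fits the simplified grammar above."""
    return _TAG.fullmatch(tag) is not None


for tag in ("zh-CN-HK", "zh-HK", "de-SU", "de-AT"):
    print(tag, is_well_formed(tag))
# zh-CN-HK False
# zh-HK True
# de-SU True
# de-AT True
```

"zh-CN-HK" is rejected because the grammar allows only one region subtag, and "HK" cannot be read as anything else once "CN" has taken that slot.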

It should be noted that the current debate about tagging Chinese partially revolves around the fact that spoken Chinese languages/dialects, while all being "Chinese", are not all mutually intelligible. The debate is whether language tags should take the form of "zh-(something)" (indicating the relationship to Chinese) or just use their specific language subtags directly (such as 'yue' for Cantonese, 'cmn' for Mandarin, 'nan' for Min Nan, etc.). If the SSML WG has an opinion about this, it would be extremely valuable to the I18N WG and those of us engaged in work on language identification. I'd be happy to provide (in a separate thread) suitable background, etc.
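
One practical consequence of the two forms can be shown with a short Python sketch. It assumes simple prefix-style matching of a language range against a tag (equal, or a prefix followed by a hyphen); the function name range_matches is hypothetical, and whether an SSML processor would match this way is an assumption on my part.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Prefix-style match: the range equals the tag, or is a prefix
    of it that ends on a subtag boundary. Case-insensitive."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# A request for plain "zh" finds the "zh-(something)" forms,
# but not the bare language subtags.
for tag in ("zh-yue", "yue", "zh-cmn", "cmn"):
    print(tag, range_matches("zh", tag))
# zh-yue True
# yue False
# zh-cmn True
# cmn False
```

Under this kind of matching, the "zh-(something)" form keeps the relationship to Chinese visible to a request for "zh", while the bare subtags do not.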

Best Regards,

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of Dan Burnett
> Sent: Wednesday, May 07, 2008 12:15 PM
> To: Richard Ishida
> Cc: jim@larson-tech.com; ashimura@w3.org; scott.mcglashan@hp.com;
> public-i18n-core@w3.org
> Subject: Re: [SSML11] i18n comment 4: zh-CN-HK
>
>
> s/an accent that is different/an accent that is different from the
> expected/common accent for the voice's language/
>
> Also, we would love to receive a new/better example from Addison that
> meets this criterion.
>
> -- dan
>
> On May 7, 2008, at 3:13 PM, Richard Ishida wrote:
>
> > My notes from the FTF in Beijing:
> >
> > Happy to change the tag, but want to keep the idea of an accent
> > that is different.
> > Question about whether to use yue or zh-yue.
> >
> > RI
> >
> > ============
> > Richard Ishida
> > Internationalization Lead
> > W3C (World Wide Web Consortium)
> >
> > http://www.w3.org/International/
> > http://rishida.net/blog/
> > http://rishida.net/
> >
> >
> >
> >> -----Original Message-----
> >> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org]
> >> On Behalf Of ishida@w3.org
> >> Sent: 07 April 2008 16:22
> >> To: dburnett@voxeo.com; jim@larson-tech.com; ashimura@w3.org;
> >> scott.mcglashan@hp.com; public-i18n-core@w3.org
> >> Subject: [SSML11] i18n comment 4: zh-CN-HK
> >>
> >>
> >> Comment from the i18n review of:
> >> http://www.w3.org/TR/2008/WD-speech-synthesis11-20080317/
> >>
> >> Comment 4
> >> At http://www.w3.org/International/reviews/0804-ssml11/Overview.html
> >> Editorial/substantive: E
> >> Tracked by: AP
> >>
> >> Location in reviewed document:
> >> 3.2.1 [http://www.w3.org/TR/2008/WD-speech-synthesis11-20080317/#S3.2.1]
> >>
> >> Comment:
> >> zh-CN-HK is an illegal language tag (in one of the examples). It might
> >> be better to avoid a Chinese example, at least initially ... if you
> >> want control over which *language* is used, you should use cmn or yue
> >> tags rather than zh-CN etc.
> >>
> >>
> >> Addison Phillips has taken an action to propose an alternative
> >> paragraph or two for the example.
> >>
> >>
> >
> >
>

Received on Wednesday, 7 May 2008 19:39:37 UTC