RE: [ssml11] Second WD of SSML 1.1 and updated Requirements doc are published

Thanks Addison.  I appreciate the quick comments -- we'll look over them
in our meeting this week.

-- dan

-----Original Message-----
From: Addison Phillips [mailto:addison@yahoo-inc.com] 
Sent: Tuesday, July 03, 2007 12:50 PM
To: Daniel C. Burnett
Cc: Richard Ishida; shuangzw@cn.ibm.com; Kazuyuki Ashimura;
public-i18n-core@w3.org
Subject: Re: [ssml11] Second WD of SSML 1.1 and updated Requirements doc
are published

A few comments on Richard's. Note that these are personal comments and 
not I18N Core WG comments.

> 
> ============
> 3.1.2 xml:lang attribute
>
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.2
> 
> I suggest: s/to indicate the natural language of the content of the
> element/to indicate the natural language of the written content of the
> element/

Language identifiers are not limited to written content (although these 
elements will contain written content, no?)

> 
> I'm thinking it would be useful to say, specifically, that values must
> conform to BCP 47, rather than the, to me, slightly weak-sounding "BCP 47
> can help in understanding how to use this attribute".

+1

> 
> 
> ================
> 3.1.8.2 w element
>
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.8.2
> 
...
> 
> I suggest: s/that do not use white-space as a boundary identifier/that
> do not use white-space as a token boundary identifier/
> 
> Note also that Thai does use space as a boundary identifier, but those
> boundaries are phrasal rather than token level.

That is, "words" (tokens) are not necessarily separated by spaces.

> 
> 
> Chinese is a little unusual wrt language tags.
> 
...
> 
> Of course the examples that follow seem to indicate that this would
> actually need to be Shanghainese, for which the subtag is zh-wuu.
> Unfortunately, there is no provision at the moment for zh-wuu-Hans,
> although that is coming in the next version of BCP 47.

Due Real Soon Now. If you need a non-Mandarin example, Cantonese (which 
is the dialect spoken in e.g. Hong Kong) would probably be a better 
choice (the subtag for Cantonese is 'yue', i.e. "zh-yue-Hant", etc.).

Almost certainly you will want to distinguish written and spoken forms. 
The written forms for the various Chinese languages/dialects are 
(nearly) indistinguishable. The variation is between the Traditional and 
Simplified scripts (Hant vs. Hans script subtags).

When rendering written Chinese into a spoken form, however, you need to 
know which dialect is being used (it makes a major difference!!). Hence 
the need for additional subtags.

A word of caution. While there are some grandfathered tags such as 
"zh-cmn-Hans" currently extant, there is also some debate about whether 
this will ultimately be the form used for the Chinese dialects. It is 
possible that some or all of the Chinese dialects will end up being 
represented by their (naked) language codes. Thus you might see 
"cmn-Hans", "yue-Hant", and "wuu-Hans" as valid tags. (This is an open 
issue and currently opinion is running the other way, towards preserving 
the "zh-" as a prefix to each of these.)

I guess what I'm suggesting is that you be cautious with your Chinese 
examples (give them as examples using extant grandfathered tags, to be 
sure, but avoid trying to give normative guidance for now).
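
To illustrate the written/spoken distinction, here is a rough Python sketch 
(not anything from the SSML draft) that pulls the dialect and script out of 
a tag whether or not it carries the "zh-" prefix. The function name and the 
two lookup sets are hypothetical and cover only the subtags mentioned in 
this thread; a real processor would consult the full subtag registry.

```python
# Hypothetical helper: only the subtags discussed in this thread are known.
DIALECT_SUBTAGS = {"cmn", "yue", "wuu"}   # spoken form (Mandarin, Cantonese, Wu)
SCRIPT_SUBTAGS = {"hans", "hant"}         # written form (Simplified, Traditional)


def split_chinese_tag(tag):
    """Return (dialect, script) for a tag such as 'zh-yue-Hant' or 'yue-Hant'.

    Either element is None when the tag does not state it.
    """
    dialect = None
    script = None
    # Subtag comparison is case-insensitive, so normalise before matching.
    for subtag in tag.lower().split("-"):
        if subtag in DIALECT_SUBTAGS:
            dialect = subtag
        elif subtag in SCRIPT_SUBTAGS:
            script = subtag.capitalize()
    return dialect, script
```

Under these assumptions "zh-yue-Hant" and "yue-Hant" both give 
("yue", "Hant"), while a plain "zh-Hans" gives (None, "Hans"): the script 
is known but the dialect needed for speech is not.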
> 
> If we have <voice languages="fr:zh"> and there is no voice that supports
> French with a Chinese accent, then presumably a voice that supports French
> will be a suitable fallback?  If so, you should probably say that in the
> onvoicefailure section.

I would add: you should probably specify the matching algorithm used. 
See RFC 4647 (part of BCP 47). For this type of matching, the Lookup 
algorithm is often a good choice to specify. The current text is too 
vague, hence the remainder of Richard's comment (mostly omitted here).
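
For reference, a minimal Python sketch of the Lookup scheme as I read 
RFC 4647, section 3.4: each range in the priority list is progressively 
truncated from the end until it matches an available tag. The function 
name, the default argument, and the voice tags in the example are 
assumptions for illustration only; nothing here comes from the SSML draft.

```python
def lookup(priority_list, available_tags, default=None):
    """Return the best available tag for a language priority list.

    Sketch of RFC 4647 Lookup: matching is case-insensitive and each
    range is shortened one subtag at a time until something matches.
    """
    # Map lowercased tags back to their original spelling.
    available = {tag.lower(): tag for tag in available_tags}
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard range is skipped in Lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available:
                return available[candidate]
            subtags.pop()
            # A single-letter or single-digit subtag left at the end
            # (for example "x") is removed as well.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

With a hypothetical set of voices tagged "en-US", "fr" and "de", 
lookup(["fr-CA", "en"], ["en-US", "fr", "de"]) returns "fr". Note that the 
range "en" would not match "en-US" under Lookup, which is one reason the 
spec needs to say which scheme it intends.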

> 
> 
> The example on purple background says <voice gender="female"
> languages="en-US" ... rather than <voice gender="female"
> languages="en:en-US" ...
> 
> Is this a mistake, or does it mean that accent should be specified with
> a single language tag where possible, and that the colon separator is only
> needed for accents that are not expressible in that way, eg. en:zh?

... or does this mean that the "languages" attribute is a "language 
priority list" (see RFC 4647)??

Best Regards,

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG
Co-Editor -- IETF BCP 47 [RFC 4646, RFC 4647]

Internationalization is an architecture.
It is not a feature.

Received on Tuesday, 3 July 2007 16:53:43 UTC