RE: [ssml11] Second WD of SSML 1.1 and updated Requirements doc are published from Daniel C. Burnett on 2007-08-14 (public-i18n-core@w3.org from July to September 2007)

From: Daniel C. Burnett <Daniel.Burnett@nuance.com>
Date: Tue, 14 Aug 2007 07:15:21 -0400
To: "Addison Phillips" <addison@yahoo-inc.com>
Cc: "Richard Ishida" <ishida@w3.org>, <shuangzw@cn.ibm.com>, "Kazuyuki Ashimura" <ashimura@w3.org>, <public-i18n-core@w3.org>, "w3c-voice-wg" <w3c-voice-wg@w3.org>
Message-ID: <2AB5541EB33172459EE430FFB66B1EE908731E04@BN-EXCH01.nuance.com>
Addison,

Thanks for your comments.  We particularly appreciated your suggestion
to use RFC4647 matching algorithms where appropriate, as you will see
below.

My replies to your individual points are embedded below, preceded by
[DB], and represent the approximate informal opinion of the subgroup at
this time.

-- dan

-----Original Message-----
From: Addison Phillips [mailto:addison@yahoo-inc.com] 
Sent: Tuesday, July 03, 2007 12:50 PM
To: Daniel C. Burnett
Cc: Richard Ishida; shuangzw@cn.ibm.com; Kazuyuki Ashimura;
public-i18n-core@w3.org
Subject: Re: [ssml11] Second WD of SSML 1.1 and updated Requirements doc
are published

A few comments on Richard's. Note that these are personal comments and 
not I18N Core WG comments.

> 
> ============
> 3.1.2 xml:lang attribute
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.2
> 
> I suggest: s/to indicate the natural language of the content of the
> element/to indicate the natural language of the written content of the
> element/

Language identifiers are not limited to written content (although these 
elements will contain written content, no?)

[DB] Correct and correct.

> 
> I'm thinking it would be useful to say, specifically, that values must
> conform to BCP 47.  Rather than the, to me, slightly weak sounding
> "BCP 47 can help in understanding how to use this attribute".

+1

[DB] Hmmm.  See my response to Richard :)

> 
> 
> ================
> 3.1.8.2 w element
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.8.2
> 
...
> 
> I suggest: s/that do not use white-space as a boundary identifier/that
> do not use white-space as a token boundary identifier/
> 
> Note also that Thai does use space as a boundary identifier, but those
> boundaries are phrasal rather than token level.

That is, "words" (tokens) are not necessarily separated by spaces.

[DB] We definitely understand this point ourselves.  I consider this
equivalent to a typo since we meant to indicate what boundary we were
identifying -- we'll fix it.
From this comment by you and similar ones from Richard, we realized that
we were not properly conveying that we understand very well that
tokenization can vary substantially across languages.  We will continue
working on our specification text in this regard until it is clear.

> 
> 
> Chinese is a little unusual wrt language tags.
> 
...
> 
> Of course the examples that follow seem to indicate that this would
> actually need to be Shanghainese, for which the subtag is zh-wuu.
> Unfortunately, there is no provision at the moment for zh-wuu-Hans,
> although that is coming in the next version of BCP 47.

Due Real Soon Now. If you need a non-Mandarin example, Cantonese (which 
is the dialect spoken in e.g. Hong Kong) would probably be a better 
choice (the subtag for Cantonese is 'yue', i.e. "zh-yue-Hant", etc.).

Almost certainly you will want to distinguish written and spoken forms. 
The written forms for the various Chinese languages/dialects are 
(nearly) indistinguishable. The variation is between the Traditional and
Simplified scripts (Hant vs. Hans script subtags).

When rendering written Chinese into a spoken form, however, you need to 
know which dialect is being used (it makes a major difference!!). Hence 
the need for additional subtags.

[DB] We agree wholeheartedly with the above.  We have some new examples
coming in, and at any point we are open to suggestions for improving the
values of xml:lang used in our examples.

A word of caution. While there are some grandfathered tags such as 
"zh-cmn-Hans" currently extant, there is also some debate about whether 
this will ultimately be the form used for the Chinese dialects. It is 
possible that some or all of the Chinese dialects will end up being 
represented by their (naked) language codes. Thus you might see 
"cmn-Hans", "yue-Hant", and "wuu-Hans" as valid tags. (This is an open 
issue and currently opinion is running the other way, towards preserving
the "zh-" as a prefix to each of these)

I guess what I'm suggesting is that you be cautious with your Chinese 
examples (give them as examples using extant grandfathered tags, to be 
sure, but avoid trying to give normative guidance for now).

[DB] I don't believe we give normative guidance on the use of these, and
we never intend to, for the reasons you give.  We don't consider it our
job to work out the proper tags and subtags, just to be able to use them
when they occur!  At this point, our examples contain values that
authors in China are using today; if we should use something better
we're happy to switch.

> 
> If we have <voice languages="fr:zh"> and there is no voice that
> supports French with a Chinese accent, then presumably a voice that
> supports French will be a suitable fallback?  If so, you should
> probably say that in the onvoicefailure section.

I would add: you should probably specify the matching algorithm used. 
See RFC 4647 (part of BCP 47). For this type of matching, the Lookup 
algorithm is often a good choice to specify. The current text is too 
vague, hence the remainder of Richard's comment (mostly omitted here).

[DB] This comment was incredibly helpful, forcing a much needed
discussion on matching.  See below.
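
For reference, the Lookup algorithm from RFC 4647 (section 3.4) mentioned
above can be sketched roughly as follows. This is only an illustration and
not text from the SSML draft: the function name `lookup`, its parameters,
and the sample tags are assumptions made for the example, and only basic
language ranges are handled.

```python
def lookup(priority_list, available_tags, default=None):
    """Rough sketch of RFC 4647 Lookup over basic language ranges.

    priority_list:  language ranges, most preferred first (hypothetical input)
    available_tags: language tags that are actually available
    default:        value returned when nothing matches
    """
    # Matching is case-insensitive; remember the original spelling.
    available = {tag.lower(): tag for tag in available_tags}
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard range is skipped during lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available:
                return available[candidate]
            # Progressively truncate the range from the end.
            subtags.pop()
            # A trailing single-character subtag is removed as well.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# A request for "fr-CA" falls back to plain "fr" when that is all there is.
print(lookup(["fr-CA"], ["en-US", "fr", "zh"]))
```

Lookup always yields at most one result, which is why it is often suggested
where a single best fallback has to be chosen.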

> 
> 
> The example on purple background says <voice gender="female"
> languages="en-US" ... rather than <voice gender="female"
> languages="en:en-US" ...
> 
> Is this a mistake, or does it mean that accent should be specified
> with a single language tag where possible, and that the colon separator
> is only needed for accents that are not expressible in that way, eg.
> en:zh?

... or does this mean that the "languages" attribute is a "language 
priority list" (see RFC 4647)??

[DB] It is definitely not a language priority list.  We have come to the
following conclusion:  for each language and for each accent (in the
pairs of language:accent in the "languages" attribute), we want to allow
the value to be an extended language range.  When selecting a voice, both
the language and accent will use the extended filter matching algorithm,
and a voice is only considered a match if all lang/accent pairs are
matched.
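
The rule described here could be sketched as below, using Extended
Filtering from RFC 4647 (section 3.3.2). This is a rough illustration
under stated assumptions, not the specification's wording: the helper
names, the space-separated form of the "languages" value, the modelling of
a voice as a list of (language, accent) tags, and the treatment of an
omitted accent as "no constraint" are all assumptions made for the example.

```python
def extended_filter_match(language_range, tag):
    """RFC 4647 Extended Filtering: does `tag` match `language_range`?"""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    # The first subtags must match unless the range starts with "*".
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # wildcard: skip it in the range
        elif t >= len(tag_subtags):
            return False                # tag ran out of subtags
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # never skip over a singleton
        else:
            t += 1                      # skip a non-matching tag subtag
    return True


def parse_languages(attr):
    """Split a value such as "fr:zh en-US" into (language, accent) pairs.

    Assumption: pairs are space-separated and the accent part is optional.
    """
    pairs = []
    for item in attr.split():
        language, _, accent = item.partition(":")
        pairs.append((language, accent or None))
    return pairs


def voice_matches(requested, voice_pairs):
    """A voice matches only if every requested pair is satisfied."""
    for lang_range, accent_range in requested:
        if not any(
            extended_filter_match(lang_range, v_lang)
            and (accent_range is None
                 or extended_filter_match(accent_range, v_accent))
            for v_lang, v_accent in voice_pairs
        ):
            return False
    return True


# Hypothetical voice: French as spoken with a Mandarin accent.
voice = [("fr-FR", "zh-cmn")]
print(voice_matches(parse_languages("fr:zh"), voice))
```

Unlike Lookup, filtering does not fall back to a shorter tag; a voice that
fails any one pair is simply not a match.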


Best Regards,

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG
Co-Editor -- IETF BCP 47 [RFC 4646, RFC 4647]

Internationalization is an architecture.
It is not a feature.
Received on Tuesday, 14 August 2007 11:15:51 UTC