Re: [ssml11] Second WD of SSML 1.1 and updated Requirements doc are published from Kazuyuki Ashimura on 2007-07-03 (public-i18n-core@w3.org from July to September 2007)

From: Kazuyuki Ashimura <ashimura@w3.org>
Date: Wed, 04 Jul 2007 01:15:35 +0900
To: Richard Ishida <ishida@w3.org>
CC: "'Daniel C. Burnett'" <Daniel.Burnett@nuance.com>, shuangzw@cn.ibm.com, public-i18n-core@w3.org
Message-ID: <468A7627.3060507@w3.org>

Thank you very much, Richard!

We will discuss your comments at the next SSML f2f meeting in China.

Kazuyuki

Richard Ishida wrote:
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html
> Lots of useful i18n-related changes to this doc. Thanks. Here are some
> comments. I hope they help. I included some nit-like editorial points with
> the more substantive ones.
>
>
> ===============
> Status section
> "This document enhances SSML 1.0 [SSML] to provide better support for a
> broader set of languages."
>
> Presumably that is natural languages rather than markup languages?
>
> ===============
> 1.5 URI
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S1.5
>
> I think it would be better to define URI directly in terms of RFC 3987 or
> its successor than referring to the XML Schema definition.  
>
> I suggest that you adopt a definition like that of XQuery. The XQuery
> definition reads:
>
> "Within this specification, the term URI refers to a Universal Resource
> Identifier as defined in [RFC3986] and extended in [RFC3987] with the new
> name IRI. The term URI has been retained in preference to IRI to avoid
> introducing new names for concepts such as "Base URI" that are defined or
> referenced across the whole family of XML specifications."
>
>
> ============
> 3.1.2 xml:lang attribute
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.2
>
> I suggest: s/to indicate the natural language of the content of the
> element/to indicate the natural language of the written content of the
> element/
>
> I'm thinking it would be useful to say, specifically, that values must
> conform to BCP 47, rather than the, to me, slightly weak-sounding "BCP 47
> can help in understanding how to use this attribute".
>
>
> ================
> 3.1.8.2 w element
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.8.2
>
> We recently sent a comment to the XQuery and XPath Full Text folks
> recommending that they drop the word 'word' in favour of 'token', since
> 'word' is such a complicated thing to define in many languages.  I think the
> same probably applies here, eg. "to eliminate word segmentation ambiguities"
> should at least be word/token.
>
> The i18n WG will probably suggest also replacing the w element with a t
> element.
>
> I suggest: s/that do not use white-space as a boundary identifier/that do
> not use white-space as a token boundary identifier/
>
> Note also that Thai does use space as a boundary identifier, but those
> boundaries are phrasal rather than token level.
>
>
> Spec says: [[Thus, "<w><emphasis>hap</emphasis>py</w>" and "<w><emphasis>
> hap </emphasis> py</w>" would refer to the words "happy" and " hap py",
> respectively.]]
>
> I think the second example would be written more correctly as
> <w><emphasis>hap</emphasis> py</w>, with an initial space before the <w>.
> I'm not sure why the whitespace rules need to be different for <w>.  Note,
> also, that including space before closing markup in some circumstances can
> cause problems for bidi text (see
> http://www.w3.org/International/questions/qa-bidi-space).
>
>
>
> Suggestion: s/xml:lang is a defined attribute on the w element to identify
> the language of the content./xml:lang is a defined attribute on the w
> element to identify the written language of the content./
>
>
> Chinese is a little unusual wrt language tags.
>
> The first example on purple background includes xml:lang="zh-CN" - I think
> that if the examples were of Mandarin (Putonghua) Chinese that should be
> either zh-cmn or zh-Hans, or zh-cmn-Hans. (see
> http://people.w3.org/rishida/utils/subtags/index.php?searchtext=mandarin&submit=Search&searchtype=2 )
>
> If you are describing the spoken language, I would go for zh-cmn, but I
> think xml:lang is used to describe the written content, for which zh-Hans is
> usually more appropriate. If the implementation will derive from xml:lang
> information about which language to set the voice in, then it would probably
> be necessary to say that this is, say, Putonghua (Mandarin), in which case
> you'd probably want to use zh-cmn-Hans.
>
> Of course the examples that follow seem to indicate that this would actually
> need to be Shanghainese, for which the subtag is zh-wuu.  Unfortunately,
> there is no provision at the moment for zh-wuu-Hans, although that is coming
> in the next version of BCP 47.
>
>
> =============
> 3.2.1 voice element
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.2.1
>
> "where both language and accent can be values like you would find in
> xml:lang"
> I think you should specify that values MUST be composed using BCP 47 -
> otherwise you leave the way open to interoperability problems.
>
> "optional attribute indicating the list of languages the voice can speak,
> with optional accent indication per language, or the empty string " 
> After reading this through several times, I concluded that the empty string
> is an alternative to the accent indication (rather than allowing
> languages="") - ie. that the languages attribute has to contain something,
> but it could just be language tag(s).  Is that correct?  
>
>
> If we have <voice languages="fr:zh"> and there is no voice that supports
> French with a Chinese accent, then presumably a voice that supports French
> will be a suitable fallback?  If so, you should probably say that in the
> onvoicefailure section.
>
>
> The example on purple background says <voice gender="female"
> languages="en-US" ... rather than <voice gender="female"
> languages="en:en-US" ...
>
> Is this a mistake, or does it mean that accent should be specified with a
> single language tag where possible, and that the colon separator is only
> needed for accents that are not expressible in that way, eg. en:zh?
>
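
The reading of the languages attribute discussed above can be sketched roughly as follows. This is only an illustration, not the draft's grammar: the function name parse_languages is hypothetical, and the whitespace-separated list format is an assumption, since the quoted draft text does not spell out the list separator. Each item is taken to be either "language" or "language:accent", with both parts being language tags.

```python
def parse_languages(value):
    """Split a `languages` value into (language, accent) pairs.

    Assumption: items are separated by whitespace, and an optional
    accent follows the language after a colon. The accent is None
    when no colon is present.
    """
    pairs = []
    for item in value.split():
        language, sep, accent = item.partition(":")
        pairs.append((language, accent if sep else None))
    return pairs


# French with a Chinese accent, then plain US English.
print(parse_languages("fr:zh en-US"))
```

Under this reading, "en-US" and "en:en-US" are different requests: the first carries no explicit accent, the second asks for English with a US English accent.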
>
> In the required attribute "The default value for this attribute is
> "languages"."  But if no languages attribute is defined, what is the default
> language?  Is this the language specified by the xml:lang attribute?  
>
> I think it may be worth repeating in this section that the voice setting for
> language can be taken from the xml:lang information. I think it would also
> be useful to have a paragraph and example describing and illustrating the
> effects of the xml:lang and voice languages settings respectively, and how
> they cross over.
>
> It may be necessary to clarify what happens if only a fr voice is available
> but xml:lang says fr-CA and there is no <voice languages="fr"...
>
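
One possible answer to the fr-CA question is the "lookup" scheme from RFC 4647 (part of BCP 47), which truncates the requested tag from the end until something matches. The sketch below assumes that scheme; the draft quoted here does not say SSML uses it, and the names lookup, requested and available are made up for the illustration.

```python
def lookup(requested, available):
    """Return the best available tag for `requested`, or None.

    Follows the RFC 4647 lookup idea: compare case-insensitively and
    progressively drop trailing subtags. A single-character subtag
    left at the end (such as "x") is dropped together with the
    subtag that followed it.
    """
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


# Only a generic French voice is installed; fr-CA falls back to it.
print(lookup("fr-CA", ["fr", "en-US"]))
```

With this approach a processor that has only an fr voice would still speak fr-CA content, while a request with no match at all returns None and would presumably be handled as a voice selection failure.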
>
>
> ===============
> 3.1.12 lang Element
> http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.12
>
> I'd vote for <span> as the name. Apart from anything else, that would allow
> for other uses that may arise in the future, not related to language. You
> never know...
>
>
>
> ============
> Other
>
> It may be worthwhile specifying expected behaviour when content is
> non-linguistic or undetermined.  See
> http://www.w3.org/International/questions/qa-no-language
>
>
> RI
>
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>  
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>  
>  
>
>> -----Original Message-----
>> From: Daniel C. Burnett [mailto:Daniel.Burnett@nuance.com] 
>> Sent: 02 July 2007 15:08
>> To: Richard Ishida
>> Cc: shuangzw@cn.ibm.com; Kazuyuki Ashimura
>> Subject: RE: [ssml11] Second WD of SSML 1.1 and updated 
>> Requirements doc are published
>>
>> Richard,
>>
>> Have you had a chance to look at the specification yet?  Our 
>> subgroup meeting in China begins on Wednesday, 4 July (in two 
>> days), and I would appreciate any early feedback you have 
>> that we might be able to discuss.
>>
>> Thanks,
>>
>> Dan
Received on Tuesday, 3 July 2007 16:15:31 UTC