- From: Richard Ishida <ishida@w3.org>
- Date: Tue, 3 Jul 2007 11:17:16 +0100
- To: "'Daniel C. Burnett'" <Daniel.Burnett@nuance.com>
- Cc: <shuangzw@cn.ibm.com>, "'Kazuyuki Ashimura'" <ashimura@w3.org>, <public-i18n-core@w3.org>
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html

Lots of useful i18n-related changes to this doc. Thanks.

Here are some comments. I hope they help. I included some nit-like editorial points with the more substantive ones.

===============
Status section

"This document enhances SSML 1.0 [SSML] to provide better support for a broader set of languages."

Presumably that is natural languages rather than markup languages?

===============
1.5 URI
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S1.5

I think it would be better to define URI directly in terms of RFC 3987 or its successor than referring to the XML Schema definition. I suggest that you adopt a definition like that of XQuery. The XQuery definition reads:

"Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications."

============
3.1.2 xml:lang attribute
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.2

I suggest:
s/to indicate the natural language of the content of the element/to indicate the natural language of the written content of the element/

I'm thinking it would be useful to say, specifically, that values must conform to BCP 47. Rather than the, to me, slightly weak sounding "BCP 47 can help in understanding how to use this attribute".

================
3.1.8.2 w element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.8.2

We recently sent a comment to the XQuery and XPath Full Text folks recommending that they drop the word 'word' in favour of 'token', since 'word' is such a complicated thing to define in many languages.
I think the same probably applies here, eg. "to eliminate word segmentation ambiguities" should at least be word/token. The i18n WG will probably suggest also replacing the w element with a t element.

I suggest:
s/that do not use white-space as a boundary identifier/that do not use white-space as a token boundary identifier/

Note also that Thai does use space as a boundary identifier, but those boundaries are phrasal rather than token level.

Spec says:
[[Thus, "<w><emphasis>hap</emphasis>py</w>" and "<w><emphasis> hap </emphasis> py</w>" would refer to the words "happy" and " hap py", respectively.]]

I think the second example would be written more correctly as <w><emphasis>hap</emphasis> py</w>, with an initial space before the <w>. I'm not sure why the whitespace rules need to be different for <w>.

Note, also, that including space before closing markup in some circumstances can cause problems for bidi text (see http://www.w3.org/International/questions/qa-bidi-space).

Suggestion:
s/xml:lang is a defined attribute on the w element to identify the language of the content./xml:lang is a defined attribute on the w element to identify the written language of the content./

Chinese is a little unusual wrt language tags. The first example on purple background includes xml:lang="zh-CN" - I think that if the examples were of Mandarin (Putonghua) Chinese that should be either zh-cmn or zh-Hans, or zh-cmn-Hans. (see http://people.w3.org/rishida/utils/subtags/index.php?searchtext=mandarin&submit=Search&searchtype=2 )

If you are describing the spoken language, I would go for zh-cmn, but I think xml:lang is used to describe the written content, for which zh-Hans is usually more appropriate. If the implementation will derive from xml:lang information about which language to set the voice in, then it would probably be necessary to say that this is, say, Putonghua (Mandarin), in which case you'd probably want to use zh-cmn-Hans.
Of course the examples that follow seem to indicate that this would actually need to be Shanghainese, for which the subtag is zh-wuu. Unfortunately, there is no provision at the moment for zh-wuu-Hans, although that is coming in the next version of BCP 47.

=============
3.2.1 voice element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.2.1

"where both language and accent can be values like you would find in xml:lang"

I think you should specify that values MUST be composed using BCP 47 - otherwise you leave the way open to interoperability problems.

"optional attribute indicating the list of languages the voice can speak, with optional accent indication per language, or the empty string "

After reading this through several times, I concluded that the empty string is an alternative to the accent indication (rather than allowing languages="") - ie. that the language attribute has to contain something, but it could just be language tag(s). Is that correct?

If we have <voice languages="fr:zh"> and there is no voice that supports French with a Chinese accent, then presumably a voice that supports French will be a suitable fallback? If so, you should probably say that in the onvoicefailure section.

The example on purple background says
<voice gender="female" languages="en-US" ...
rather than
<voice gender="female" languages="en:en-US" ...
Is this a mistake, or does it mean that accent should be specified with a single language tag where possible, and that the colon separator is only needed for accents that are not expressible in that way, eg. en:zh?

In the required attribute "The default value for this attribute is "languages"." But if no languages attribute is defined, what is the default language? Is this the language specified by the xml:lang attribute? I think it may be worth repeating in this section that the voice setting for language can be taken from the xml:lang information.

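
To make the fallback questions above concrete, here is a minimal Python sketch of one possible selection rule. It is not taken from the draft: the function names (truncations, pick_voice), the "language:accent" string handling and the order of preference are all assumptions made for illustration, and the tag truncation is a simplified form of the lookup approach in RFC 4647.

```python
def truncations(tag):
    """Yield the tag, then progressively shorter prefixes (fr-ca, then fr).

    Simplified: real RFC 4647 lookup also drops single-letter subtags
    that would be left dangling at the end of a truncated tag.
    """
    parts = tag.lower().split("-")
    for end in range(len(parts), 0, -1):
        yield "-".join(parts[:end])


def pick_voice(requested, available):
    """Return the best matching entry from `available`, or None.

    `requested` is one hypothetical item of a languages value, either
    "lang" or "lang:accent". `available` lists the items the processor's
    voices support. Matching is case-insensitive and results are
    returned in lower case.
    """
    language, _, accent = requested.partition(":")
    supported = {item.lower() for item in available}
    for lang in truncations(language):
        # Assumed preference: language plus accent first ...
        if accent:
            for acc in truncations(accent):
                candidate = f"{lang}:{acc}"
                if candidate in supported:
                    return candidate
        # ... then the language alone, ignoring the accent.
        if lang in supported:
            return lang
    return None


if __name__ == "__main__":
    print(pick_voice("fr:zh", ["fr", "en-US"]))  # accent dropped
    print(pick_voice("fr-CA", ["fr"]))           # region dropped
```

Under these assumptions a request for fr:zh is satisfied by a plain fr voice, and fr-CA by fr; whether the spec intends either behaviour is exactly what needs to be stated.
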
I think it would also be useful to have a paragraph and example describing and illustrating the effects of the xml:lang and voice languages settings respectively, and how they cross over. It may be necessary to clarify what happens if only a fr voice is available but xml:lang says fr-CA and there is no <voice languages="fr"...

===============
3.1.12 lang Element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.12

I'd vote for <span> as the name. Apart from anything else, that would allow for other uses that may arise in the future, not related to language. You never know...

============
Other

It may be worthwhile specifying expected behaviour when content is non-linguistic or undetermined. See http://www.w3.org/International/questions/qa-no-language

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/

> -----Original Message-----
> From: Daniel C. Burnett [mailto:Daniel.Burnett@nuance.com]
> Sent: 02 July 2007 15:08
> To: Richard Ishida
> Cc: shuangzw@cn.ibm.com; Kazuyuki Ashimura
> Subject: RE: [ssml11] Second WD of SSML 1.1 and updated
> Requirements doc are published
>
> Richard,
>
> Have you had a chance to look at the specification yet? Our
> subgroup meeting in China begins on Wednesday, 4 July (in two
> days), and I would appreciate any early feedback you have
> that we might be able to discuss.
>
> Thanks,
>
> Dan
Received on Tuesday, 3 July 2007 10:15:41 UTC