RE: [ssml11] Second WD of SSML 1.1 and updated Requirements doc are published

http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html
Lots of useful i18n-related changes to this doc. Thanks. Here are some
comments. I hope they help. I included some nit-like editorial points with
the more substantive ones.


===============
Status section
"This document enhances SSML 1.0 [SSML] to provide better support for a
broader set of languages."

Presumably that is natural languages rather than markup languages?

===============
1.5 URI
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S1.5

I think it would be better to define URI directly in terms of RFC 3987 or
its successor than referring to the XML Schema definition.  

I suggest that you adopt a definition like that of XQuery. The XQuery
definition reads:

"Within this specification, the term URI refers to a Universal Resource
Identifier as defined in [RFC3986] and extended in [RFC3987] with the new
name IRI. The term URI has been retained in preference to IRI to avoid
introducing new names for concepts such as "Base URI" that are defined or
referenced across the whole family of XML specifications."


============
3.1.2 xml:lang attribute
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.2

I suggest: s/to indicate the natural language of the content of the
element/to indicate the natural language of the written content of the
element/

I think it would be useful to say, specifically, that values must conform
to BCP 47, rather than the (to me) slightly weak-sounding "BCP 47 can help
in understanding how to use this attribute".


================
3.1.8.2 w element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.8.2

We recently sent a comment to the XQuery and XPath Full Text folks
recommending that they drop the word 'word' in favour of 'token', since
'word' is such a complicated thing to define in many languages.  I think the
same probably applies here, eg. "to eliminate word segmentation ambiguities"
should at least be word/token.

The i18n WG will probably suggest also replacing the w element with a t
element.

I suggest: s/that do not use white-space as a boundary identifier/that do
not use white-space as a token boundary identifier/

Note also that Thai does use space as a boundary identifier, but those
boundaries are phrasal rather than token level.


Spec says: [[Thus, "<w><emphasis>hap</emphasis>py</w>" and
"<w><emphasis> hap </emphasis> py</w>" would refer to the words "happy" and
" hap py", respectively.]]

I think the second example would be written more correctly as
<w><emphasis>hap</emphasis> py</w>, with an initial space before the <w>.
I'm not sure why the whitespace rules need to be different for <w>.  Note,
also, that including space before closing markup in some circumstances can
cause problems for bidi text (see
http://www.w3.org/International/questions/qa-bidi-space).
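
To make the whitespace point concrete, here is a minimal Python sketch (my
own illustration, not taken from the spec) that pulls the raw character data
out of the w examples using the standard-library XML parser.  The helper
name text_of is hypothetical, and the sketch says nothing about how an SSML
processor is required to normalise whitespace; it only shows what the markup
contains before any such rule is applied.

```python
import xml.etree.ElementTree as ET


def text_of(markup: str) -> str:
    """Return the character data of an element with all markup removed."""
    return "".join(ET.fromstring(markup).itertext())


# No whitespace anywhere inside the w element.
tight = text_of("<w><emphasis>hap</emphasis>py</w>")

# Whitespace inside emphasis (both sides) and after its closing tag.
spaced = text_of("<w><emphasis> hap </emphasis> py</w>")

# Whitespace only after the closing emphasis tag.
after_close = text_of("<w><emphasis>hap</emphasis> py</w>")

for label, value in (("tight", tight), ("spaced", spaced),
                     ("after_close", after_close)):
    print(label, repr(value))
```

The raw content of the second example keeps every space that was typed,
including the one before the closing </emphasis> tag, which is the position
that can cause trouble in bidi text.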



Suggestion: s/xml:lang is a defined attribute on the w element to identify
the language of the content./xml:lang is a defined attribute on the w
element to identify the written language of the content./


Chinese is a little unusual wrt language tags.

The first example on purple background includes xml:lang="zh-CN" - I think
that if the examples were of Mandarin (Putonghua) Chinese that should be
either zh-cmn or zh-Hans, or zh-cmn-Hans (see
http://people.w3.org/rishida/utils/subtags/index.php?searchtext=mandarin&submit=Search&searchtype=2 ).

If you are describing the spoken language, I would go for zh-cmn, but I
think xml:lang is used to describe the written content, for which zh-Hans is
usually more appropriate. If the implementation will derive from xml:lang
information about which language to set the voice in, then it would probably
be necessary to say that this is, say, Putonghua (Mandarin), in which case
you'd probably want to use zh-cmn-Hans.

Of course the examples that follow seem to indicate that this would actually
need to be Shanghainese, for which the subtag is zh-wuu.  Unfortunately,
there is no provision at the moment for zh-wuu-Hans, although that is coming
in the next version of BCP 47.
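
As a rough illustration of how the tags mentioned above differ, here is a
small Python sketch that splits a tag into its positional parts.  It is a
simplification I am assuming for the purpose of the example: it only checks
the shape of the tag (language, optional extended language, optional script,
optional region), ignores variants, extensions and private use, and does not
consult the subtag registry, so it cannot tell you whether a given
combination is actually registered.  The names TAG and split_tag are
hypothetical.

```python
import re

# Shape-only pattern: language[-extlang][-script][-region].
# Variants, extensions and private-use subtags are deliberately not handled.
TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<extlang>[A-Za-z]{3}))?"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag: str) -> dict:
    """Split a language tag into named parts, omitting the absent ones."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError(f"not handled by this simplified pattern: {tag!r}")
    return {name: value for name, value in match.groupdict().items() if value}


for tag in ("zh-CN", "zh-cmn", "zh-Hans", "zh-cmn-Hans", "zh-wuu"):
    print(tag, split_tag(tag))
```

Seen this way, zh-CN carries a region but says nothing about either the
spoken variety or the script, whereas zh-cmn-Hans carries both.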


=============
3.2.1 voice element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.2.1

"where both language and accent can be values like you would find in
xml:lang"
I think you should specify that values MUST be composed using BCP 47 -
otherwise you leave the way open to interoperability problems.

"optional attribute indicating the list of languages the voice can speak,
with optional accent indication per language, or the empty string " 
After reading this through several times, I concluded that the empty string
is an alternative to the accent indication (rather than allowing
languages="") - ie. that the languages attribute has to contain something,
but it could just be language tag(s).  Is that correct?  


If we have <voice languages="fr:zh"> and there is no voice that supports
French with a Chinese accent, then presumably a voice that supports French
will be a suitable fallback?  If so, you should probably say that in the
onvoicefailure section.


The example on purple background says <voice gender="female"
languages="en-US" ... rather than <voice gender="female"
languages="en:en-US" ...

Is this a mistake, or does it mean that accent should be specified with a
single language tag where possible, and that the colon separator is only
needed for accents that are not expressible in that way, eg. en:zh?
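
To show the reading of the languages attribute I have in mind, here is a
Python sketch.  Everything in it is an assumption on my part rather than
something the draft states: that entries are separated by white space, that
an entry without a colon means "no accent specified", and that a voice
supporting the language without the requested accent is an acceptable
fallback.  The names LanguageEntry, parse_languages and pick_voice are
hypothetical.

```python
from typing import List, NamedTuple, Optional


class LanguageEntry(NamedTuple):
    language: str
    accent: Optional[str]  # None when no accent is given


def parse_languages(value: str) -> List[LanguageEntry]:
    """Parse a languages value, assuming space-separated language[:accent]."""
    entries = []
    for item in value.split():
        language, _, accent = item.partition(":")
        entries.append(LanguageEntry(language, accent or None))
    return entries


def pick_voice(requested: LanguageEntry,
               available: List[LanguageEntry]) -> Optional[LanguageEntry]:
    """Prefer an exact match; otherwise fall back to the language alone.

    The fallback step is the assumed behaviour asked about above,
    not something the draft specifies.
    """
    for voice in available:
        if voice == requested:
            return voice
    for voice in available:
        if voice.language == requested.language:
            return voice
    return None


wanted = parse_languages("fr:zh")[0]
print(pick_voice(wanted, [LanguageEntry("en-US", None),
                          LanguageEntry("fr", None)]))
```

Under these assumptions "en-US" and "en:en-US" parse to different entries,
which is why it would help for the spec to say whether they are meant to be
equivalent.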


For the required attribute, the spec says "The default value for this
attribute is "languages"."  But if no languages attribute is defined, what is
the default language?  Is this the language specified by the xml:lang attribute?  

I think it may be worth repeating in this section that the voice setting for
language can be taken from the xml:lang information. I think it would also
be useful to have a paragraph and example describing and illustrating the
effects of the xml:lang and voice languages settings respectively, and how
they cross over.

It may be necessary to clarify what happens if only a fr voice is available
but xml:lang says fr-CA and there is no <voice languages="fr"...
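
One possible behaviour for the fr-CA case is the "lookup" scheme from RFC
4647, where the requested tag is shortened from the right until something
matches.  The Python sketch below is a simplified version of that idea,
offered only as an illustration; whether SSML processors should behave this
way is exactly the open question, and the function name lookup is
hypothetical.

```python
from typing import List, Optional


def lookup(requested: str, available: List[str]) -> Optional[str]:
    """Return the available tag that best matches by progressive truncation."""
    # Language tags are compared case-insensitively.
    supported = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in supported:
            return supported[candidate]
        subtags.pop()
        # A single-character subtag left at the end is removed as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


print(lookup("fr-CA", ["en", "fr"]))
print(lookup("fr-CA", ["en"]))
```

With only an fr voice available, a request for fr-CA would resolve to fr
under this scheme, and to nothing at all if no French voice exists.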



===============
3.1.12 lang Element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.12

I'd vote for <span> as the name. Apart from anything else, that would allow
for other uses that may arise in the future, not related to language. You
never know...



============
Other

It may be worthwhile specifying expected behaviour when content is
non-linguistic or undetermined.  See
http://www.w3.org/International/questions/qa-no-language


RI


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 
 

> -----Original Message-----
> From: Daniel C. Burnett [mailto:Daniel.Burnett@nuance.com] 
> Sent: 02 July 2007 15:08
> To: Richard Ishida
> Cc: shuangzw@cn.ibm.com; Kazuyuki Ashimura
> Subject: RE: [ssml11] Second WD of SSML 1.1 and updated 
> Requirements doc are published
> 
> Richard,
> 
> Have you had a chance to look at the specification yet?  Our 
> subgroup meeting in China begins on Wednesday, 4 July (in two 
> days), and I would appreciate any early feedback you have 
> that we might be able to discuss.
> 
> Thanks,
> 
> Dan

Received on Tuesday, 3 July 2007 10:15:41 UTC