RE: [ssml11] Second WD of SSML 1.1 and updated Requirements doc are published

Thanks Richard!

-----Original Message-----
From: Richard Ishida [mailto:ishida@w3.org] 
Sent: Tuesday, July 03, 2007 6:17 AM
To: Daniel C. Burnett
Cc: shuangzw@cn.ibm.com; 'Kazuyuki Ashimura'; public-i18n-core@w3.org
Subject: RE: [ssml11] Second WD of SSML 1.1 and updated Requirements doc are published

http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html
Lots of useful i18n-related changes to this doc. Thanks. Here are some
comments. I hope they help. I included some nit-like editorial points
with the more substantive ones.


===============
Status section
"This document enhances SSML 1.0 [SSML] to provide better support for a
broader set of languages."

Presumably that is natural languages rather than markup languages?

===============
1.5 URI
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S1.5

I think it would be better to define URI directly in terms of RFC 3987
or its successor, rather than by referring to the XML Schema definition.

I suggest that you adopt a definition like that of XQuery. The XQuery
definition reads:

"Within this specification, the term URI refers to a Universal Resource
Identifier as defined in [RFC3986] and extended in [RFC3987] with the
new name IRI. The term URI has been retained in preference to IRI to
avoid introducing new names for concepts such as "Base URI" that are
defined or referenced across the whole family of XML specifications."


============
3.1.2 xml:lang attribute
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.2

I suggest: s/to indicate the natural language of the content of the
element/to indicate the natural language of the written content of the
element/

I think it would be useful to say, specifically, that values must
conform to BCP 47, rather than the (to me) slightly weak-sounding "BCP
47 can help in understanding how to use this attribute".
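
To illustrate what a conformance requirement would let a processor
check, here is a minimal Python sketch of a well-formedness test. It
assumes a simplified subset of the BCP 47 langtag grammar (language,
extended language, script, region and variant subtags only); the name
is_well_formed is hypothetical, and the check does not consult the
subtag registry, so it cannot tell whether a subtag is actually valid.

```python
import re

# Simplified subset of the BCP 47 "langtag" grammar (an assumption made
# for illustration): language, optional extended language subtags,
# optional script, optional region, then any number of variants.
# Extensions, private-use and grandfathered tags are deliberately omitted.
_LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?P<extlang>(?:-[a-z]{3}){0,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """Return True if tag matches the simplified grammar above."""
    return _LANGTAG.match(tag) is not None


if __name__ == "__main__":
    for tag in ["en-US", "zh-Hans", "zh-cmn-Hans", "fr-CA", "en_US"]:
        print(tag, is_well_formed(tag))
```

A check like this only catches syntax errors such as an underscore
separator; confirming that each subtag is registered would need the
IANA Language Subtag Registry.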


================
3.1.8.2 w element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.8.2

We recently sent a comment to the XQuery and XPath Full Text folks
recommending that they drop the word 'word' in favour of 'token', since
'word' is such a complicated thing to define in many languages.  I think
the same probably applies here, e.g. "to eliminate word segmentation
ambiguities" should at least be word/token.

The i18n WG will probably suggest also replacing the w element with a t
element.

I suggest: s/that do not use white-space as a boundary identifier/that
do not use white-space as a token boundary identifier/

Note also that Thai does use space as a boundary identifier, but those
boundaries are phrasal rather than token level.


Spec says: [[Thus, "<w><emphasis>hap</emphasis>py</w>" and
"<w><emphasis> hap </emphasis> py</w>" would refer to the words "happy"
and " hap py", respectively.]]

I think the second example would be written more correctly as
<w><emphasis>hap</emphasis> py</w>, with an initial space before the
<w>. I'm not sure why the whitespace rules need to be different for
<w>. Note also that including a space before closing markup can, in
some circumstances, cause problems for bidi text (see
http://www.w3.org/International/questions/qa-bidi-space).
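
To make the whitespace point concrete, the following Python sketch
prints the raw character data inside each <w> variant, using the
standard library's xml.etree.ElementTree. The helper name token_text is
hypothetical, and the sketch shows only what an XML parser hands over
before any whitespace normalization an SSML processor might apply.

```python
import xml.etree.ElementTree as ET


def token_text(fragment: str) -> str:
    """Concatenate all character data inside the fragment's root element."""
    return "".join(ET.fromstring(fragment).itertext())


examples = [
    "<w><emphasis>hap</emphasis>py</w>",     # first example from the spec
    "<w><emphasis> hap </emphasis> py</w>",  # second example from the spec
    "<w><emphasis>hap</emphasis> py</w>",    # the alternative suggested above
]

if __name__ == "__main__":
    for fragment in examples:
        # repr() makes leading, trailing and doubled spaces visible.
        print(repr(token_text(fragment)))
```

Spaces placed inside the <emphasis> element travel with its content, so
where they sit relative to the tags changes what the <w> contains.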



Suggestion: s/xml:lang is a defined attribute on the w element to
identify the language of the content./xml:lang is a defined attribute on
the w element to identify the written language of the content./


Chinese is a little unusual wrt language tags.

The first example on purple background includes xml:lang="zh-CN". I
think that if the examples are of Mandarin (Putonghua) Chinese, that
should be either zh-cmn or zh-Hans, or zh-cmn-Hans (see
http://people.w3.org/rishida/utils/subtags/index.php?searchtext=mandarin&submit=Search&searchtype=2
).

If you are describing the spoken language, I would go for zh-cmn, but I
think xml:lang is used to describe the written content, for which
zh-Hans is usually more appropriate. If the implementation will derive
from xml:lang information about which language to set the voice in, then
it would probably be necessary to say that this is, say, Putonghua
(Mandarin), in which case you'd probably want to use zh-cmn-Hans.

Of course the examples that follow seem to indicate that this would
actually need to be Shanghainese, for which the tag is zh-wuu.
Unfortunately, there is no provision at the moment for zh-wuu-Hans,
although that is coming in the next version of BCP 47.


=============
3.2.1 voice element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.2.1

"where both language and accent can be values like you would find in
xml:lang"
I think you should specify that values MUST be composed using BCP 47 -
otherwise you leave the way open to interoperability problems.

"optional attribute indicating the list of languages the voice can
speak, with optional accent indication per language, or the empty
string"
After reading this through several times, I concluded that the empty
string is an alternative to the accent indication (rather than allowing
languages=""), i.e. that the languages attribute has to contain
something, but it could just be language tag(s).  Is that correct?


If we have <voice languages="fr:zh"> and there is no voice that supports
French with a Chinese accent, then presumably a voice that supports
French will be a suitable fallback?  If so, you should probably say that
in the onvoicefailure section.
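
The fallback being asked about could be sketched as follows in Python.
Everything here is an assumption made for illustration rather than
something the draft states: that the attribute value is a
space-separated list of "language" or "language:accent" items, the
function names parse_languages and pick_voice, the voice names, and the
rule of retrying without the accent.

```python
from typing import Dict, List, Optional, Tuple

# (language tag, accent tag or None); tags are lower-cased for comparison.
Entry = Tuple[str, Optional[str]]


def parse_languages(value: str) -> List[Entry]:
    """Split an assumed space-separated list of language[:accent] items."""
    entries: List[Entry] = []
    for item in value.split():
        language, _, accent = item.lower().partition(":")
        entries.append((language, accent or None))
    return entries


def pick_voice(requested: Entry, voices: Dict[str, List[Entry]]) -> Optional[str]:
    """Prefer an exact language+accent match, else any voice for the language."""
    language, accent = requested
    for name, entries in voices.items():
        if (language, accent) in entries:
            return name
    # Hypothetical fallback: ignore the accent and match on language only.
    for name, entries in voices.items():
        if any(lang == language for lang, _ in entries):
            return name
    return None


if __name__ == "__main__":
    voices = {"voice-a": parse_languages("en-US"), "voice-b": parse_languages("fr")}
    print(pick_voice(parse_languages("fr:zh")[0], voices))
```

Whether dropping the accent counts as success or as a voice failure is
exactly the kind of thing the spec text would need to pin down.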


The example on purple background says <voice gender="female"
languages="en-US" ... rather than <voice gender="female"
languages="en:en-US" ...

Is this a mistake, or does it mean that accent should be specified with
a single language tag where possible, and that the colon separator is
only needed for accents that are not expressible in that way, e.g.
en:zh?


Under the required attribute: "The default value for this attribute is
"languages"."  But if no languages attribute is defined, what is the
default language?  Is this the language specified by the xml:lang
attribute?

I think it may be worth repeating in this section that the voice setting
for language can be taken from the xml:lang information. I think it
would also be useful to have a paragraph and example describing and
illustrating the effects of the xml:lang and voice languages settings
respectively, and how they cross over.

It may be necessary to clarify what happens if only a fr voice is
available but xml:lang says fr-CA and there is no <voice
languages="fr"...
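
One candidate behaviour is the progressive truncation used by the
Lookup scheme in RFC 4647 (the matching part of BCP 47). The Python
sketch below is a simplified version of that idea, not something the
draft specifies; the function name lookup and the example voice lists
are hypothetical.

```python
from typing import List, Optional


def lookup(requested: str, available: List[str]) -> Optional[str]:
    """Return the closest available tag by truncating subtags from the right."""
    supported = {tag.lower(): tag for tag in available}
    parts = requested.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in supported:
            return supported[candidate]
        # Drop the last subtag, plus any single-character subtag that
        # would be left dangling at the end.
        parts.pop()
        while parts and len(parts[-1]) == 1:
            parts.pop()
    return None


if __name__ == "__main__":
    print(lookup("fr-CA", ["en-US", "fr"]))
    print(lookup("zh-cmn-Hans", ["zh-Hans", "zh"]))
```

Under this scheme a document marked fr-CA would fall back to the fr
voice, while a request with no match at any level returns nothing and
would presumably be treated as a voice failure.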



===============
3.1.12 lang Element
http://www.w3.org/Voice/2007/speech-synthesis11/WD-speech-synthesis11-20070611diff.html#S3.1.12

I'd vote for <span> as the name. Apart from anything else, that would
allow for other uses that may arise in the future, not related to
language. You never know...



============
Other

It may be worthwhile specifying expected behaviour when content is
non-linguistic or undetermined.  See
http://www.w3.org/International/questions/qa-no-language


RI


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 
 

> -----Original Message-----
> From: Daniel C. Burnett [mailto:Daniel.Burnett@nuance.com] 
> Sent: 02 July 2007 15:08
> To: Richard Ishida
> Cc: shuangzw@cn.ibm.com; Kazuyuki Ashimura
> Subject: RE: [ssml11] Second WD of SSML 1.1 and updated 
> Requirements doc are published
> 
> Richard,
> 
> Have you had a chance to look at the specification yet?  Our 
> subgroup meeting in China begins on Wednesday, 4 July (in two 
> days), and I would appreciate any early feedback you have 
> that we might be able to discuss.
> 
> Thanks,
> 
> Dan

Received on Tuesday, 3 July 2007 16:23:47 UTC