I18N comments on the "Speech Synthesis Markup Language Specification for the Speech Interface Framework"

Hello,

I'm writing on behalf of the W3C Internationalization Working Group
(I18N WG).

The I18N WG recently held a face-to-face meeting, and reviewed
the "Speech Synthesis Markup Language Specification for the Speech
Interface Framework", published on 08 August 2000.

    http://www.w3.org/TR/2000/WD-speech-synthesis-20000808

The following is a list of i18n-related comments. Comments not related
to i18n will be sent separately.


==========

  Problem with the document itself

   The specification is written in XHTML 1.0 Transitional, served as
   "text/html; charset=iso-8859-1", and includes the following lines:

     <meta http-equiv="Content-Type"
     content="text/html; charset=iso-8859-1" />

   but it does not include the XML declaration

     <?xml version="1.0" encoding="ISO-8859-1"?>

   at the beginning of the document.

  2.2 "xml:lang" Attribute: Language

   The "xml:lang" attribute is not declared for any of the elements
   in the DTD! This is a very serious problem and must be fixed.

   The spec says:

     Following the XML convention, languages are indicated by an
     "xml:lang" attribute on the enclosing element with the value
     following RFC 1766 to define language codes.

   Rather than just mentioning RFC 1766, the spec should state what the
   XML spec says. Note that the XML 1.0 spec has been modified in this
   respect; please refer to E73 of the XML 1.0 Specification Errata [1]
   and to "2.12 Language Identification" [2] of the XML 1.0 Second
   Edition.

      [1] http://www.w3.org/XML/xml-19980210-errata#E73
      [2] http://www.w3.org/TR/2000/WD-xml-2e-20000814#sec-lang-tag

   The spec also says:

     Language information is inherited down the document hierarchy, i.e.
     it has to be given only once if the whole document is in one
     language, and language information nests, i.e. inner attributes
     overwrite outer attributes.

   But "it has to be given only once" is too strict a restriction.
   According to this definition, the following example would be invalid:

     <speak xml:lang="en-US">
       ... English words ...
     <sayas xml:lang="en-US" sub="World Wide Web Consortium">W3C</sayas>
       ... English words ...
     </speak>

   But we don't think this is harmful. In fact, the spec says in "Usage
   note 3" that:

     Where the "xml:lang" value is the same as the inherited value there
     is no need for any changes in the voice or prosody.

   This is true, so we don't think it's necessary to prohibit more than
   one occurrence of the same value, even if the whole document is in
   one language.
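   As a rough illustration, here is a Python sketch of our own (not
   anything defined by the spec; the helper name "effective_langs" is
   hypothetical) that resolves each element's language by the
   inheritance rule quoted above. The redundant and the minimal markup
   resolve to exactly the same languages:

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(elem, inherited=None, out=None):
    """Return (tag, effective language) pairs in document order.

    An element's own xml:lang overrides the inherited value; otherwise
    the value from the nearest ancestor applies.
    """
    if out is None:
        out = []
    lang = elem.get(XML_LANG, inherited)
    out.append((elem.tag, lang))
    for child in elem:
        effective_langs(child, lang, out)
    return out


# Redundant xml:lang on <sayas>, as in the example above.
redundant = ET.fromstring(
    '<speak xml:lang="en-US">English words '
    '<sayas xml:lang="en-US" sub="World Wide Web Consortium">W3C</sayas>'
    ' more English words</speak>'
)
# The same document with the inner xml:lang omitted.
minimal = ET.fromstring(
    '<speak xml:lang="en-US">English words '
    '<sayas sub="World Wide Web Consortium">W3C</sayas>'
    ' more English words</speak>'
)

print(effective_langs(redundant))  # [('speak', 'en-US'), ('sayas', 'en-US')]
print(effective_langs(minimal))    # [('speak', 'en-US'), ('sayas', 'en-US')]
```

   Since the effective language of every element is identical in both
   cases, a processor following the inheritance rule has no reason to
   treat the redundant attribute as an error.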

   Of course, in general there's no need to duplicate the same value. But
   for example, if someone has an XSLT stylesheet to transform every
   occurrence of "W3C" to "<sayas xml:lang="en-US" sub="World Wide Web
   Consortium">W3C</sayas>" regardless of the primary language of the
   document, it would be much easier to just retain the "xml:lang"
   attribute on that element rather than checking whether the whole
   document is in "en-US" and if so having to remove the "xml:lang"
   attribute on that element.

   What's the rationale of this restriction?

  2.4 "sayas" Element

   The spec defines the pronunciation type "sub" as:

     * sub: contained text is substituted for pronunciation with the
       specified text. This allows a document to contain both a spoken
       and written form.

   This is quite similar to the purpose of ruby, so it might be
   interesting to study the interoperability with the Ruby Annotation
   spec [3].

      [3] http://www.w3.org/TR/ruby

   Also, currently the spec defines "sub" as an attribute, but sometimes
   it might be desirable to add markup to the substituted text. For
   example, someone might want to mark up

     <sayas sub="UCS Transformation Format">UTF</sayas>

   and still specify that "UCS" is an acronym, but the current syntax
   doesn't allow this kind of markup. Likewise, if the substituted text
   is multilingual, there is no way to indicate a change of language
   within an attribute value. Note that for this kind of reason, Ruby
   markup uses an element for the ruby annotation, although an earlier
   proposal used an attribute.
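   To make the difference concrete, here is a small Python sketch
   (our own illustration). The element-based form in the second half is
   purely hypothetical: the names <written>, <sub> and <acronym> are not
   defined by the spec and only show what element content permits.

```python
import xml.etree.ElementTree as ET

# Current syntax: the spoken form is an attribute value. Markup inside it
# must be escaped, so it survives only as literal characters.
attr_form = ET.fromstring(
    '<sayas sub="&lt;acronym&gt;UCS&lt;/acronym&gt; Transformation Format">'
    'UTF</sayas>'
)
sub_value = attr_form.get("sub")
print(repr(sub_value))  # '<acronym>UCS</acronym> Transformation Format'

# Hypothetical element-based form (NOT defined by the spec; element names
# are illustrative). Because the spoken form is element content, it can
# contain nested markup, which could in turn carry its own xml:lang.
elem_form = ET.fromstring(
    '<sayas><written>UTF</written>'
    '<sub><acronym>UCS</acronym> Transformation Format</sub></sayas>'
)
sub_elem = elem_form.find("sub")
print([child.tag for child in sub_elem])  # ['acronym']
print("".join(sub_elem.itertext()))       # UCS Transformation Format
```

   In the attribute form the parser sees no structure at all, while in
   the element form the acronym remains an element that a synthesizer
   could act on.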

  2.5 "phoneme" Element

   In all the examples, the "x" is missing from the hexadecimal numeric
   character references. For example, LATIN SMALL LETTER TURNED ALPHA
   (U+0252) must be referenced as "&#x252;", not "&#252;"; "&#252;" is
   LATIN SMALL LETTER U WITH DIAERESIS (U+00FC), which is an entirely
   different character. One example notes that

     <!-- This example uses the Unicode IPA characters. -->
     <!-- Note: this will not display correctly on most browsers -->

   but in fact, with these incorrect references the example will not
   display correctly on ANY browser.
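   The difference is easy to confirm with any conforming XML parser.
   A short Python sketch (our own; the dummy <t> element is just a
   container) parses both references and prints the resulting
   characters:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Each reference is wrapped in a dummy <t> element and parsed.
decimal = ET.fromstring("<t>&#252;</t>").text       # 252 decimal = U+00FC
hexadecimal = ET.fromstring("<t>&#x252;</t>").text  # 0x252 = U+0252

for ch in (decimal, hexadecimal):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+00FC LATIN SMALL LETTER U WITH DIAERESIS
# U+0252 LATIN SMALL LETTER TURNED ALPHA
```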

  2.6 "voice" Element

   In the examples in "Usage note 4", the spec uses unregistered language
   codes such as "en-cockney" and "en-brooklyn". It would be better to
   use registered ones (e.g. "en-scouse") in examples. The IANA registry
   of language tags can be found at:
   http://www.isi.edu/in-notes/iana/assignments/languages/

  2.9 "prosody" Element

   The rate attribute specifies the speaking rate in "words per minute",
   but the notion of a "word" may differ across languages. Relative
   values like "fast", "medium", "slow", and "default" would be OK, but
   other relative values such as "+10" and "-5.5" might need careful
   consideration, since they are presumably also expressed in words per
   minute.

  5. DTD for the Speech Synthesis Markup Language

   While the other examples use an XML declaration like:

     <?xml version="1.0"?>

   the DTD uses the following XML declaration:

     <?xml version="1.0" encoding="ISO-8859-1"?>

   However, this DTD only uses Basic Latin characters, so we don't see
   why it has to be encoded in ISO-8859-1, or why its encoding has to
   differ from UTF-8 or UTF-16.
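   To illustrate the point about Basic Latin, a short Python sketch
   (our own; the declarations below are hypothetical and NOT quoted from
   the SSML DTD) shows that ASCII-only text produces exactly the same
   bytes in ISO-8859-1 and UTF-8:

```python
# Hypothetical ASCII-only declarations in the style of a DTD; these are
# NOT quoted from the SSML DTD.
dtd_fragment = (
    '<!ELEMENT speak (#PCDATA | sayas | phoneme | voice | prosody)*>\n'
    '<!ATTLIST speak version CDATA #IMPLIED>\n'
)

latin1 = dtd_fragment.encode("iso-8859-1")
utf8 = dtd_fragment.encode("utf-8")
utf16 = dtd_fragment.encode("utf-16")

print(dtd_fragment.isascii())  # True: Basic Latin only (U+0000..U+007F)
print(latin1 == utf8)          # True: identical bytes
print(utf16 == utf8)           # False: BOM plus two bytes per character
```

   Since UTF-8 is the XML default when there is neither an encoding
   declaration nor a byte order mark, a DTD like this could simply omit
   the encoding declaration without any change to its bytes.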

   Also, it would be better to make this DTD available in
   machine-readable form, rather than only including it in the middle
   of the spec.

==========

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Sunday, 10 September 2000 23:47:49 UTC