Comments on the "Speech Synthesis Markup Language Specification for the Speech Interface Framework"

Masayasu Ishikawa <mimasa@w3.org> wrote:

> Other non-i18n related comments will be sent separately.

And here's a list of non-i18n related comments.  These are my personal
comments, not representing the I18N WG or any other group.

==========

  Abstract

   In the second paragraph, change "a XML markup language" to "an XML
   markup language".

  Table of Contents

   A link to section 1.2 points to section 1.1, and a link to section 1.3
   points to section 1.2.

  1. Introduction

   The spec says:

     The W3C Standard is known as the Speech Recognition Grammar
     Specification and is based upon the JSML specification, which is
     owned by Sun Microsystems, Inc., California, U.S.A.

   but the Speech Recognition Grammar Specification is a Working Draft,
   and it is inappropriate to cite it as a "W3C Standard", as clearly
   indicated in the "Status of this Document" section of the Speech
   Recognition Grammar Specification.

  1.1 Terminology and Design Concepts

   In the list of key design criteria, item 2 "Interoperability", change
   "Audio Cascading Style Sheets" to "Aural Cascading Style Sheets".

  1.3 Document Generation, Applications and Contexts

   In the list of important instances of architectures or designs, item 2
   "Interoperability with", "Cascading Style Sheets, level 2 CSS2
   Specification" would be better written as "Cascading Style Sheets,
   level 2 (CSS2) Specification".

  2.2 "xml:lang" Attribute: Language

   In the first paragraph, need whitespace between '"xml:lang"' and
   "attribute".

   The example uses the para element, but it's not defined in the DTD
   found in section 5. It should be the paragraph element.

   In "Usage note 5", change "handledby" to "handled by".

  2.3 "paragraph" and "sentence": Text Structure Elements

   In the first paragraph, need whitespace between '"sentence"' and
   "element".

   The spec says:

     Usage note 1: For brevity, the markup also supports <p> and <s> as
     exact equivalents of <paragraph> and <sentence>. (Note: XML
     requires that the opening and closing elements be identical so <p>
     text </paragraph> is not legal.). Also note that <s> means
     "strike-out" in HTML 4.0 and earlier, and in XHTML-1.0-Transitional
     but not in XHTML-1.0-Strict.

   But neither the <p> nor the <s> element is defined in the DTD (even
   though they appear in the "%structure;" parameter entity). Also, <s>
   means "strike-through" in HTML 4.0/4.01 Transitional and Frameset, but
   no "official" earlier version of HTML (3.2, 2.0, ...) defined the s
   element. Both HTML+ [1] and HTML 3.0 [2] proposed the s element, but
   they were never standardized.

      [1] http://www.w3.org/MarkUp/HTMLPlus/htmlplus_16.html
      [2] http://www.w3.org/MarkUp/html3/emphasis.html

  2.4 "sayas" Element

   In the second paragraph, the spec says:

     The "type" attribute is a required attribute that indicates the
     contained text construct. The format is a text type optionally
     followed by a colon and a format. The base set of type values,
     divided according to broad functionality, is as follows:

   but in an example where the sub attribute is used, the type attribute
   is not used. Is it required even when the sub attribute is used?

   Also, the above attribute value format is not reflected in the DTD
   found in section 5. The following enumerated definition in the DTD:

     <!ENTITY % sayas-types
         "(acronym|number|ordinal|digits|telephone|date|time|
           duration|currency|measure|name|net|address)">

   doesn't allow formats like "number:ordinal", while it allows formats
   like "ordinal", which seems to be an error according to the prose
   text. You would have to list all the possible combinations.
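
   To give a rough idea of what listing all the combinations would
   involve, here is a minimal Python sketch that builds such an
   enumeration. The SAYAS_FORMATS table and both function names are my
   own assumptions, and the table holds only a few formats mentioned in
   these comments, not the full set from the spec.

```python
# Illustrative subset only; the full type/format table is in the spec prose.
# Which formats belong to which type is an assumption made for this sketch.
SAYAS_FORMATS = {
    "number": ["ordinal"],
    "date": ["dmy", "mdy", "ymd", "my"],
    "net": ["url"],
}


def enumerate_sayas_types(formats):
    """Return every bare type plus every "type:format" combination."""
    values = []
    for type_name, format_names in formats.items():
        values.append(type_name)
        values.extend(f"{type_name}:{fmt}" for fmt in format_names)
    return values


def dtd_enumeration(values):
    """Join the values into the text of a DTD enumerated attribute type."""
    return "(" + "|".join(values) + ")"


if __name__ == "__main__":
    print(dtd_enumeration(enumerate_sayas_types(SAYAS_FORMATS)))
```

   The resulting list grows with every type and format added, which is
   the maintenance cost of expressing this constraint in a DTD.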

    Pronunciation Types

   In the DTD, the "sub" attribute is not defined.

    Time, Date and Measure Types

   A lot of format values like "dmy" and "mdy" appear, but there's no
   formal definition of each format value. People might guess what "dmy"
   means, but as a specification, those definitions need to be clear and
   precise. Relevant definitions in ISO 8601 [3] (Representation of dates
   and times) may be helpful.

      [3] http://www.iso.ch/markete/8601.pdf

    Time, Date and Measure Types

   In the example, the following line:

     Proposals are due in <sayas type="date:my"> 5/2001 <sayas/>

   should be:

     Proposals are due in <sayas type="date:my"> 5/2001 </sayas>
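
   A quick way to catch this kind of slip is to run the examples through
   an XML parser. The following is only a sketch: the helper name and the
   dummy <root> wrapper are my own, and it checks well-formedness only,
   not validity against the DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses once wrapped in a dummy root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    broken = 'Proposals are due in <sayas type="date:my"> 5/2001 <sayas/>'
    fixed = 'Proposals are due in <sayas type="date:my"> 5/2001 </sayas>'
    print(is_well_formed(broken))  # False: the first <sayas> is never closed
    print(is_well_formed(fixed))   # True
```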

    Address, Name, Net Types

   Is "net:url" specifically for URLs only, or does it also allow other
   URIs (e.g. URNs)?

   In "Usage note 1",

     <sayas type="date:ymd"> 2000/1/20 <sayas>

   should be

     <sayas type="date:ymd"> 2000/1/20 </sayas>

   In the first sentence of "Usage note 3":

     Usage note 3: The "sayas" element can be only be used ...

   One of the two occurrences of "be" is unnecessary.

  2.5 "phoneme" Element

   In the second sentence of the first paragraph, need whitespace between
   '"ph"' and "attribute".

  2.9 "prosody" Element

  Relative values

   The spec says:

     The relative changes for any of the attributes above can be "+10",
     "-5.5", "+15%", "-8%". ...

   It's not clear whether those are the only permissible values or just
   examples. An example in this section uses the value "-10%", so they
   are probably intended as examples, but then the spec should clearly
   say so.

  2.10 "audio" Element

   Has using XLink [4] rather than the "src" attribute been considered?

      [4] http://www.w3.org/TR/xlink

  2.12 Miscellaneous relevant XML features

   In "Usage note 1", the spec says:

     Usage note 1: When engines support non-standard elements and
     attributes it is good practice for the name to identify the feature
     as non-standard, for example, by using a "x" prefix or a company
     name prefix.

   It looks more natural to me to use XML namespaces [5] for this kind of
   extension. Has the use of namespaces been considered? And is the
   Speech Synthesis Markup Language going to have its own namespace?

      [5] http://www.w3.org/TR/REC-xml-names
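
   As a rough sketch of what namespace-based extensions could look like
   to a processor, the following Python fragment picks out the elements
   that are not in the language's own namespace. Both namespace URIs, the
   "speak" root element, and the "whisper" extension element are
   hypothetical placeholders of mine, not taken from the spec.

```python
import xml.etree.ElementTree as ET

# Both URIs are hypothetical placeholders, not assigned by any specification.
SSML_NS = "http://example.org/ssml"
VENDOR_NS = "http://example.com/vendor-extensions"

DOCUMENT = f"""
<speak xmlns="{SSML_NS}" xmlns:v="{VENDOR_NS}">
  <sentence>Hello <v:whisper>world</v:whisper></sentence>
</speak>
"""


def extension_elements(xml_text, standard_ns):
    """Return the qualified tags of all elements outside the standard namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + standard_ns + "}"
    return [el.tag for el in root.iter() if not el.tag.startswith(prefix)]


if __name__ == "__main__":
    print(extension_elements(DOCUMENT, SSML_NS))
```

   With namespaces, a processor can tell standard and non-standard
   elements apart without relying on a naming convention such as an "x"
   prefix.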

  3.2 Other Phoeneme Alpahbets

   Change "Phoeneme Alpahbets" to "Phoneme Alphabets".

  3.3 Audio Element

   In the first sentence, need whitespace between '"audio"' and
   "element", and between '"mode"' and "attribute".

   Other sections have an anchor on the heading, but this section
   doesn't. It would be good to have an anchor like:

     <h3><a name="S3.3" id="S3.3">3.3 Audio Element</a></h3>

   Also, why does only this section use <strong>...</strong> within the
   heading? It's not critical, but it looks slightly strange.

  3.4 Mark Element

   In the first sentence, need whitespace between '"mark"' and "element".

   Same comment as "3.3 Audio Element" on anchor.

  3.5 Unspecified Requirements

   Same comment as "3.3 Audio Element" on anchor.

  3.6 Compliance

   An anchor like:

     <h3><a name="S3.3" id="S3.3">3.6 Compliance</a></h3>

   looks a bit strange, since the anchor name "S3.3" doesn't match the
   section number.

  3.7 "lowlevel" Elements: Fine-Grained Acoustic-Prosodic Control

   Similar comment to "3.6 Compliance" on the anchor.

    "ph" Element: Phoneme with Duration

   In the following example:

     <lowlevel alt="hello">
       <ph p="pau" d=".21"/><ph p="h" d=".0949"/><ph p="&" d=".0581"/>
       <ph p="l" d=".0693"/><ph p="ou" d=".2181"/>
     </lowlevel>
     <!-- This example uses WorldBet phonemes -->

   "&" in an attribute value (p="&") must be escaped as "&amp;" or
   "&#x26;" or "&#38;", otherwise this example is not well-formed.

    "f0" Element: Timed Pitch Targets

   In the fourth sentence of the first paragraph, "The value attribute"
   would be better written as 'The "v" attribute' or 'The "v" (value)
   attribute'.

   In the following example:

     <lowlevel alt="hello" pitch="absolute">
       <ph p="pau" d=".21"/><ph p="h" d=".0949"/><ph p="&" d=".0581"/>
       <ph p="l" d=".0693"/><ph p="ou" d=".2181"/>
       <!-- This example uses WorldBet phonemes -->
     
       <f0 v="103.5"/> <f0 v="112.5" t=".075"/>
       <f0 v="113.2" t=".175"/> <f0="128.1" t=".28"/>
     </lowlevel>

   Same comment as '"ph" Element: Phoneme with Duration' on "&".

   <f0="128.1" t=".28"/> should be <f0 v="128.1" t=".28"/> .

  3.8 Intonational Controls

   Similar comment to "3.6 Compliance" on the anchor.

   In the first sentence of the last paragraph, change "emphasis
   elementcan" to "emphasis element can".

  3.9 "value" Element

   Similar comment to "3.6 Compliance" on the anchor.

  4. Examples

   In the second sentence of the first paragraph, change "elementsare" to
   "elements are".

   In the second example, the following URI is used:

     <paragraph><voice gender="male">
     Here's a sample.  <audio src="http://www.w3c.org/music.wav">
     Would you like to buy it?</voice></paragraph>

   Even in an example, I'd suggest not using the domain name "w3c.org".
   The "canonical" domain name for W3C is "w3.org", and using "w3c.org"
   just confuses people. For examples, I'd suggest using the reserved
   example domain names (e.g. example.com, example.net, example.org), as
   specified by RFC 2606 [6].

      [6] http://www.ietf.org/rfc/rfc2606.txt

  5. DTD for the Speech Synthesis Markup Language

   As already pointed out, there are a number of problems in this DTD and
   it needs serious rework. There are some basic syntax errors, e.g.:

     <!ENTITY % integer "CDATA" >
        ...
     <!ATTLIST voice
          gender   (male|female|neutral)                  #IMPLIED
          age      (%integer;|child|teenager|adult|elder) #IMPLIED
          variant  (%integer;|different)                  #IMPLIED
          name     (%voice-name;|default)                 #IMPLIED >

   Probably the intention was to allow either integer values or the
   enumerated values on the age and variant attributes, but this
   definition only states that "CDATA" (as a literal string) is one of
   the enumerated values, so values like "20" are invalid. Unfortunately,
   DTDs don't have enough expressive power to express the intended
   constraint.
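
   To illustrate the effect, here is a small Python sketch that mimics
   what happens to the age attribute once the parameter entity is
   expanded. It is a deliberate simplification of DTD processing (plain
   textual substitution, then splitting on "|"), and the function name is
   my own.

```python
def allowed_values(enumeration, entities):
    """Expand %name; references textually and return the set of enumerated tokens."""
    for name, replacement in entities.items():
        enumeration = enumeration.replace(f"%{name};", replacement)
    return {token.strip() for token in enumeration.strip("()").split("|")}


if __name__ == "__main__":
    # Declarations taken from the DTD fragment quoted above.
    entities = {"integer": "CDATA"}
    age_values = allowed_values("(%integer;|child|teenager|adult|elder)", entities)
    print("20" in age_values)     # False: an integer is not an enumerated token
    print("CDATA" in age_values)  # True: the literal string is what got enumerated
```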

==========

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Sunday, 10 September 2000 23:51:37 UTC