RE: Comments on the "Speech Synthesis Markup Language Specification for the Speech Interface Framework"

Masayasu -

Thank you very much for your exhaustive review of the speech synthesis
specification.  I will incorporate your notes on the document errors
into the change document as soon as possible.  I will also attempt to answer
your questions in a second mail thread.

MRW

> -----Original Message-----
> From: Masayasu Ishikawa [mailto:mimasa@w3.org]
> Sent: Sunday, September 10, 2000 8:52 PM
> To: www-voice@w3.org
> Subject: Comments on the "Speech Synthesis Markup Language Specification
> for the Speech Interface Framework"
> 
> 
> Masayasu Ishikawa <mimasa@w3.org> wrote:
> 
> > Other non-i18n related comments will be sent separately.
> 
> And here's a list of non-i18n related comments.  These are my personal
> comments, not representing the I18N WG or any other group.
> 
> ==========
> 
>   Abstract
> 
>    In the second paragraph, change "a XML markup language" to "an XML
>    markup language".
> 
>   Table of Contents
> 
>    A link to section 1.2 points to section 1.1, and a link to section 1.3
>    points to section 1.2.
> 
>   1. Introduction
> 
>    The spec says:
> 
>      The W3C Standard is known as the Speech Recognition Grammar
>      Specification and is based upon the JSML specification, which is
>      owned by Sun Microsystems, Inc., California, U.S.A.
> 
>    but the Speech Recognition Grammar Specification is a Working Draft
>    and it is inappropriate to cite it as "W3C Standard", as clearly
>    indicated in the "Status of this Document" section of the Speech
>    Recognition Grammar Specification.
> 
>   1.1 Terminology and Design Concepts
> 
>    In the list of key design criteria, item 2 "Interoperability", change
>    "Audio Cascading Style Sheets" to "Aural Cascading Style Sheets".
> 
>   1.3 Document Generation, Applications and Contexts
> 
>    In the list of important instances of architectures or designs, item 2
>    "Interoperability with", "Cascading Style Sheets, level 2 CSS2
>    Specification" would be better written as "Cascading Style Sheets,
>    level 2 (CSS2) Specification".
> 
>   2.2 "xml:lang" Attribute: Language
> 
>    In the first paragraph, need whitespace between '"xml:lang"' and
>    "attribute".
> 
>    The example uses the para element, but it's not defined in the DTD found
>    in section 5. It should be the paragraph element.
> 
>    In "Usage note 5", change "handledby" to "handled by".
> 
>   2.3 "paragraph" and "sentence": Text Structure Elements
> 
>    In the first paragraph, need whitespace between '"sentence"' and
>    "element".
> 
>    The spec says:
> 
>      Usage note 1: For brevity, the markup also supports <p> and <s> as
>      exact equivalents of <paragraph> and <sentence>. (Note: XML
>      requires that the opening and closing elements be identical so <p>
>      text </paragraph> is not legal.). Also note that <s> means
>      "strike-out" in HTML 4.0 and earlier, and in XHTML-1.0-Transitional
>      but not in XHTML-1.0-Strict.
> 
>    But neither the <p> nor the <s> element is defined in the DTD (even
>    though they appear in the "%structure;" parameter entity). Also, <s> means
>    "strike-through" in HTML 4.0/4.01 Transitional and Frameset, but no
>    "official" earlier version of HTML (3.2, 2.0, ...) defined the s
>    element. Both HTML+ [1] and HTML 3.0 [2] proposed the s element, but
>    they were never standardized.
> 
>       [1] http://www.w3.org/MarkUp/HTMLPlus/htmlplus_16.html
>       [2] http://www.w3.org/MarkUp/html3/emphasis.html
> 
>   2.4 "sayas" Element
> 
>    In the second paragraph, the spec says:
> 
>      The "type" attribute is a required attribute that indicates the
>      contained text construct. The format is a text type optionally
>      followed by a colon and a format. The base set of type values,
>      divided according to broad functionality, is as follows:
> 
>    but in an example where the sub attribute is used, the type attribute
>    is not used. Is it required even when the sub attribute is used?
> 
>    Also, the above attribute value format is not reflected in the DTD
>    found in section 5. The following enumerated definition in the DTD:
> 
>      <!ENTITY % sayas-types
>          "(acronym|number|ordinal|digits|telephone|date|time|
>            duration|currency|measure|name|net|address)">
> 
>    doesn't allow formats like "number:ordinal", while it allows formats
>    like "ordinal", which seems to be an error according to the prose
>    text. You would have to list all the possible combinations.
> 
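To illustrate the mismatch described above, here is a minimal Python
sketch. It only treats the quoted entity declaration as a set of literal
tokens; it is not a DTD validator, and the names enumerated_tokens and
dtd_accepts are hypothetical helpers introduced for this note.

```python
import re

# The entity declaration as quoted from section 5 of the draft.
SAYAS_TYPES_DECL = '''<!ENTITY % sayas-types
    "(acronym|number|ordinal|digits|telephone|date|time|
      duration|currency|measure|name|net|address)">'''


def enumerated_tokens(entity_decl):
    """Return the set of tokens inside the parenthesised enumeration."""
    body = re.search(r"\(([^)]*)\)", entity_decl).group(1)
    return {token.strip() for token in body.split("|")}


ALLOWED = enumerated_tokens(SAYAS_TYPES_DECL)


def dtd_accepts(value):
    """An enumerated attribute accepts only the listed tokens."""
    return value in ALLOWED


for candidate in ("ordinal", "number:ordinal", "date:my"):
    print(candidate, dtd_accepts(candidate))
```

Under this reading, any "type:format" value used in the prose would have
to appear in the enumeration to be valid.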
>     Pronunciation Types
> 
>    In the DTD, the "sub" attribute is not defined.
> 
>     Time, Date and Measure Types
> 
>    A lot of format values like "dmy" and "mdy" appear, but there's no
>    formal definition of each format value. People might guess what "dmy"
>    means, but as a specification, those definitions need to be clear and
>    precise. Relevant definitions in ISO 8601 [3] (Representation of dates
>    and times) may be helpful.
> 
>       [3] http://www.iso.ch/markete/8601.pdf
> 
>     Time, Date and Measure Types
> 
>    In the example, the following line:
> 
>      Proposals are due in <sayas type="date:my"> 5/2001 <sayas/>
> 
>    should be:
> 
>      Proposals are due in <sayas type="date:my"> 5/2001 </sayas>
> 
>     Address, Name, Net Types
> 
>    Is "net:url" specifically for URLs only? Or does it allow other URIs
>    (e.g. URNs)?
> 
>    In "Usage note 1",
> 
>      <sayas type="date:ymd"> 2000/1/20 <sayas>
> 
>    should be
> 
>      <sayas type="date:ymd"> 2000/1/20 </sayas>
> 
>    In the first sentence of "Usage note 3":
> 
>      Usage note 3: The "sayas" element can be only be used ...
> 
>    One of the two occurrences of "be" is unnecessary.
> 
>   2.5 "phoneme" Element
> 
>    In the second sentence of the first paragraph, need whitespace between
>    '"ph"' and "attribute".
> 
>   2.9 "prosody" Element
> 
>   Relative values
> 
>    The spec says:
> 
>      The relative changes for any of the attributes above can be "+10",
>      "-5.5", "+15%", "-8%". ...
> 
>    It's not clear whether those are the only permissible values, or
>    just examples. In an example in this section, a value "-10%" is used,
>    so maybe those are intended to be examples, but then the spec should
>    clearly say so.
> 
>   2.10 "audio" Element
> 
>    Has using XLink [4] rather than the "src" attribute been considered?
> 
>       [4] http://www.w3.org/TR/xlink
> 
>   2.12 Miscellaneous relevant XML features
> 
>    In "Usage note 1", the spec says:
> 
>      Usage note 1: When engines support non-standard elements and
>      attributes it is good practice for the name to identify the feature
>      as non-standard, for example, by using a "x" prefix or a company
>      name prefix.
> 
>    It looks more natural to me to use XML namespaces [5] for this kind of
>    extension. Has the use of namespaces been considered? And is the Speech
>    Synthesis Markup Language going to have its own namespace?
> 
>       [5] http://www.w3.org/TR/REC-xml-names
> 
>   3.2 Other Phoeneme Alpahbets
> 
>    Change "Phoeneme Alpahbets" to "Phoneme Alphabets".
> 
>   3.3 Audio Element
> 
>    In the first sentence, need whitespace between '"audio"' and
>    "element", and between '"mode"' and "attribute".
> 
>    Other sections have anchor on heading, but this section doesn't. It
>    would be good to have an anchor like:
> 
>      <h3><a name="S3.3" id="S3.3">3.3 Audio Element</a></h3>
> 
>    Also, why does only this section use <strong>...</strong> within the
>    heading? It's not critical, but looks slightly strange.
> 
>   3.4 Mark Element
> 
>    In the first sentence, need whitespace between '"mark"' and "element".
> 
>    Same comment as "3.3 Audio Element" on anchor.
> 
>   3.5 Unspecified Requirements
> 
>    Same comment as "3.3 Audio Element" on anchor.
> 
>   3.6 Compliance
> 
>    An anchor like:
> 
>      <h3><a name="S3.3" id="S3.3">3.6 Compliance</a></h3>
> 
>    looks a bit strange.
> 
>   3.7 "lowlevel" Elements: Fine-Grained Acoustic-Prosodic Control
> 
>    Similar comment as "3.6 Compliance" on anchor.
> 
>     "ph" Element: Phoneme with Duration
> 
>    In the following example:
> 
>      <lowlevel alt="hello">
>        <ph p="pau" d=".21"/><ph p="h" d=".0949"/><ph p="&" d=".0581"/>
>        <ph p="l" d=".0693"/><ph p="ou" d=".2181"/>
>      </lowlevel>
>      <!-- This example uses WorldBet phonemes -->
> 
>    "&" in an attribute value (p="&") must be escaped as "&amp;" or
>    "&#x26;" or "&#38;", otherwise this example is not well-formed.
> 
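The well-formedness point above can be checked with a short Python
sketch using the standard xml.etree.ElementTree parser. The sample
strings are shortened from the quoted example, and is_well_formed and
phoneme_value are hypothetical helper names, not part of any
specification.

```python
import xml.etree.ElementTree as ET

# Shortened from the quoted example: raw "&" versus the escaped forms.
RAW = '<lowlevel alt="hello"><ph p="&" d=".0581"/></lowlevel>'
ESCAPED = [
    '<lowlevel alt="hello"><ph p="&amp;" d=".0581"/></lowlevel>',
    '<lowlevel alt="hello"><ph p="&#x26;" d=".0581"/></lowlevel>',
    '<lowlevel alt="hello"><ph p="&#38;" d=".0581"/></lowlevel>',
]


def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def phoneme_value(text):
    """Return the parsed value of the "p" attribute on the first ph."""
    return ET.fromstring(text).find("ph").get("p")


print(is_well_formed(RAW))
for sample in ESCAPED:
    print(is_well_formed(sample), phoneme_value(sample))
```

All three escaped forms should yield the same attribute value after
parsing, so the choice among them is a matter of style.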
>     "f0" Element: Timed Pitch Targets
> 
>    In the fourth sentence of the first paragraph, "The value attribute"
>    would be better written as 'The "v" attribute' or 'The "v"
>    (value) attribute'.
> 
>    In the following example:
> 
>      <lowlevel alt="hello" pitch="absolute">
>        <ph p="pau" d=".21"/><ph p="h" d=".0949"/><ph p="&" d=".0581"/>
>        <ph p="l" d=".0693"/><ph p="ou" d=".2181"/>
>        <!-- This example uses WorldBet phonemes -->
>      
>        <f0 v="103.5"/> <f0 v="112.5" t=".075"/>
>        <f0 v="113.2" t=".175"/> <f0="128.1" t=".28"/>
>      </lowlevel>
> 
>    Same comment as '"ph" Element: Phoneme with Duration' on "&".
> 
>    <f0="128.1" t=".28"/> should be <f0 v="128.1" t=".28"/> .
> 
>   3.8 Intonational Controls
> 
>    Similar comment as "3.6 Compliance" on anchor.
> 
>    In the first sentence of the last paragraph, change "emphasis
>    elementcan" to "emphasis element can".
> 
>   3.9 "value" Element
> 
>    Similar comment as "3.6 Compliance" on anchor.
> 
> 4. Examples
> 
>    In the second sentence of the first paragraph, change "elementsare" to
>    "elements are".
> 
>    In the second example, the following URI is used:
> 
>      <paragraph><voice gender="male">
>      Here's a sample.  <audio src="http://www.w3c.org/music.wav">
>      Would you like to buy it?</voice></paragraph>
> 
>    Even in an example, I'd suggest not using the domain name "w3c.org". The
>    "canonical" domain name for W3C is "w3.org", and using "w3c.org" just
>    confuses people. For use in examples, I'd suggest the reserved
>    example domain names (e.g. example.com, example.net, example.org), as
>    specified by RFC 2606 [6].
> 
>       [6] http://www.ietf.org/rfc/rfc2606.txt
> 
> 5. DTD for the Speech Synthesis Markup Language
> 
>    As already pointed out, there are a number of problems in this DTD, and
>    it needs serious rework. There are some basic syntax errors, e.g.:
> 
>      <!ENTITY % integer "CDATA" >
>         ...
>      <!ATTLIST voice
>           gender   (male|female|neutral)                  #IMPLIED
>           age      (%integer;|child|teenager|adult|elder) #IMPLIED
>           variant  (%integer;|different)                  #IMPLIED
>           name     (%voice-name;|default)                 #IMPLIED >
> 
>    Probably the intention was to allow integer values or those enumerated
>    values on the age and the variant attributes, but this definition only
>    states that "CDATA" (as a literal string) is one of the enumerated
>    values - values like "20" are invalid. Unfortunately, DTDs don't have
>    enough expressive power to express the intended constraint.
> 
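As a rough illustration of the entity expansion described above, the
following Python sketch substitutes the parameter entity by plain text
replacement and lists the resulting tokens. It is a simplification and
not a real DTD processor; expand and enumeration are hypothetical
helpers, and only the "age" declaration from the quoted DTD is modelled.

```python
import re

# Parameter entity and attribute type as quoted from the draft DTD.
PARAMETER_ENTITIES = {"integer": "CDATA"}
AGE_DECL = "(%integer;|child|teenager|adult|elder)"


def expand(decl, entities):
    """Replace each %name; reference with its replacement text."""
    return re.sub(r"%([\w.-]+);", lambda m: entities[m.group(1)], decl)


def enumeration(decl, entities):
    """Return the literal tokens of the expanded enumeration."""
    expanded = expand(decl, entities)
    return {token.strip() for token in expanded.strip("()").split("|")}


AGE_VALUES = enumeration(AGE_DECL, PARAMETER_ENTITIES)

print(expand(AGE_DECL, PARAMETER_ENTITIES))
print("CDATA" in AGE_VALUES, "20" in AGE_VALUES)
```

After expansion, "CDATA" is just another enumerated token, which is why
a numeric age such as "20" would not validate.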
> ==========
> 
> Regards,
> -- 
> Masayasu Ishikawa / mimasa@w3.org
> W3C - World Wide Web Consortium
> 
> 

Received on Monday, 11 September 2000 11:44:11 UTC