- From: Walker, Mark R <mark.r.walker@intel.com>
- Date: Mon, 11 Sep 2000 08:43:58 -0700
- To: "'Masayasu Ishikawa'" <mimasa@w3.org>, www-voice@w3.org
Masayasu -

Thank you very much for your exhaustive review of the speech synthesis
specification. I will incorporate your notations on the document errors
into the change document as soon as possible. I will also attempt to
answer your questions in a second mail thread.

MRW

> -----Original Message-----
> From: Masayasu Ishikawa [mailto:mimasa@w3.org]
> Sent: Sunday, September 10, 2000 8:52 PM
> To: www-voice@w3.org
> Subject: Comments on the "Speech Synthesis Markup Language Specification
> for the Speech Interface Framework"
>
> Masayasu Ishikawa <mimasa@w3.org> wrote:
>
> > Other non-i18n related comments will be sent separately.
>
> And here's a list of non-i18n related comments. These are my personal
> comments, not representing the I18N WG or any other group.
>
> ==========
>
> Abstract
>
> In the second paragraph, change "a XML markup language" to "an XML
> markup language".
>
> Table of Contents
>
> A link to section 1.2 points to section 1.1, and a link to section 1.3
> points to section 1.2.
>
> 1. Introduction
>
> The spec says:
>
>     The W3C Standard is known as the Speech Recognition Grammar
>     Specification and is based upon the JSML specification, which is
>     owned by Sun Microsystems, Inc., California, U.S.A.
>
> but the Speech Recognition Grammar Specification is a Working Draft
> and it is inappropriate to cite it as "W3C Standard", as clearly
> indicated in the "Status of this Document" section of the Speech
> Recognition Grammar Specification.
>
> 1.1 Terminology and Design Concepts
>
> In the list of key design criteria, item 2 "Interoperability", change
> "Audio Cascading Style Sheets" to "Aural Cascading Style Sheets".
>
> 1.3 Document Generation, Applications and Contexts
>
> In the list of important instances of architectures or designs, item 2
> "Interoperability with", change "Cascading Style Sheets, level 2 CSS2
> Specification" would be better to be written as "Cascading Style
> Sheets, level 2 (CSS2) Specification".
>
> 2.2 "xml:lang" Attribute: Language
>
> In the first paragraph, need whitespace between '"xml:lang"' and
> "attribute".
>
> Example uses the para element, but it's not defined in the DTD found
> in section 5. It should be the paragraph element.
>
> In "Usage note 5", change "handledby" to "handled by".
>
> 2.3 "paragraph" and "sentence": Text Structure Elements
>
> In the first paragraph, need whitespace between '"sentence"' and
> "element".
>
> The spec says:
>
>     Usage note 1: For brevity, the markup also supports <p> and <s> as
>     exact equivalents of <paragraph> and <sentence>. (Note: XML
>     requires that the opening and closing elements be identical so <p>
>     text </paragraph> is not legal.). Also note that <s> means
>     "strike-out" in HTML 4.0 and earlier, and in XHTML-1.0-Transitional
>     but not in XHTML-1.0-Strict.
>
> But neither <p> nor <s> element is defined in the DTD (even though
> they appear in the "%structure;" parameter entity). Also, <s> means
> "strike-through" in HTML 4.0/4.01 Transitional and Frameset, but no
> "official" earlier version of HTML (3.2, 2.0, ...) defined the s
> element. Both HTML+ [1] and HTML 3.0 [2] proposed the s element, but
> they were never standardized.
>
> [1] http://www.w3.org/MarkUp/HTMLPlus/htmlplus_16.html
> [2] http://www.w3.org/MarkUp/html3/emphasis.html
>
> 2.4 "sayas" Element
>
> In the second paragraph, the spec says:
>
>     The "type" attribute is a required attribute that indicates the
>     contained text construct. The format is a text type optionally
>     followed by a colon and a format. The base set of type values,
>     divided according to broad functionality, is as follows:
>
> but in an example where the sub attribute is used, the type attribute
> is not used. Is it required even when the sub attribute is used?
>
> Also, the above attribute value format is not reflected in the DTD
> found in section 5.
> The following enumerated definition in the DTD:
>
>     <!ENTITY % sayas-types
>         "(acronym|number|ordinal|digits|telephone|date|time|
>         duration|currency|measure|name|net|address)">
>
> doesn't allow formats like "number:ordinal", while it allows formats
> like "ordinal", which seems to be an error according to the prose
> text. You would have to list all the possible combinations.
>
> Pronunciation Types
>
> In the DTD, the "sub" attribute is not defined.
>
> Time, Date and Measure Types
>
> A lot of format values like "dmy" and "mdy" appear, but there's no
> formal definition of each format value. People might guess what "dmy"
> means, but as a specification, those definitions need to be clear and
> precise. Relevant definitions in ISO 8601 [3] (Representation of dates
> and times) may be helpful.
>
> [3] http://www.iso.ch/markete/8601.pdf
>
> Time, Date and Measure Types
>
> In the example, the following line:
>
>     Proposals are due in <sayas type="date:my"> 5/2001 <sayas/>
>
> should be:
>
>     Proposals are due in <sayas type="date:my"> 5/2001 </sayas>
>
> Address, Name, Net Types
>
> Is "net:url" specifically for URL only? Or, does it allow other URIs
> (e.g. URN)?
>
> In "Usage note 1",
>
>     <sayas type="date:ymd"> 2000/1/20 <sayas>
>
> should be
>
>     <sayas type="date:ymd"> 2000/1/20 </sayas>
>
> In the first sentence of "Usage note 3":
>
>     Usage note 3: The "sayas" element can be only be used ...
>
> Either of "be" is unnecessary.
>
> 2.5 "phoneme" Element
>
> In the second sentence of the first paragraph, need whitespace between
> '"ph"' and "attribute".
>
> 2.9 "prosody" Element
>
> Relative values
>
> The spec says:
>
>     The relative changes for any of the attributes above can be "+10",
>     "-5.5", "+15%", "-8%". ...
>
> It's not clear whether those are only permissible values, or those are
> just examples.
> In an example in this section, a value "-10%" is used,
> so maybe those are intended to be examples, but then the spec should
> clearly say so.
>
> 2.10 "audio" Element
>
> Is it considered to use XLink [4] rather than the "src" attribute?
>
> [4] http://www.w3.org/TR/xlink
>
> 2.12 Miscellaneous relevant XML features
>
> In "Usage note 1", the spec says:
>
>     Usage note 1: When engines support non-standard elements and
>     attributes it is good practice for the name to identify the feature
>     as non-standard, for example, by using a "x" prefix or a company
>     name prefix.
>
> It looks more natural to me to use XML namespaces [5] for this kind of
> extensions. Is it considered to use namespaces? And is the Speech
> Synthesis Markup Language going to have its own namespace?
>
> [5] http://www.w3.org/TR/REC-xml-names
>
> 3.2 Other Phoeneme Alpahbets
>
> Change "Phoeneme Alpahbets" to "Phoneme Alphabets".
>
> 3.3 Audio Element
>
> In the first sentence, need whitespace between '"audio"' and
> "element", and between '"mode"' and "attribute".
>
> Other sections have anchor on heading, but this section doesn't. It
> would be good to have an anchor like:
>
>     <h3><a name="S3.3" id="S3.3">3.3 Audio Element</a></h3>
>
> Also, why only this section uses <strong>...</strong> within heading?
> It's not critical, but looks slightly strange.
>
> 3.4 Mark Element
>
> In the first sentence, need whitespace between '"mark"' and "element".
>
> Same comment as "3.3 Audio Element" on anchor.
>
> 3.5 Unspecified Requirements
>
> Same comment as "3.3 Audio Element" on anchor.
>
> 3.6 Compliance
>
> An anchor like:
>
>     <h3><a name="S3.3" id="S3.3">3.6 Compliance</a></h3>
>
> looks a bit strange.
>
> 3.7 "lowlevel" Elements: Fine-Grained Acoustic-Prosodic Control
>
> Similar comment as "3.6 Compliance" on anchor.
>
> "ph" Element: Phoneme with Duration
>
> In the following example:
>
>     <lowlevel alt="hello">
>       <ph p="pau" d=".21"/><ph p="h" d=".0949"/><ph p="&" d=".0581"/>
>       <ph p="l" d=".0693"/><ph p="ou" d=".2181"/>
>     </lowlevel>
>     <!-- This example uses WorldBet phonemes -->
>
> "&" in an attribute value (p="&") must be escaped as "&amp;" or
> "&#38;" or "&#x26;", otherwise this example is not well-formed.
>
> "f0" Element: Timed Pitch Targets
>
> In the fourth sentence of the first paragraph, "The value attribute"
> would be better to be written as 'The "v" attribute' or 'The "v"
> (value) attribute'.
>
> In the following example:
>
>     <lowlevel alt="hello" pitch="absolute">
>       <ph p="pau" d=".21"/><ph p="h" d=".0949"/><ph p="&" d=".0581"/>
>       <ph p="l" d=".0693"/><ph p="ou" d=".2181"/>
>       <!-- This example uses WorldBet phonemes -->
>
>       <f0 v="103.5"/> <f0 v="112.5" t=".075"/>
>       <f0 v="113.2" t=".175"/> <f0="128.1" t=".28"/>
>     </lowlevel>
>
> Same comment as '"ph" Element: Phoneme with Duration' on "&".
>
> <f0="128.1" t=".28"/> should be <f0 v="128.1" t=".28"/> .
>
> 3.8 Intonational Controls
>
> Similar comment as "3.6 Compliance" on anchor.
>
> In the first sentence of the last paragraph, change "emphasis
> elementcan" to "emphasis element can".
>
> 3.9 "value" Element
>
> Similar comment as "3.6 Compliance" on anchor.
>
> 4. Examples
>
> In the second sentence of the first paragraph, change "elementsare" to
> "elements are".
>
> In the second example, the following URI is used:
>
>     <paragraph><voice gender="male">
>     Here's a sample. <audio src="http://www.w3c.org/music.wav">
>     Would you like to buy it?</voice></paragraph>
>
> Even in example, I'd suggest not to use the domain name "w3c.org". The
> "canonical" domain name for W3C is "w3.org", and using "w3c.org" just
> confuses people. For use as examples, I'd suggest to use reserved
> example domain names (e.g. example.com, example.net, example.org), as
> specified by RFC 2606 [6].
>
> [6] http://www.ietf.org/rfc/rfc2606.txt
>
> 5. DTD for the Speech Synthesis Markup Language
>
> As already pointed out, there are number of problems in this DTD and
> need serious rework. There are some basic syntax errors, e.g.:
>
>     <!ENTITY % integer "CDATA" >
>     ...
>     <!ATTLIST voice
>         gender (male|female|neutral) #IMPLIED
>         age (%integer;|child|teenager|adult|elder) #IMPLIED
>         variant (%integer;|different) #IMPLIED
>         name (%voice-name;|default) #IMPLIED
>     >
>
> Probably the intention was to allow integer values or those enumerated
> values on the age and the variant attributes, but this definition only
> states that "CDATA" (as literal string) is one of enumerated values -
> values like "20" are invalid. Unfortunately DTD doesn't have enough
> expressive power to express intended constraint.
>
> ==========
>
> Regards,
> --
> Masayasu Ishikawa / mimasa@w3.org
> W3C - World Wide Web Consortium
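
The well-formedness comment on the "ph" example in the quoted review (a
bare "&" inside an attribute value) can be checked mechanically. The
following is only a minimal sketch, assuming Python's standard
xml.etree.ElementTree parser; the helper names is_well_formed and
phoneme_values are hypothetical, and the fragments are trimmed-down
versions of the quoted example rather than the full spec text.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the quoted example: the bare "&" is the problem.
RAW = '<lowlevel alt="hello"><ph p="&" d=".0581"/></lowlevel>'

# The same fragment with the ampersand written as the predefined entity
# or as a decimal / hexadecimal character reference.
ESCAPED = [
    '<lowlevel alt="hello"><ph p="&amp;" d=".0581"/></lowlevel>',
    '<lowlevel alt="hello"><ph p="&#38;" d=".0581"/></lowlevel>',
    '<lowlevel alt="hello"><ph p="&#x26;" d=".0581"/></lowlevel>',
]


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def phoneme_values(fragment: str) -> list:
    """Return the parsed "p" attribute of every <ph> element."""
    return [ph.get("p") for ph in ET.fromstring(fragment).iter("ph")]


if __name__ == "__main__":
    print("raw:", is_well_formed(RAW))
    for doc in ESCAPED:
        print("escaped:", is_well_formed(doc), phoneme_values(doc))
```

All three escaped spellings give the parser the same single "&"
character as the attribute value, so the choice between them is a
matter of style for the spec's examples.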
Received on Monday, 11 September 2000 11:44:11 UTC