RE: Consolidated comments on SSML from Richard Ishida on 2003-07-04 (www-voice@w3.org from July to September 2003)

From: Richard Ishida <ishida@w3.org>
Date: Fri, 4 Jul 2003 18:54:37 +0100
To: "'Daniel Burnett'" <burnett@nuance.com>, <w3c-i18n-ig@w3.org>
Cc: <www-voice@w3.org>
Message-ID: <000a01c34255$567557d0$6401a8c0@w3c40upc3ma3j2>
Dan, 

Martin asked me to point you to our responses to your responses
regarding SSML at
http://www.w3.org/International/2003/ssml10/ssml-feedback.html

Thank you for the time you have dedicated to our comments.  We hope these
additional responses prove helpful.

Best regards,
Richard.

============
Richard Ishida
W3C

tel: +44 1753 480 292
http://www.w3.org/International/
http://www.w3.org/People/Ishida/



> -----Original Message-----
> From: w3c-i18n-ig-request@w3.org 
> [mailto:w3c-i18n-ig-request@w3.org] On Behalf Of Daniel Burnett
> Sent: 09 June 2003 19:16
> To: w3c-i18n-ig@w3.org
> Cc: www-voice@w3.org
> Subject: RE: Consolidated comments on SSML
> 
> 
> 
> Dear Martin (and the Internationalization Working Group),
> 
> Thank you again for your very thorough review of the SSML 
> specification. This email contains the second big block of 
> responses.  Remaining points will be addressed in later emails.
> 
> If you believe we have not adequately addressed your issues 
> with our responses, please let us know as soon as possible.  
> If we do not hear from you within 14 days, we will take this 
> as tacit acceptance.  Given the volume of responses in this 
> email, we understand that a complete review by you may take 
> longer than this amount of time; if so, we would appreciate 
> an estimate as to when you might be able to complete your review.
> 
> Once again, thank you for your thorough and considered input 
> on the specification.
> 
> -- Dan Burnett
> 
> Synthesis Team Leader, VBWG
> 
> [VBWG responses follow]
> 
> [1] Rejected.  We reject the notion that on principle this is 
> more difficult for some languages.  For all languages 
> supported by synthesis vendors today this is not a problem.  
> As long as there is a way to write the text, the engine can 
> figure out how to speak it.  Given the lack of broad support 
> by vendors for Arabic and Hebrew, we prefer not to include 
> examples for those languages.
> 
> [2] Rejected.  Special tagging for bidirectional rendering 
> would only be needed if there were not already a means of 
> clearly indicating the language, language changes, and the 
> sequence of languages.  In SSML it is always clear when a 
> language shift occurs -- either when xml:lang is used or when 
> the <voice> element is used.  In any case, the encoding into 
> text handles this itself.  We believe that it is sufficient 
> to require a text/Unicode representation for any language 
> text. Visual or other non-audio rendering from that 
> representation is outside the scope of SSML.
> 
> [7] Accepted.  We will describe the relationship.
> 
> [13] Accepted.  We agree that this is confusing.  We will 
> make section 1.1 more text-only and cross-reference as 
> necessary. We will also remove "Vocabulary" from the title of 
> section 1.1.
> 
> [17] Accepted.  xml:lang will now be mandatory on the root 
> <speak> element.
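
An authoring tool could check the requirement accepted in [17] roughly as follows. This is a minimal Python sketch, not anything taken from the specification; the helper name root_language and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def root_language(ssml_text):
    """Return the xml:lang value of the root element, or raise ValueError.

    Hypothetical helper: it only checks that the attribute is present and
    non-empty on the root; it does not validate the language tag itself.
    """
    root = ET.fromstring(ssml_text)
    lang = root.get(XML_LANG)
    if not lang:
        raise ValueError("root element has no xml:lang attribute")
    return lang


if __name__ == "__main__":
    print(root_language('<speak version="1.0" xml:lang="en-US">Hello</speak>'))
```

A document whose root carries no xml:lang makes the helper raise ValueError, which is the case the new rule is meant to catch.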
> 
> [20] Rejected/Question.  For all but the <desc> element, this 
> can be accomplished using the <voice> element.  For the 
> <desc> element, it's unclear why the description would be in 
> a language different from that in which it is embedded; can 
> you provide a better use case? In the <voice> element 
> description we will point out that one of its common uses is 
> to change the language.  In 2.1.2, we will mention that 
> xml:lang is permitted as a convenience on <p> and <s> only 
> because it's common to change the language at those levels. 
> We recommend that other changes in the language be done with 
> the <voice> element.
> 
> [25] Yes, the "may" is a keyword as in rfc2119, and 
> conformant processors are permitted to vary in their 
> implementation of xml:lang in SSML.  Although processors are 
> required to implement the standard xml:lang behavior defined 
> by XML 1.0, in SSML the attribute also implies a change in 
> voice which may or may not be observed by the processor. We 
> will clarify this in the specification.
> 
> [26] Accepted.  We accept the editorial change.
> We will remove the <paragraph> and <sentence> elements.
> 
> [27] Accepted.  As you suggest, we will remove the examples 
> from this section in order to reduce confusion.
> 
> [29] Accepted.
>  
> [32] This wording was accidentally left over from an earlier 
> draft.  We will correct it.
> 
> [38] Accepted.  We will clarify in the text that this element 
> is designed for strictly phonemic and phonetic notations and 
> that the example uses Unicode to represent IPA.  We will also 
> clarify that the phonemic/phonetic string does not undergo 
> text normalization and is not treated as a token for lookup 
> in the lexicon, while values in <say-as> and <sub> may undergo both.
> 
> [40] Accepted.  IPA is an alphabet of phonetic symbols.  The 
> only representation in IPA is phonetic, although it is common 
> to select specific phones as representative examples of 
> phonemic classes. Also, IPA is only one possible alphabet 
> that can be used in this element.  The <phoneme> element will 
> accept both phonetic and phonemic alphabets, and both 
> phonetic and phonemic string values for the ph attribute.  We 
> will clarify this and add or reference a description of the 
> difference between phonemic and phonetic.
> 
> [47] Rejected.  There is no intention that pronunciations can 
> be given by other means within an SSML document.  Any use of 
> SSML in this way is outside the scope of the language.  Note 
> that pronunciations can of course be given in an external 
> lexicon; it is conceivable that other annotation formats 
> could be used in such a document.
> 
> [60] Rejected.  This is a tokenization issue.  Tokens in SSML 
> are delimited both by white space and by SSML elements.  You 
> can write a word as two separate words and it will have a 
> break, you can insert an SSML element, or you can use stress 
> marks externally. For Asian languages with characters without 
> spaces to delimit words, if you insert SSML elements it 
> automatically creates a boundary between words.  You can use 
> a similar approach for German, e.g. with 
> "Fussballweltmeisterschaft".  If you insert a <break> in the 
> middle it actually splits the word, but that's probably what 
> you wanted:  Fussball<break/>weltmeisterschaft. If you wish to 
> insert prosodic controls, that would be handled better via an 
> external lexicon which can provide stress markers, etc.
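
The tokenization rule described in [60] (tokens are delimited by white space and by SSML elements) can be sketched in Python as below. This is only an illustration under stated assumptions: the function name tokens is hypothetical, the break is written as an empty element so the sample is well-formed XML, and a real processor's tokenizer is language-dependent and far more involved.

```python
import xml.etree.ElementTree as ET


def tokens(ssml_text):
    """Split character data into tokens at white space and element boundaries."""
    result = []

    def walk(elem):
        # Text before the first child element.
        if elem.text:
            result.extend(elem.text.split())
        for child in elem:
            walk(child)
            # Text following the child; the child itself acts as a boundary.
            if child.tail:
                result.extend(child.tail.split())

    walk(ET.fromstring(ssml_text))
    return result


if __name__ == "__main__":
    print(tokens("<speak>Fussball<break/>weltmeisterschaft heute</speak>"))
    print(tokens("<speak>Fussballweltmeisterschaft heute</speak>"))
```

With the element inserted, the compound comes out as two tokens; without it, as one.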
> 
> [70] Accepted.  Although the units are already marked as 
> case-sensitive in the Schema, we will clarify in the text 
> that such units are case-sensitive.
> 
> [78] Accepted.  We will add this.
> 
> [84] Accepted.  We will revise the text appropriately.
> 
> 
> > -----Original Message-----
> > From: Martin Duerst [mailto:duerst@w3.org]
> > Sent: Friday, January 31, 2003 7:50 PM
> > To: www-voice@w3.org
> > Cc: w3c-i18n-ig@w3.org
> > Subject: Consolidated comments on SSML
> > 
> > 
> > 
> > Dear Voice Browser WG,
> > 
> > These are the Last Call comments on Speech Synthesis
> > Markup Language (http://www.w3.org/TR/speech-synthesis/)
> > from the Core Task Force of the Internationalization (I18N) WG.
> > Please make sure that you send all emails regarding these comments
> > to w3c-i18n-ig@w3.org, rather than to me personally or just to
> > www-voice@w3.org (to which we are not subscribed).
> > 
> > These comments are based on review by Richard Ishida and myself and
> > have been discussed and approved at the last I18N Core TF
> > teleconference. They are ordered by section and numbered for easy
> > reference. We have not classified these issues into editorial and
> > substantial, but we think that it should be clear from their
> > description.
> > 
> > General:
> > [01]  For some languages, text-to-speech conversion is more difficult
> >        than for others. In particular, Arabic and Hebrew are usually
> >        written with no vowels, or only a few, indicated. Japanese
> >        often needs separate indications for pronunciation.
> >        It was not clear to us whether such cases were considered,
> >        and if they had been considered, what the appropriate
> >        solution was.
> >        SSML should be clear about how it is expected to handle these
> >        cases, and give examples. Potential solutions we came up with:
> >        a) require/recommend that text in SSML is written in an
> >        easily 'speakable' form (i.e. vowelized for Arabic/Hebrew,
> >        or with Kana (phonetic alphabet(s)) for Japanese). (Problem:
> >        displaying the text visually would not be satisfactory in
> >        this case); b) using <sub>; c) using <phoneme> (Problem: only
> >        having IPA available would be too tedious for authors);
> >        d) reusing some otherwise defined markup for this purpose
> >        (e.g. <ruby> from http://www.w3.org/TR/ruby/ for Japanese);
> >        e) creating some additional markup in SSML.
> > 
> > General: Tagging for bidirectional rendering is not needed
> > [02]  for text-to-speech conversion. But there is some provision
> >        for SSML content to be displayed visually (to cover WAI
> >        needs). This will not work without adequate support of bidi
> >        needs, with appropriate markup and/or hooks for styling.
> > 
> > General: Is there a tag that allows changing the language in
> > [03]  the middle of a sentence (such as <html:span>)? If not,
> >        why not? This functionality needs to be provided.
> > 
> > 
> > Abstract: 'is part of this set of new markup specifications':
> > Which set?
> > [04]
> > 
> > Intro: 'The W3C Standard' -> 'This W3C Specification'
> > [05]
> > 
> > Intro: Please briefly describe the intended uses of SSML here,
> > [06]   rather than having the reader wait for Section 4.
> > 
> > 
> > Section 1, para 2: Please briefly describe how SSML and Sable are
> > [07]  related or different.
> > 
> > 
> > 1.1, table: 'formatted text' -> 'marked-up text'
> > [08]
> > 
> > 1.1, last bullet: add a comma before 'and' to make
> > [09]  the sentence more readable
> > 
> > 
> > 1.2, bullet 4, para 1: It might be nice to contrast the 45 phonemes
> > [10] in English with some other language. This is just one case that
> >       shows that there are many opportunities for more
> >       internationally varied examples. Please take any such
> >       opportunities.
> > 
> > 1.2, bullet 4, para 3: "pronunciation dictionary" ->
> > [11] "language-specific pronunciation dictionary"
> > 
> > 1.2:  How is "Tlalpachicatl" pronounced? Other examples may be
> > [12]  St.John-Smyth (sinjen-smaithe) or Caius College
> >        (keys college), or President Tito (sutto) [president of the
> >        republic of Kiribati (kiribass)]
> > 
> > 
> > 1.1 and 1.5: Having a 'vocabulary' table in 1.1 and then a
> > [13] terminology section is somewhat confusing.
> >       Make 1.1 e.g. more text-only, with a reference to 1.5,
> >       and have all terms listed in 1.5.
> > 
> > 1.5: The definition of anyURI in XML Schema is considerably wider
> > [14] than RFC 2396/2732, in that anyURI allows non-ASCII characters.
> >       For internationalization, this is very important. The text
> >       must be changed to not give the wrong impression.
> > 
> > 1.5 (and 2.1.2): This (in particular 'following the
> > [15]  XML specification') gives the wrong impression of where/how
> >       xml:lang is defined. xml:lang is *defined* in the XML spec,
> >       and *used* in SSML. Descriptions such as 'a language code is
> >       required by RFC 3066' are confusing. What kind of language
> >       code? Also, XML may be updated in the future to a new version
> >       of RFC 3066; SSML should not restrict itself to RFC 3066
> >       (similar to the recent update from RFC 1766 to RFC 3066).
> >       Please check the latest text in the XML errata for this.
> > 
> > 
> > 2., intro: xml:lang is an attribute, not an element.
> > [16]
> > 
> > 2.1.1, para 1: Given the importance of knowing the language for
> > [17] speech synthesis, the xml:lang should be mandatory on the root
> >       speak element. If not, there should be a strong
> >       injunction to use it.
> > 
> > 2.1.1: 'The version number for this specification is 1.0.': please
> > [18] say that this is what has to go into the value of the 'version'
> >       attribute.
> > 
> > 
> > 2.1.2., for the first paragraph, reword: 'To indicate the natural
> > [19] language of an element and its attributes and subelements,
> >       SSML uses xml:lang as defined in XML 1.0.'
> > 
> > The following elements also should allow xml:lang:
> > [20] - <prosody> (language change may coincide with prosody change)
> >       - <audio> (audio may be used for foreign-language pieces)
> >       - <desc> (textual description may be different from audio,
> >            e.g. <desc xml:lang='en'>Song in Japanese</desc>)
> >       - <say-as> (specific construct may be in different language)
> >       - <sub>
> >       - <phoneme>
> > 
> > 2.1.2: 'text normalization' (also in 2.1.6): What does this mean?
> > [21] It needs to be clearly specified/explained, otherwise there may
> >       be confusion with things such as NFC (see Character Model).
> > 
> > 2.1.2, example 1: Overall, it may be better to use utf-8 rather than
> > [22] iso-8859-1 for the specification and the examples.
> > 
> > 2.1.2, example 1: To make the example more realistic, in the
> > [23] paragraph that uses lang="ja" you should have Japanese text - not
> >       an English transcription, which may not be usable as such on a
> >       Japanese text-to-speech processor. In order to make sure the
> >       example can be viewed even in situations where there are no
> >       Japanese fonts available, and can be understood by everybody,
> >       some explanatory text can provide the romanized form. (We can
> >       help with Japanese if necessary.)
> > 
> > 2.1.2, 1st para after 1st example: Editorial.  We prefer "In the
> > [24] case that a document requires speech output in a language not
> >       supported by the processor, the speech processor largely
> >       determines the behavior."
> > 
> > 2.1.2, 2nd para after 1st example: "There may be variation..."
> > [25] Is the 'may' a keyword as in rfc2119? I.e., are you allowing
> >       conformant processors to vary in the implementation of
> >       xml:lang? If yes, what variations exactly would be allowed?
> > 
> > 
> > 2.1.3: 'A paragraph element represents the paragraph structure'
> > [26] -> 'A paragraph element represents a paragraph'. (same for
> >       sentence) Please decide to either use <p> or <paragraph>, but
> >       not both (and same for sentence).
> > 
> > 
> > 2.1.4: <say-as>: For interoperability, defining attributes and
> > [27] giving (convincingly useful) values for these attributes
> >       but saying that these will be specified in a separate document
> >       is very dangerous. Either remove all the details (and then
> >       maybe also the <say-as> element itself), or say that the
> >       values given here are defined here, but that future versions
> >       of this spec or separate specs may extend the list of values.
> >       [Please note that this is only about the attribute values,
> >        not the actual behavior, which is highly language-dependent
> >        and probably does not need to be specified in every detail.]
> > 
> > 2.1.4, interpret-as and format, 6th paragraph: requirement that
> > [28] text processor has to render text in addition to the indicated
> >       content type is a recipe for bugwards compatibility (which
> >       should be avoided).
> > 
> > 2.1.4, 'locale': change to 'language'.
> > [29]
> > 
> > 2.1.4: How is format='telephone' spoken?
> > [30]
> > 2.1.4: Why are there 'ordinal' and 'cardinal' values for both
> > [31]   interpret-as and format?
> > 
> > 2.1.4 'The detail attribute can be used for all say-as content
> > [32]   types.' What's a content type in this context?
> > 
> > 2.1.4 detail 'strict': 'speak letters with all detail': As opposed 
> > [33]  to what (e.g. in that specific example)?
> > 
> > 2.1.4, last table: There seem to be some fixed-width aspects in the
> > [34]   styling of this table. This should be corrected to allow
> >         complete viewing and printing at various overall widths.
> > 
> > 2.1.4, 4th para (and several similar in other sections):
> > [35]  "The say-as element can only contain text." would be easier
> >        to understand; we had to look around to find out whether the
> >        current phrasing described an EMPTY element or not.
> > 
> > 2.1.4. For many languages, there is a need for additional
> > [36]   information. For example, in German, ordinal numbers are
> >        denoted with a number followed by a period (e.g. '5.'). They
> >        are read depending on case and gender of the relevant noun
> >        (as well as depending on the use of definite or indefinite
> >        article).
> > 
> > 2.1.4, 4th row of 2nd table: I've seen some weird phone formats, but
> > [37]  nothing quite like this! Maybe a more normal example would NOT
> >        pronounce the separators. (Except in the Japanese case, where
> >        the spaces are (sometimes) pronounced (as 'no').)
> > 
> > 
> > 2.1.5, <phoneme>:
> > [38]  It is unclear to what extent this element is designed for
> >        strictly phonemic and phonetic notations, or also
> >        (potentially) for notations that are more phonetic-oriented
> >        than usual writing (e.g. Japanese kana-only, Arabic/Hebrew
> >        with full vowels,...) and where the boundaries are to other
> >        elements such as <say-as> and <sub>. This needs to be
> >        clarified.
> > 
> > 2.1.5 There may be different flavors and variants of IPA (see e.g. 
> > [39]  references in ISO 10646). Please make sure it is clear which
> >        one is used.
> > 
> > 2.1.5 IPA is used both for phonetic and phonemic notations. Please 
> > [40]  clarify which one is to be used.
> > 
> > 2.1.5 This may need a note that not all characters used in IPA are 
> > [41]  in the IPA block.
> > 
> > 2.1.5 This seems to say that the only (currently) allowed value for 
> > [42]  alphabet is 'ipa'. If this is the case, this needs to be said
> >        very clearly (and it may as well be defined as default, and
> >        in that case the alphabet attribute to be optional). If there
> >        are other values currently allowed, what are they? How are
> >        they defined?
> > 
> > 2.1.5 'alphabet' may not be the best name. Alphabets are sets of
> > [43]  characters, usually with an ordering. The same set of characters
> >        could be used in totally different notations.
> > 
> > 2.1.5 What are the interactions of <phoneme> for foreign language
> > [44]  segments? Do processors have to handle all of IPA, or only the
> >        phonemes that are used in a particular language?
> >        Please clarify.
> > 
> > 2.1.5, 1st example:  Please try to avoid character entities, as it
> > [45] suggests strongly that this is the normal way to input this
> >       stuff. (see also issue about utf-8 vs. iso-8859-1)
> > 
> > 
> > 2.1.5 and 2.1.6: The 'alias' and 'ph' attributes in some
> > [46]  cases will need additional markup (e.g. for fine-grained
> >        prosody, but also for additional emphasis, bidirectionality).
> >        This would also help tools for translation,...
> >        But markup is not possible for attributes. These attributes
> >        should be changed to subelements, e.g. similar to the <desc>
> >        element inside <audio>.
> > 
> > 2.1.5 and 2.1.6: Can you specify a null string for the ph and alias
> > [47] attributes? This may be useful in mixed formats where the
> >       pronunciation is given by another means, e.g. with ruby
> >       annotation.
> > 
> > 
> > 2.1.6 The <sub> element may easily clash or be confused with <sub> 
> > [48]  in HTML (in particular because the specification seems to be
> >        designed to allow combinations with other markup vocabularies
> >        without using different namespaces). <sub> should be renamed,
> >        e.g. to <subst>.
> > 
> > 2.1.6 For abbreviations,... there are various cases. Please check
> > [49]  that all the cases in
> >        http://lists.w3.org/Archives/Member/w3c-i18n-ig/2002Mar/0064.html
> >        are covered, and that the users of the spec know how to handle
> >        them.
> > 
> > 2.1.6, 1st para: "the specified text" ->
> > [50]   "text in the alias attribute value".
> > 
> > 
> > 2.2.1, between the tables: "If there is no voice available for the
> > [51]  requested language ... select a voice ... same language but
> >        different region..."  I'm not sure this makes sense.  I could
> >        understand that if there is no en-UK voice you'd maybe go for
> >        an en-US voice - this is a different DIALECT of English.  If
> >        there are no Japanese voices available for Japanese text, I'm
> >        not sure it makes sense to use an English voice. What happens
> >        in this situation?
> > 
> > 2.2.1 It should be mentioned that in some cases, it may make sense
> > [52]  to have a short piece of e.g. 'fr' text in an 'en' text be
> >        spoken by an 'en' text-to-speech converter (the way it's often
> >        done by human readers) rather than to throw an error. This is
> >        quite different for longer texts, where it's useless to bother
> >        a user.
> > 
> > 2.2.1: We wonder if there's a need for multiple voices (e.g. a
> > [53]  group of kids)
> > 
> > 2.2.1, 2nd example: You should include some text here.
> > [54]
> > 
> > 2.2.1 The 'age' attribute should explicitly state that the integer 
> > [55]  is years, not something else.
> > 
> > 2.2.1 The variant attribute should say what its index origin is
> > [56]  (e.g. either starting at 0 or at 1)
> > 
> > 2.2.1 attribute name: (in the long term,) it may be desirable to use
> > [57]  a URI for voices, and to have some well-defined format(s)
> >        for the necessary data.
> > 
> > 2.2.1, first example (and many other places): The line break between
> > [58]  the <voice> start tag and the text "It's fleece was white as
> >        snow." will have negative effects on visual rendering.
> >        (also, "It's" -> "Its")
> > 
> > 2.2.1, description of priorities of xml:lang, name, variant,...:
> > [59]  It would be better to describe this clearly as priorities,
> >        i.e. to say that for voice selection, xml:lang has highest
> >        priority,...
> > 
> > 
> > 2.2.3 What about <break> inside a word (e.g. for long words such as
> > [60]  in German)? What about <break> in cases where words cannot
> >        clearly be identified (no spaces, such as in Chinese,
> >        Japanese, Thai)? <break> should be allowed in these cases.
> > 
> > 2.2.3 and 2.2.4: "x-high" and "x-low": the 'x-' prefix is part of
> > [61]  colloquial English in many parts of the world, but may be
> >        difficult to understand for non-native English speakers.
> >        Please add an explanation.
> > 
> > 
> > 2.2.4: Please add a note that customary pitch levels and
> > [62]  pitch ranges may differ quite a bit with natural language, and
> >        that "high",... may refer to different absolute pitch levels
> >        for different languages. Example: Japanese has generally a
> >        much lower pitch range than Chinese.
> > 
> > 2.2.4, 'baseline pitch', 'pitch range': Please provide definition/
> > [63]   short explanation.
> > 
> > 2.2.4 'as a percent' -> 'as a percentage'
> > [64]
> > 
> > 2.2.4 What is a 'semitone'? Please provide a short explanation.
> > [65]
> > 
> > 2.2.4 In pitch contour, are white spaces allowed? At what places
> > [66]  exactly? In "(0%,+20)(10%,+30%)(40%,+10)", I would propose
> >        to allow whitespace between ')' and '(', but not elsewhere.
> >        This has the benefit of minimizing syntactic differences
> >        while allowing long contours to be formatted with line breaks.
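
The whitespace rule proposed in [66] could be expressed as a validation pattern, sketched here in Python. The grammar for each pair (a percentage position, then an optionally signed number with an optional "%" or "Hz" suffix) is an assumption made for this illustration and is not quoted from the specification; the name is_valid_contour is likewise hypothetical.

```python
import re

# One "(position%,target)" pair. The target grammar (optional sign, number,
# optional "%" or "Hz") is an assumption made for this sketch.
_PAIR = r"\(\d+(?:\.\d+)?%,[+-]?\d+(?:\.\d+)?(?:%|Hz)?\)"

# White space is accepted only between ")" and "(", as the comment proposes.
_CONTOUR = re.compile(rf"{_PAIR}(?:\s*{_PAIR})*")


def is_valid_contour(value):
    """Return True if the whole string is a contour under the proposed rule."""
    return _CONTOUR.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_valid_contour("(0%,+20)(10%,+30%)(40%,+10)"))
    print(is_valid_contour("(0%,+20)\n  (10%,+30%)"))
    print(is_valid_contour("(0%, +20)"))
```

Under this rule a long contour can be broken across lines between pairs, while a space inside a pair is rejected.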
> > 
> > 2.2.4, bullets: Editorial nit.  It may help the first time reader to
> > [67]   mention that 'relative change' is defined a little further
> >         down.
> > 
> > 2.2.4, 4th bullet: the speaking rate is set in words per minute.
> > [68]  In many languages what constitutes a word is often difficult to
> >        determine, and varies considerably in average length.
> >        So there have to be more details to make this work
> >        interoperably in different languages. Also, it seems that
> >        'words per minute' is a nominal rate, rather than exactly
> >        counting words, which should be stated clearly. A much
> >        preferable alternative is to use another metric, such as
> >        syllables per minute, which has less unclarity (not
> > 
> > 2.2.4, 5th bullet: If the default is 100.0, how do you make it
> > [69]  louder given that the scale ranges from 0.0 to 100.0?
> >        (or, in other words, is the default to always shout?)
> > 
> > 2.2.4, Please state whether units such as 'Hz' are case-sensitive
> > [70] or case-insensitive. They should be case-sensitive, because
> >       units in general are (e.g. mHz (milliHz) vs. MHz (MegaHz)).
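
The point of [70] can be illustrated with a small case-sensitive unit parser in Python. This is a sketch under assumptions: the prefix table and the function name to_hertz are hypothetical, and the prefixed forms (mHz, kHz, MHz) come from the comment's general argument about units rather than from the SSML unit list.

```python
import re

# SI prefixes that differ only by case must stay distinct.
_PREFIX = {"": 1.0, "m": 1e-3, "k": 1e3, "M": 1e6}

# No re.IGNORECASE: "Hz" and the prefixes are matched exactly as written.
_VALUE = re.compile(r"(\d+(?:\.\d+)?)(m|k|M)?Hz")


def to_hertz(value):
    """Convert a string such as '5MHz' to a float in hertz, case-sensitively."""
    match = _VALUE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognised frequency: {value!r}")
    number, prefix = match.group(1), match.group(2) or ""
    return float(number) * _PREFIX[prefix]


if __name__ == "__main__":
    print(to_hertz("5mHz"), to_hertz("5MHz"))
```

A case-insensitive match would make "5mHz" and "5MHz" indistinguishable, which is the ambiguity the comment warns about; here "5hz" is simply rejected.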
> > 
> > 
> > 2.3.3 Please provide some example of <desc>
> > [71]
> > 
> > 3.1  Requiring an XML declaration for SSML when XML itself
> > [72] doesn't require an XML declaration leads to unnecessary
> >       discrepancies. It may be very difficult to check this
> >       with an off-the-shelf XML parser, and it is not reasonable
> >       to require SSML implementations to write their own XML
> >       parsers or modify an XML parser. So this requirement
> >       should be removed (e.g. by saying that SSML requires an XML
> >       declaration when XML requires it).
> > 
> > 
> > 3.3, last paragraph before 'The lexicon element' subtitle:
> > [73] Please also say that the determination of
> >       what is a word may be language-specific.
> > 
> > 3.3 'type' attribute on lexicon element: What's this attribute used 
> > [74] for? The media type will be determined from the document that
> >       is found at the 'uri' URI, or not?
> > 
> > 
> > 4.1 'synthesis document fragment' -> 'speech synthesis document
> > [75] fragment'
> > 
> > 4.1  Conversion to stand-alone document: xml:lang should not
> > [76] be removed. It should also be clear whether content of
> >       non-synthesis elements should be removed, or only the
> >       markup.
> > 
> > 
> > 4.4 'requirement for handling of languages': Maybe better to
> > [77] say 'natural languages', to avoid confusion with markup
> >       languages. Clarification is also needed in the following
> >       bullet points.
> > 
> > 
> > 4.5  This should say that a user agent has to support at least
> > [78] one natural language.
> > 
> > 
> > App A: 'http://www.w3c.org/music.wav': W3C's Web site is www.w3.org.
> > [79]   But this example should use www.example.org or
> >         www.example.com.
> > 
> > App B: 'synthesis DTD' -> 'speech synthesis DTD'
> > [80]
> > 
> > App D: Why does this mention 'recording'? Please remove or explain.
> > [81]
> > 
> > App E: Please give a reference for the application to the
> > [82]   IETF/IESG/IANA for the content type 'application/ssml+xml'.
> > 
> > App F: 'Support for other phoneme alphabets.': What's a 'phoneme
> > [83]   alphabet'?
> > 
> > App F, last paragraph: 'Unfortunately, ... no standard for
> > [84]   designating regions...': This should be worded differently.
> >         RFC 3066 provides for the registration of arbitrary
> >         extensions, so that e.g. en-gb-accent-scottish and
> >         en-gb-accent-welsh could be registered.
> > 
> > App F, bullet 3: I guess you already know that intonation
> > [85]   requirements can vary considerably across languages, so
> >         you'll need to cast your net fairly wide here.
> > 
> > App G: What is meant by 'input' and 'output' languages? This is the
> > [86]   first time this terminology is used. Please remove or clarify.
> > 
> > App G: 'overriding the SSML Processor default language': There
> > [87]   should be no such default language. An SSML Processor may only
> >         support a single language, but that's different from
> >         assuming a default language.
> > 
> > 
> > 
> > Regards,   Martin.
> > 
> > 
>
Received on Friday, 4 July 2003 13:54:57 UTC