- From: Yin Leng Husband <Yin-Leng.Husband@hp.com>
- Date: Fri, 31 May 2002 15:27:20 +1000
- To: www-i18n-comments@w3.org
- Cc: w3c-ws-arch@w3.org
- Message-ID: <E74B412A1B5FD211AD6C0000F87C38AD045EC7@ozyexc1.itg.qvar.cpqcorp.net>
Re: Last Call #2 for the Character Model for the World Wide Web

Comments are pertinent to WD 30 April 2002 at:
http://www.w3.org/TR/2002/WD-charmod-20020430/

We have found this specification a good reference source on the character model, so a high proportion of the review comments are editorial in nature, coming from the perspective of a reader learning about character encoding and normalization issues.

The comments are categorized as requested (Substantive, Editorial, Typo, Question, Other).

1. Type: E
   * 1.1 Goals and Scope, last paragraph
     o "Since other W3C specifications will be based on some of the provisions of this document, without repeating them, software developers implementing W3C specifications must conform to these provisions."
     o Unclear what "these provisions" (end of sentence) are since the first part of the sentence refers to only "some of the provisions". That is, should software developers implementing W3C specifications conform to some or all of these provisions?

2. Type: Q
   * 1.2 Background, 3rd paragraph, 2nd bullet
     o "covers the widest possible range,"
     o Unicode covers the widest possible range of what? Characters? Languages? Scripts? Writing notations?

3. Type: E
   * 1.2 Background, 3rd paragraph, 3rd bullet
     o "provides a way of referencing characters independent of the encoding of a resource,"
     o Unclear what the "resource" is. What is the relationship between the characters being referenced and the "resource"?
     o Is this the intent?
       - "provides a way to reference characters independent of the encoding of the characters,"

4. Type: E
   * 1.2 Background, 4th paragraph, last sentence
     o "Unicode now serves as a common reference for W3C specifications and applications."
     o Unclear what sort of "reference" is meant.
     o Is this the intent?
       - "Unicode now serves as a common reference character set for W3C specifications and applications"

5.
Type: E
   * 1.2 Background, 8th paragraph, last bullet
     o "Use of control codes for various purposes (e.g. bidirectionality control, symmetric swapping, etc.)."
     o It would be useful to have links to reference material that explains the issues.
     o E.g. "Use of control codes for various purposes (e.g. bidirectionality control [Unicode Standard 13.2], symmetric swapping [Unicode Standard 13.3], etc.)."

6. Type: E
   * 1.2 Background, 9th paragraph, 1st sentence
     o "It should be noted that such properties also exist in legacy encodings (where legacy encoding is taken to mean any character encoding not based on Unicode), and in many cases have been inherited by Unicode in one way or another from such legacy encodings."
     o Unclear what "such properties" are. The previous sentence talks about "aspects of Unicode" with no mention of "properties".
     o Is this the intent?
       - "It should be noted that such aspects also exist in legacy encodings (where legacy encoding is taken to mean any character encoding not based on Unicode), and in many cases have been inherited by Unicode in one way or another from such legacy encodings."

7. Type: E
   * 2 Conformance, 1st NOTE, 1st sentence
     o "RFC 2119 makes it clear that requirements that use SHOULD are not optional ..."
     o Inconsistent usage of the term "requirements". The first paragraph of this Conformance section makes a distinction between "requirements" and "recommendations". It says that "requirements are expressed using the key words "MUST", ... etc.". This NOTE talks of "requirements that use SHOULD ..."

8. Type: S
   * 2 Conformance, 3rd Paragraph, last sentence
     o "[S] [I] [C] In order to conform to this document, specifications MUST NOT violate any requirements preceded by [S], software MUST NOT violate any requirements preceded by [I], and content MUST NOT violate any requirements preceded by [C]."
     o How will conformance be enforced? Are the conformance requirements in this document testable for violations?

9.
Type: S
   * 2 Conformance, 5th Paragraph, 1st sentence
     o "[S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform"
     o This lowered (to SHOULD) conformance requirement seems to contradict that in the preceding paragraph which states that "[S] Every W3C specification MUST conform to the requirements applicable to specifications, ..."

10. Type: E
   * 2 Conformance, 5th Paragraph, 1st sentence
     o "[S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform"
     o Current wording says that in order to conform, the next version is to be modified, i.e. without stating nature of modification.
     o Is this the intent?
       - "[S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified so that it then becomes conformant."

11. Type: E
   * 2 Conformance, 6th Paragraph, last sentence
     o "[I] Where this specification contains a procedural description, it MUST be understood as a way to specify the desired external behavior. Implementations MAY use other ways of achieving the same results, as long as observable behavior is not affected."
     o "way" in the first sentence refers to "a way to specify" whereas in the second sentence, the "other ways" are "ways of achieving" what is specified. Also current wording "as long as observable behavior is not affected" is probably not the correct requirement.
     o Is this the intent?
       - "[I] Where this specification contains a procedural description, it MUST be understood as a way to specify the desired external behavior. Implementations MAY use different means of achieving the same results, as long as observable behavior is as described."

12. Type: E
   * 3.1.1 Introduction, 2nd EXAMPLE, 1st sentence
     o "Korean Hangul is a featural syllabary ..."
     o Would be helpful to define "featural syllabary" and explain distinction between a "syllabary" and "featural syllabary". The 1st and 2nd examples give the impression that the distinction is in arranging "into square syllabic blocks".

13. Type: E
   * 3.1.1 Introduction, 2nd EXAMPLE, 1st sentence
     o "... that combines symbols for individual sounds of the language ..."
     o Are these "individual sounds of the language" phonemes or syllables?
     o Is this the intent?
       - "... that combines symbols for individual phonemes [or syllables] of the language ..."

14. Type: E
   * 3.1.1 Introduction, 3rd EXAMPLE, 1st sentence
     o "Indic scripts are abugidas."
     o Would be helpful to indicate definition of "abugidas" explicitly. E.g. "Indic scripts are abugidas where each consonant letter carries an inherent vowel that is eliminated or replaced using semi-regular or irregular ways to combine consonants and vowels into clusters."

15. Type: E
   * 3.1.1 Introduction, 4th EXAMPLE, 1st sentence
     o "Arabic script is an example of an abjad."
     o Would be helpful to indicate definition of "abjad" explicitly. E.g. "Arabic script is an example of an abjad where short vowel sounds are typically not written at all."

16. Type: E
   * 3.1.1 Introduction, 2nd last paragraph, 1st sentence
     o "The developers of W3C specifications, and the developers of software based on those specifications, are likely to be more familiar with usages they have experienced and less familiar with the wide variety of usages in an international context."
     o In both instances of "usages", it is unclear "usages" of what are intended.

17. Type: S
   * 3.1.3 Units of visual rendering, 3rd paragraph, 1st sentence
     o "[S] [I] Specifications and software MUST NOT assume a one-to-one mapping between character codes and units of displayed text."
     o Inconsistency issue?
       This sentence speaks of mapping between "character codes" whereas the third sentence of the first paragraph of 3.1.3 (There is not a one-to-one correspondence between characters and glyphs) speaks of mapping between "characters", not "character codes". Also, in all the other 3.1.x sections, the [S][I] requirements are about non one-to-one correspondence between "characters", not "character codes".

18. Type: E
   * 3.1.3 Units of visual rendering, 5th paragraph, 3rd sentence
     o "The Unicode Standard [Unicode] requires that characters be stored and interchanged in logical order."
     o Would be helpful to define "logical order" or to provide link to reference material such as Unicode Standard, Section 2.2 where it is defined.

19. Type: E
   * 3.1.5 Units of collation, 5th EXAMPLE, 1st sentence
     o "In Thai the sequence U+0E44 U+0E01 must be sorted as if it was written U+0E01 U+0E44."
     o Would be helpful to show the actual glyphs for U+0E44 and U+0E01.

20. Type: E
   * 3.1.7 Summary, 1st paragraph, 2nd and 3rd sentences
     o "In the context of the digital representations of text, a character can be defined informally as a small logical unit of text. Text is then defined as sequences of characters."
     o "Character" and "text" are defined circularly.

21. Type: S
   * 3.6.2 Character encoding identification, 9th paragraph, 2nd sentence
     o "[S] Specifications MAY define either UTF-8 or UTF-16 as a default encoding form (or both if they define suitable means of distinguishing them), but they MUST NOT use any other character encoding as a default."
     o Since specifications "MUST NOT use any other character encoding as a default" other than "either UTF-8 or UTF-16" should the beginning of the sentence be "[S] Specifications MUST define either UTF-8 or UTF-16 as a default encoding form... " ?

22.
Type: S
   * 3.6.2 Character encoding identification, 9th paragraph, last sentence
     o "[S] Specifications MUST NOT propose the use of heuristics to determine the encoding of data."
     o It would be helpful to either give examples of the undesirable "heuristics" or the reasons for banning "use of heuristics". Would the absence of a BOM in UTF-8 encoding be considered use of heuristics for identifying encoding?

23. Type: E
   * 3.6.2 Character encoding identification, 12th paragraph, last sentence
     o "[I] On interfaces to other protocols, software SHOULD support conversion ..."
     o In the phrase "to other protocols", which is the base protocol that the "other protocols" are being distinguished from?
     o Is this the intent?
       - "[I] On interfaces to protocols, software SHOULD support conversion ..."

24. Type: S
   * 3.6.2 Character encoding identification, 12th paragraph, last sentence
     o "[I] On interfaces to other protocols, software SHOULD support conversion between Unicode encoding forms as well as any other necessary conversions."
     o Should it be "between Unicode encoding forms" or "to Unicode encoding forms" or "both between and to Unicode encoding forms"?

25. Type: Q
   * 3.7 Character Escaping, 1st paragraph, 3rd sentence
     o "There is also a need, often satisfied by the same or similar mechanisms, to express characters not directly representable in the character encoding of instances of the language."
     o Why "instances of the language" and not just "the language" ?

26. Type: Q
   * 3.7 Character Escaping, 1st paragraph, last sentence
     o " ...
       a language's syntax, which is itself expressed as characters represented at the character encoding level."
     o Why is a language's syntax expressed as characters "represented at the character encoding level" and not just as characters in the sense of abstract symbols?

27. Type: Q
   * 3.7 Character Escaping, 4th [S] requirement, 2nd and last sentences
     o "Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided. ... Forms like SPREAD's &UABCD; [SPREAD] or XML's &#xhhhh;, where the character escape is explicitly terminated by a semicolon, are much better."
     o The examples of good forms ("where the character escape is explicitly terminated by a semicolon") in the last sentence seem to exhibit the characteristics ("where the end is determined by a character outside the set of characters admissible in the character escape itself") of escape syntaxes that SHOULD be avoided.

28. Type: E
   * 3.7 Character Escaping, 6th [S] requirement, 1st sentence
     o "[S] Escaped characters SHOULD be acceptable wherever unescaped characters are; ..."
     o What are "unescaped characters"? Any character not expressed in the escaping mechanism? Seems to say that escaped characters SHOULD be acceptable wherever a character is acceptable (since a character normally is not expressed in the escaping mechanism).
     o Is this the intent?
       - "[S] Escaped characters SHOULD be acceptable wherever their unescaped forms are; ..."

29. Type: E
   * 3.7 Character Escaping, 6th [S] requirement, last sentence
     o "In particular, escaped characters SHOULD be acceptable in identifiers and comments..."
     o What if the identifier syntax is defined to be of a set that does not include the character which is escaped?

30. Type: E
   * 4.2.3 Fully-normalized text, 5th paragraph, last sentence
     o "Many languages will benefit from defining more boundaries..."
     o It would be helpful to give examples of the "more boundaries".

31. Type: E
   * 4.3.1 General Examples, 3rd paragraph, 1st sentence
     o "The string suc¸on (U+0073 U+0075 U+0063 U+0327 U+006F U+006E), where U+0327 is the COMBINING CEDILLA, encoded in a Unicode encoding form, is neither ..."
     o The string ... "is not ..." because there is no 'nor' alternative.

32. Type: S
   * 4.3.1 General Examples, 5th paragraph, 1st sentence
     o "...the string suc¸on (U+0073 U+0075 U+0063 U+0327 U+006F U+006E) which is not include-normalized ('c¸' is replaceable by 'ç')."
     o Should it be this?
       - "...the string suc¸on (U+0073 U+0075 U+0063 U+0327 U+006F U+006E) which is not Unicode-normalized ('c¸' is replaceable by 'ç')."

Regards,
Yin Leng Husband
on behalf of Web Services Architecture WG
Received on Friday, 31 May 2002 01:19:13 UTC