- From: Shigemichi Yazawa <yazawa@globalsight.com>
- Date: Tue, 12 Mar 2002 14:27:14 -0700
- To: www-i18n-comments@w3.org
I reviewed WD-charmod-20020220 and would like to post some comments.

3.5 Reference Processing Model

- "... Unicode code points from U+0 to U+0FFFF inclusive; ..."
  U+0FFFF is a typo for U+10FFFF.

- The first Note in this section says "All specifications that derive from the XML 1.0 specification [XML 1.0] automatically inherit this Reference Processing Model." But XML 1.0 is not a very good example, because it doesn't allow the use of the full range of Unicode code points and it doesn't justify the exceptions.

3.6.1 Mandating a unique character encoding

- "There is also no ambiguity if data is transferred non-electronically and later has to be converted back to a digital representation."
  If "transferred non-electronically" means that characters are written on paper, there is a lot of ambiguity in determining characters from glyphs, for example whether a given space is SPACE U+0020 or NO-BREAK SPACE U+00A0.

3.6.2 Character Encoding Identification

- In the fourth Note, there is a typo: "identifers".

- "[S] Specifications MUST NOT use heuristics to determine the encoding of data."
  In what situation would a specification "determine" the encoding of data?

3.7 Character Escaping

- In the first paragraph, two terms, "character data" and "text data", appear, and they seem to mean the same thing. It would be better to use one of the terms consistently.

- "[S] Explicit end delimiters MUST be provided. Escapes such as \uABCD where the end delimiter is a space or any character other than [01-9A-F] SHOULD be avoided."
  MUST and SHOULD are mixed here. If the first requirement is a MUST, the second should also be a MUST.

4.3 Responsibility for Normalization

- "[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first successfully validated the text for normalization, and MUST NOT normalize the suspect text."
  I understand that some applications, such as XML processors, MUST NOT normalize suspect text, because normalization can turn a well-formed document into an ill-formed one. On the other hand, some applications, such as search engines, SHOULD normalize text so that they can find canonically equivalent text.

-------------------
Shigemichi Yazawa
yazawa@globalsight.com
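
The search-engine case in the 4.3 comment can be illustrated with a minimal Python sketch. It assumes NFC as the comparison form (the comment does not name one), and `search_key` is a hypothetical helper, not something defined in the draft.

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical helper: map text to NFC so that canonically
    equivalent strings produce the same search key."""
    return unicodedata.normalize("NFC", text)


# Two canonically equivalent spellings of "e with acute accent".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# A plain code point comparison treats the two as different strings.
raw_match = precomposed == decomposed

# Comparing the normalized keys finds the canonically equivalent text.
normalized_match = search_key(precomposed) == search_key(decomposed)
```

Without the normalization step the two spellings do not match, which is why a blanket MUST NOT normalize would make such a search miss canonically equivalent text.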
Received on Tuesday, 12 March 2002 16:25:43 UTC