Gentlepeople,

A joint teleconference of the XML Query Working Group and the XSL Working Group has approved the following as their formal comments on the document entitled Character Model for the World Wide Web 1.0: Normalization.  (Please note that these comments are substantially the same as the personal comments that I sent to you in late December, 2005, the principal change being the addition of an example in point (3) below.)

(1) In section 2, Conformance, the list of specification conformance criteria includes: "make it a conformance requirement for implementations to conform to this document", and "make it a conformance requirement for content to conform to this document".  Would you clarify (perhaps only as a response to this message) whether or not the XQuery 1.0, XPath 2.0, and XSLT 2.0 suite of specifications would be cited as non-conforming to this specification if (as I believe to be the case) they do not contain an explicit statement of those two criteria?

(2) In section 3.2.3, Include-normalized text, bullet 2 uses the phrase "clause 1 above".  I believe that most readers will better understand your meaning if you replace that with "bullet 1 above" or "list item 1 above".  To many readers, the word "clause" refers either to a major subdivision of a document (e.g., a chapter) or to a relatively short phrase such as a portion of a sentence (e.g., the noun clause).

(3) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1 says that a composing character is "the second character in the canonical decomposition mapping of some character".  There are characters in Unicode that are made of a "base character" plus two or more composing characters; therefore, "a composing character" would be "each character after the first in the canonical decomposition mapping of some character".  One example of such a character would seem to be U+1FA4 GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI, the canonical decomposition of which is U+03C9 GREEK SMALL LETTER OMEGA + U+0313 COMBINING COMMA ABOVE + U+0301 COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK YPOGEGRAMMENI.
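
The decomposition cited in point (3) can be checked with a minimal sketch such as the following. It assumes Python's standard unicodedata module, and the variable names are purely illustrative. Note that NFD applies the canonical decomposition mappings recursively, so what it shows is the full decomposition rather than any single mapping step.

```python
import unicodedata

# U+1FA4 GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI
ch = "\u1FA4"

# NFD yields the full canonical decomposition (mappings applied recursively).
nfd = unicodedata.normalize("NFD", ch)
code_points = [f"U+{ord(c):04X}" for c in nfd]

# Every character after the first has a non-zero canonical combining class.
combining = [c for c in nfd[1:] if unicodedata.combining(c) != 0]

for c in nfd:
    print(f"U+{ord(c):04X}", unicodedata.name(c))
```

The base character is followed by three combining characters, not one, which is the situation the comment above is concerned with.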

(4) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1 refers to "some character that is not listed in the Composition Exclusion Table defined in [UTR #15]". However, following the link to the most recent version of UTR #15, the section of that document whose title is "Composition Exclusion Table" contains neither a table nor a list of characters.  While this is an apparent failure of UTR #15, the dependence on that section of UTR #15 cascades that failure into Normalization.  However, there is (in section 6 of UTR #15) a (not terribly obvious) reference to "the Composition Exclusion Table [Exclusions]".  The References entry with that name (Exclusions) contains pointers to several versions of such a table, the latest of which is available at http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt .  It would seem a Very Good Idea for Normalization to point directly to this file, perhaps in addition to the reference to UTR #15 section 6.

(5) In section 3.2.4, Fully-normalized text, second numbered list, bullet 2 uses the phrase "clause 1 above".  I believe that most readers will better understand your meaning if you replace that with "bullet 1 above" or "list item 1 above".  To many readers, the word "clause" refers either to a major subdivision of a document (e.g., a chapter) or to a relatively short phrase such as a portion of a sentence (e.g., the noun clause).

(6) In section 3.2.4, Fully-normalized text, the paragraph beginning "Identification of the constructs..." includes the statement that "it is the responsibility of the specification for a language to specify exactly what constitutes a relevant construct".  Could you please clarify whether or not the XQuery 1.0, XPath 2.0, and XSLT 2.0 suite of specifications would be cited as non-conforming to this specification if (as I believe to be the case) they do not contain any such explicit specification?

(7) In section 3.2.7, Certified and suspect text, the NOTE begins with the statement "To normalize text, it is in general sufficient to store the last seen character...".  Perhaps I've missed something important earlier in this specification, but I have no idea what that statement means.  One way of explaining it is to use the example of text "C combining-cedilla".  When processing that text, I store the last seen character (combining-cedilla).  And, voilà, the text is normalized.  But that obviously is not the case.  So what does that statement mean?  Could it be expressed in a less ambiguous manner?
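
To make the "C combining-cedilla" example in point (7) concrete, here is a minimal sketch, again assuming Python's standard unicodedata module and NFC as the normalization form in question; the variable names are illustrative only.

```python
import unicodedata

# "C" followed by U+0327 COMBINING CEDILLA
text = "C\u0327"

# Merely remembering the last seen character does not normalize anything.
last_seen = text[-1]

# NFC composes the pair into a single precomposed character.
normalized = unicodedata.normalize("NFC", text)

print(f"last seen:  U+{ord(last_seen):04X}")
print("normalized:", [f"U+{ord(c):04X}" for c in normalized])
```

The normalized result is the single character U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA, which differs both from the input sequence and from the stored last character.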

(8) In section 3.4, Responsibility for normalization, item C303 includes an Example that uses the notations "xf:concat" and "xf:substring".  In both cases (because this document does not define any namespace prefixes associated with the namespace name associated with XPath/XQuery functions), the "xf" should be replaced with "fn", which is the conventional prefix used for that namespace.

(9) In section 4, String identity matching, item C312, list item 1 includes the statement "In accordance with section 3 Normalization, this step MUST be performed by the producers of the strings to be compared."  But section 3 does not make such a requirement (it did so in earlier drafts, but has been changed in this draft).  At the very least, that use of "MUST" must (pun intended) be replaced by "SHOULD".  Furthermore, the requirement to use "Early uniform normalization" might be correct because of the use of "as if" in the preceding paragraph, but (as section 3 makes clear) late normalization will produce identical results.

(10) In appendix A, the reference to XQuery Operators includes an outdated list of editors.  Jonathan Robie is no longer cited as an editor of that specification.  Furthermore, the most recent edition is now dated 4 November, 2005, and is a Candidate Recommendation.  (Of course, because Normalization was published earlier than that date, you could not have known this fact; the next publication of Normalization should make this change.)

(11) In Appendix B, the final NOTE says that certain characters may be displayed as a blank or as a blank rectangle.  In some situations (e.g., Firefox 1.0.4 on my system without any font that covers Sinhala), a question mark ("?") is displayed.  It might be appropriate to include that possibility in this NOTE.


Hope this helps,
   Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
  Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================