Gentlepeople,
A joint teleconference of the XML Query Working Group and the XSL Working
Group has approved the following as their formal comments on the document
entitled Character Model for the World Wide Web 1.0: Normalization.
(Please note that these comments are substantially the same as the
personal comments that I sent to you in late December, 2005, the
principal change being the addition of an example in point (3)
below.)
(1) In section 2, Conformance, the list of specification conformance
criteria includes: "make it a conformance requirement for
implementations to conform to this document", and "make it a
conformance requirement for content to conform to this
document". Would you clarify (perhaps only as a response to
this message) whether or not the XQuery 1.0, XPath 2.0, and XSLT 2.0
suite of specifications would be cited as non-conforming to this
specification if (as I believe to be the case) they do not contain an
explicit statement of those two criteria?
(2) In section 3.2.3, Include-normalized text, bullet 2 uses the phrase
"clause 1 above". I believe that most readers will better
understand your meaning if you replace that with "bullet 1
above" or "list item 1 above". To many readers, the
word "clause" refers either to a major subdivision of a
document (e.g., a chapter) or to a relatively short phrase such as a
portion of a sentence (e.g., the noun clause).
(3) In section 3.2.4, Fully-normalized text, first numbered list, bullet
1 says that a composing character is "the second character in the
canonical decomposition mapping of some character". There are
characters in Unicode that are made of a "base character" plus
two or more composing characters; therefore, "a composing
character" would more accurately be "each character after the first in
the canonical decomposition mapping of some character". One
example of such a character would seem to be U+1FA4 GREEK SMALL LETTER
OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI, the canonical decomposition
of which is U+03C9 GREEK SMALL LETTER OMEGA + U+0313 COMBINING COMMA
ABOVE + U+0301 COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK
YPOGEGRAMMENI.
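The decomposition cited in point (3) can be checked with a short Python sketch. This is only an illustration using the standard unicodedata module (an assumption of this example, not something taken from Normalization or UTR #15); it applies NFD, which expands the canonical decomposition mappings recursively, and lists the resulting code points.

```python
import unicodedata

# U+1FA4 GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI
composed = "\u1FA4"

# NFD applies the canonical decomposition mappings recursively and
# puts the combining marks into canonical order.
decomposed = unicodedata.normalize("NFD", composed)

# Collect the code points so they can be inspected or compared.
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]

for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The output lists one base character (U+03C9) followed by three combining characters (U+0313, U+0301, U+0345), which is the situation that the wording "the second character" does not obviously cover.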
(4) In section 3.2.4, Fully-normalized text, first numbered list, bullet
1 refers to "some character that is not listed in the Composition
Exclusion Table defined in [UTR #15]". However, following the link
to the most recent version of UTR #15, the section of that document whose
title is "Composition Exclusion Table" contains neither a table
nor a list of characters. While this is an apparent failure of UTR
#15, the dependence on that section of UTR #15 cascades that failure into
Normalization. However, there is (in section 6 of UTR #15) a (not
terribly obvious) reference to "the Composition Exclusion Table
[Exclusions]". The References entry with that name
(Exclusions) contains pointers to several versions of such a table, the
latest of which is available at
http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt .
It would seem a Very Good Idea for Normalization to point
directly to this file, perhaps in addition to the reference to
UTR #15 section 6.
(5) In section 3.2.4, Fully-normalized text, second numbered list, bullet
2 uses the phrase "clause 1 above". I believe that most
readers will better understand your meaning if you replace that with
"bullet 1 above" or "list item 1 above". To
many readers, the word "clause" refers either to a major
subdivision of a document (e.g., a chapter) or to a relatively short
phrase such as a portion of a sentence (e.g., the noun clause).
(6) In section 3.2.4, Fully-normalized text, the paragraph beginning
"Identification of the constructs..." includes the statement
that "it is the responsibility of the specification for a language
to specify exactly what constitutes a relevant construct".
Could you please clarify whether or not the XQuery 1.0, XPath 2.0, and
XSLT 2.0 suite of specifications would be cited as non-conforming to this
specification if (as I believe to be the case) they do not contain any
such explicit specification?
(7) In section 3.2.7, Certified and suspect text, the NOTE begins with
the statement "To normalize text, it is in general sufficient to
store the last seen character...". Perhaps I've missed
something important earlier in this specification, but I have no idea
what that statement means. One way of reading it is to take the
example text "C combining-cedilla". When processing
that text, I store the last seen character (combining-cedilla).
And, voilà, the text is normalized. But that obviously is not the
case. So what does that statement mean? Could it be expressed
in a less ambiguous manner?
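The "C combining-cedilla" example in point (7) can be made concrete with a small Python sketch. It assumes that NFC is the normalization form of interest and uses the standard unicodedata module; the variable names are purely illustrative.

```python
import unicodedata

# LATIN CAPITAL LETTER C followed by COMBINING CEDILLA
text = "C\u0327"

# The "last seen character" under the literal reading of the NOTE.
last_seen = text[-1]

# What NFC normalization of the whole text actually produces.
normalized = unicodedata.normalize("NFC", text)

print(f"last seen:  U+{ord(last_seen):04X}")
print("normalized:", " ".join(f"U+{ord(ch):04X}" for ch in normalized))
```

The normalized text is the single precomposed character U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA, not the last seen character U+0327, so normalizing this text requires considering the base character and the combining character together.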
(8) In section 3.4, Responsibility for normalization, item C303 includes
an Example that uses the notations "xf:concat" and
"xf:substring". In both cases (because this document does
not define any prefix bound to the namespace of the XPath/XQuery
functions), the "xf" should be replaced with "fn", which is the
conventional prefix used for that namespace.
(9) In section 4, String identity matching, item C312, list item 1
includes the statement "In accordance with section
3 Normalization, this step MUST be performed by the
producers of the strings to be compared." But section 3
does not make such a requirement (it did so in earlier drafts, but has
been changed in this draft). At the very least, that use of
"MUST" must (pun intended) be replaced by
"SHOULD". Furthermore, the requirement to use "Early
uniform normalization" might be correct because of the use of
"as if" in the preceding paragraph, but (as section 3 makes
clear) late normalization will produce identical results.
(10) In appendix A, the reference to XQuery Operators includes an
outdated list of editors. Jonathan Robie is no longer cited as an
editor of that specification. Furthermore, the most recent edition
is now dated 4 November, 2005, and is a Candidate Recommendation.
(Of course, because Normalization was published earlier than that date,
you could not have known this fact; the next publication of Normalization
should make this change.)
(11) In Appendix B, the final NOTE says that certain characters may be
displayed as a blank or as a blank rectangle. In some situations
(e.g., Firefox 1.0.4 on my system without any font that covers Sinhala),
a question mark ("?") is displayed. It might be appropriate
to include that possibility in this NOTE.
Hope this helps,
Jim
========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)
Co-Chair, W3C XML Query WG; F&O (etc.) editor
Oracle Corporation
1930 Viscounti Drive
Sandy, UT 84093-1063 USA
Phone: +1.801.942.0144
Fax  : +1.801.942.3345
Oracle email:    jim dot melton at oracle dot com
Standards email: jim dot melton at acm dot org
Personal email:  jim at melton dot name
========================================================================
= Facts are facts. But any opinions expressed are the opinions         =
= only of myself and may or may not reflect the opinions of anybody    =
= else with whom I may or may not have discussed the issues at hand.   =
========================================================================