- From: Jim Melton <jim.melton@acm.org>
- Date: Tue, 20 Dec 2005 16:57:44 -0700
- To: www-i18n-comments@w3.org
- Cc: jim.melton@acm.org, w3c-xsl-query@w3.org
- Message-Id: <6.2.0.14.2.20051220152420.02b31388@rgmstimap.oraclecorp.com>
Gentlepeople,
I have found a few cycles to review the Working Draft of the Character
Model for the World Wide Web 1.0: Normalization (hereinafter
"Normalization"), dated 27 October 2005. These comments are personal, and
do not necessarily represent the opinions of the XML Query Working Group,
the XSL Working Group, or Oracle Corp. If some or all of these comments
are endorsed by any of those organizations, then you will receive them
separately as comments from the appropriate organization.
(1) In section 2, Conformance, the list of specification conformance
criteria includes: "make it a conformance requirement for implementations to
conform to this document", and "make it a conformance requirement for
content to conform to this document". Would you clarify (perhaps only as a
response to this message) whether or not the XQuery 1.0, XPath 2.0, and
XSLT 2.0 suite of specifications would be cited as non-conforming to this
specification if (as I believe to be the case) they do not contain an
explicit statement of those two criteria?
(2) In section 3.2.3, Include-normalized text, bullet 2 uses the phrase
"clause 1 above". I believe that most readers will better understand your
meaning if you replace that with "bullet 1 above" or "list item 1
above". To many readers, the word "clause" refers either to a major
subdivision of a document (e.g., a chapter) or to a relatively short phrase
such as a portion of a sentence (e.g., the noun clause).
(3) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1
says that a composing character is "the second character in the canonical
decomposition mapping of some character". If there are characters in
Unicode that are made of a "base character" plus two or more composing
characters (I cannot claim to be positive that such characters exist, but I
think that Hangul characters are often decomposed into three or more Jamo;
there may be other examples), then surely "a composing character" would be
"each character after the first in the canonical decomposition mapping of
some character".
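As a minimal sketch of the question in (3), the following uses Python's
standard unicodedata module (a choice made for this illustration only; it
is not part of Normalization). It contrasts the single-step decomposition
mapping recorded for one character with that character's full canonical
decomposition, and also decomposes one Hangul syllable. The variable names
are hypothetical.

```python
import unicodedata

# U+1EA5 LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE
char = "\u1ea5"

# The single-step canonical decomposition mapping recorded for the character.
mapping = unicodedata.decomposition(char).split()

# The full canonical decomposition (NFD), with mappings applied recursively.
full = [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", char)]

# U+D55C, a Hangul syllable whose full decomposition is three conjoining Jamo.
jamo = [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", "\ud55c")]

print(mapping, full, jamo)
```

In this sketch the mapping holds two code points while each full
decomposition holds three, so whether "the second character" is sufficient
may depend on whether the single-step mapping or the full decomposition is
meant.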
(4) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1
refers to "some character that is not listed in the Composition Exclusion
Table defined in [UTR #15]". However, following the link to the most recent
version of UTR #15, the section of that document whose title is
"Composition Exclusion Table" contains neither a table nor a list of
characters. While this is an apparent failure of UTR #15, the dependence
on that section of UTR #15 cascades that failure into
Normalization. However, there is (in section 6 of UTR #15) a (not terribly
obvious) reference to "the Composition Exclusion Table [Exclusions]". The
References entry with that name (Exclusions) contains pointers to several
versions of such a table, the latest of which is available at
<http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt>.
It would have seemed a Very Good Idea for Normalization to point directly
to this file, perhaps in addition to the reference directly to UTR #15
section 6.
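To illustrate why the exclusion list matters to the definition discussed in
(4), here is a small sketch using Python's standard unicodedata module (an
assumption of this example, not something Normalization prescribes). It
compares a character that I understand to be listed in
CompositionExclusions.txt, U+0958 DEVANAGARI LETTER QA, with an ordinary
composite that is not listed.

```python
import unicodedata

# U+0958 DEVANAGARI LETTER QA: has a canonical decomposition, but is
# excluded from recomposition.
excluded = "\u0958"
excluded_nfd = unicodedata.normalize("NFD", excluded)
excluded_nfc = unicodedata.normalize("NFC", excluded)

# U+00E9 LATIN SMALL LETTER E WITH ACUTE: decomposes and recomposes normally.
ordinary = "\u00e9"
ordinary_nfd = unicodedata.normalize("NFD", ordinary)
ordinary_nfc = unicodedata.normalize("NFC", ordinary_nfd)

# The excluded character stays decomposed even under NFC.
print(excluded_nfc == excluded_nfd, ordinary_nfc == ordinary)
```

A reader who cannot locate the list of excluded characters has no way to
predict the first result, which is why a direct pointer to the data file
seems useful.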
(5) In section 3.2.4, Fully-normalized text, second numbered list, bullet 2
uses the phrase "clause 1 above". I believe that most readers will better
understand your meaning if you replace that with "bullet 1 above" or "list
item 1 above". To many readers, the word "clause" refers either to a major
subdivision of a document (e.g., a chapter) or to a relatively short phrase
such as a portion of a sentence (e.g., the noun clause).
(6) In section 3.2.4, Fully-normalized text, the paragraph beginning
"Identification of the constructs..." includes the statement that "it is
the responsibility of the specification for a language to specify exactly
what constitutes a relevant construct". Could you please clarify whether
or not the XQuery 1.0, XPath 2.0, and XSLT 2.0 suite of specifications
would be cited as non-conforming to this specification if (as I believe to
be the case) they do not contain any such explicit specification?
(7) In section 3.2.7, Certified and suspect text, the NOTE begins with the
statement "To normalize text, it is in general sufficient to store the last
seen character...". Perhaps I've missed something important earlier in
this specification, but I have no idea what that statement means. One way
of explaining it is to use the example of text "C combining-cedilla". When
processing that text, I store the last seen character
(combining-cedilla). And, voilà, the text is normalized. But that
obviously is not the case. So what does that statement mean? Could it be
expressed in a less ambiguous manner?
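The "C combining-cedilla" example in (7) can be sketched as follows, using
Python's standard unicodedata module and NFC as the target form (both are
assumptions of this illustration). It compares normalizing the text as a
whole with handling each character in isolation.

```python
import unicodedata

# "C" followed by U+0327 COMBINING CEDILLA
text = "C\u0327"

# Normalizing the whole sequence composes the pair into U+00C7.
whole = unicodedata.normalize("NFC", text)

# Normalizing one character at a time, with no memory of what came before,
# leaves the sequence exactly as it was.
one_at_a_time = "".join(unicodedata.normalize("NFC", ch) for ch in text)

print(whole == "\u00c7", one_at_a_time == text)
```

In this sketch the processor has to hold back the "C" until it has seen
what follows, which is more than storing the last seen character alone.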
(8) In section 3.4, Responsibility for normalization, item C303 includes an
Example that uses the notations "xf:concat" and "xf:substring". In both
cases (because this document does not define any namespace prefixes
associated with the namespace name associated with XPath/XQuery functions),
the "xf" should be replaced with "fn", which is the conventional prefix
used for that namespace.
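For readers unfamiliar with why a concatenation function appears in a
discussion of normalization at all, the following sketch shows the general
hazard. It uses plain Python string concatenation as a stand-in for the
kind of operation fn:concat performs; it is not the Example from C303, and
the operand values are hypothetical.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s

left = "C"          # in NFC on its own
right = "\u0327a"   # also unchanged by NFC, but begins with a combining mark

joined = left + right

# Each operand passes the check, yet the concatenation does not.
print(is_nfc(left), is_nfc(right), is_nfc(joined))
```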
(9) In section 4, String identity matching, item C312, list item 1 includes
the statement "In accordance with section 3 Normalization
<http://www.w3.org/TR/2005/WD-charmod-norm-20051027/#sec-Normalization>,
this step MUST be performed by the producers of the strings
to be compared." But section 3 does not make such a requirement (it did so
in earlier drafts, but has been changed in this draft). At the very least,
that use of "MUST" must (pun intended) be replaced by
"SHOULD". Furthermore, the requirement to use "Early uniform
normalization" might be correct because of the use of "as if" in the
preceding paragraph, but (as section 3 makes clear) late normalization will
produce identical results.
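The claim in (9) that late normalization gives the same matching result as
early normalization can be sketched as below. The sketch covers only the
normalization step of string identity matching, assumes NFC, and uses
Python's standard unicodedata module; the function names are hypothetical.

```python
import unicodedata

def produce_early(s: str) -> str:
    """A producer that normalizes before handing the string on."""
    return unicodedata.normalize("NFC", s)

def match_late(a: str, b: str) -> bool:
    """A consumer that normalizes both operands at comparison time."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00c7a"   # U+00C7 followed by "a"
decomposed = "C\u0327a"   # "C", U+0327 COMBINING CEDILLA, "a"

# Early: producers normalize, the consumer compares code points directly.
early_result = produce_early(precomposed) == produce_early(decomposed)

# Late: producers send the strings as-is, the consumer normalizes.
late_result = match_late(precomposed, decomposed)

print(early_result, late_result)
```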
(10) In appendix A, the reference to XQuery Operators includes an outdated
list of editors. Jonathan Robie is no longer cited as an editor of that
specification. Furthermore, the most recent edition is now dated 4
November, 2005, and is a Candidate Recommendation. (Of course, because
Normalization was published earlier than that date, you could not have
known this fact; the next publication of Normalization should make this
change.)
(11) In Appendix B, the final NOTE says that certain characters may be
displayed as a blank or as a blank rectangle. In some situations (e.g.,
Firefox 1.0.4 on my system without any font that covers Sinhala), a question
mark ("?") is displayed. It might be appropriate to include that
possibility in this NOTE.
Hope this helps,
Jim
========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144
Co-Chair, W3C XML Query WG; F&O (etc.) editor Fax : +1.801.942.3345
Oracle Corporation Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA Personal email: jim at melton dot name
========================================================================
= Facts are facts. But any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================
Received on Tuesday, 20 December 2005 23:58:41 UTC