
Fwd: Comments on Character Model for the World Wide Web 1.0: Normalization

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 12 Jan 2006 12:12:08 +0900
To: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <op.s28wiiatx1753t@ibm-60d333fc0ec.mag.keio.ac.jp>
Hi all,

This is the approved version of the XQuery / XSL Working Group comments.
If you have a look at them, you will see that they are very closely related
to the progress of the QT specifications, most of which are now at the
Candidate Recommendation stage. If we still want to influence the specs
before they become Recommendations, we should reply quickly. Hence, I propose
to postpone Francois's action item to look for duplicates in the previous
last call, and to discuss the comments during the teleconfs next week.

Felix

------- Forwarded message -------
From: "Jim Melton" <jim.melton@acm.org>
To: www-i18n-comments@w3.org
Cc: w3c-xsl-query@w3.org, "C. Michael Sperberg-McQueen" <cmsmcq@acm.org>
Subject: Comments on Character Model for the World Wide Web 1.0:    
Normalization
Date: Thu, 12 Jan 2006 05:32:12 +0900

Gentlepeople,

A joint teleconference of the XML Query Working Group and the XSL Working
Group has approved the following as their formal comments on the document
entitled Character Model for the World Wide Web 1.0:
Normalization.  (Please note that these comments are substantially the same
as the personal comments that I sent to you in late December, 2005, the
principal change being the addition of an example in point (3) below.)

(1) In section 2, Conformance, the list of specification conformance
criteria includes: "make it a conformance requirement for implementations to
conform to this document", and "make it a conformance requirement for
content to conform to this document".  Would you clarify (perhaps only as a
response to this message) whether or not the XQuery 1.0, XPath 2.0, and
XSLT 2.0 suite of specifications would be cited as non-conforming to this
specification if (as I believe to be the case) they do not contain an
explicit statement of those two criteria?

(2) In section 3.2.3, Include-normalized text, bullet 2 uses the phrase
"clause 1 above".  I believe that most readers will better understand your
meaning if you replace that with "bullet 1 above" or "list item 1
above".  To many readers, the word "clause" refers either to a major
subdivision of a document (e.g., a chapter) or to a relatively short phrase
such as a portion of a sentence (e.g., the noun clause).

(3) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1
says that a composing character is "the second character in the canonical
decomposition mapping of some character".  There are characters in Unicode
that are made of a "base character" plus two or more composing characters;
therefore, "a composing character" would be "each character after the first
in the canonical decomposition mapping of some character".  One example of
such a character would seem to be U+1FA4 GREEK SMALL LETTER OMEGA WITH
PSILI AND OXIA AND YPOGEGRAMMENI, the canonical decomposition of which is
U+03C9 GREEK SMALL LETTER OMEGA + U+0313 COMBINING COMMA ABOVE + U+0301
COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK YPOGEGRAMMENI.

(4) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1
refers to "some character that is not listed in the Composition Exclusion
Table defined in [UTR #15]". However, following the link to the most recent
version of UTR #15, the section of that document whose title is
"Composition Exclusion Table" contains neither a table nor a list of
characters.  While this is an apparent failure of UTR #15, the dependence
on that section of UTR #15 cascades that failure into
Normalization.  However, there is (in section 6 of UTR #15) a (not terribly
obvious) reference to "the Composition Exclusion Table [Exclusions]".  The
References entry with that name (Exclusions) contains pointers to several
versions of such a table, the latest of which is available at
<http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt>.  It would
have seemed a Very Good Idea for Normalization to point directly to this
file, perhaps in addition to the reference to UTR #15 section 6.
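The practical effect of the exclusion table can be observed with Python's standard unicodedata module. This sketch assumes U+0958 DEVANAGARI LETTER QA, which appears in CompositionExclusions.txt, and uses a hypothetical helper, recomposes, that checks whether NFC rebuilds a character from its NFD form. An excluded character keeps its canonical decomposition but is never recomposed.

```python
import unicodedata

def recomposes(ch):
    """Hypothetical helper: True if NFC turns the NFD form of ch back into ch."""
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", ch)) == ch

qa = "\u0958"         # DEVANAGARI LETTER QA, listed in CompositionExclusions.txt
c_cedilla = "\u00C7"  # LATIN CAPITAL LETTER C WITH CEDILLA, not excluded

print(unicodedata.decomposition(qa))  # 0915 093C
print(recomposes(qa))                 # False: NFC leaves it decomposed
print(recomposes(c_cedilla))          # True
```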

(5) In section 3.2.4, Fully-normalized text, second numbered list, bullet 2
uses the phrase "clause 1 above".  I believe that most readers will better
understand your meaning if you replace that with "bullet 1 above" or "list
item 1 above".  To many readers, the word "clause" refers either to a major
subdivision of a document (e.g., a chapter) or to a relatively short phrase
such as a portion of a sentence (e.g., the noun clause).

(6) In section 3.2.4, Fully-normalized text, the paragraph beginning
"Identification of the constructs..." includes the statement that "it is
the responsibility of the specification for a language to specify exactly
what constitutes a relevant construct".  Could you please clarify whether
or not the XQuery 1.0, XPath 2.0, and XSLT 2.0 suite of specifications
would be cited as non-conforming to this specification if (as I believe to
be the case) they do not contain any such explicit specification?

(7) In section 3.2.7, Certified and suspect text, the NOTE begins with the
statement "To normalize text, it is in general sufficient to store the last
seen character...".  Perhaps I've missed something important earlier in
this specification, but I have no idea what that statement means.  One way
of explaining it is to use the example of text "C combining-cedilla".  When
processing that text, I store the last seen character
(combining-cedilla).  And, voilà, the text is normalized.  But that
obviously is not the case.  So what does that statement mean?  Could it be
expressed in a less ambiguous manner?
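The "C combining-cedilla" example can be checked with Python's standard unicodedata module (is_normalized requires Python 3.8 or later). This sketch does not claim to explain the NOTE. It only shows that each piece is in NFC on its own while the concatenation is not, so whether the text is normalized depends on the character preceding the combining mark.

```python
import unicodedata

c = "C"
cedilla = "\u0327"  # COMBINING CEDILLA
text = c + cedilla

# Each piece is in NFC on its own...
print(unicodedata.is_normalized("NFC", c), unicodedata.is_normalized("NFC", cedilla))  # True True

# ...but the concatenation is not, because NFC composes the pair.
print(unicodedata.is_normalized("NFC", text))              # False
print(f"U+{ord(unicodedata.normalize('NFC', text)):04X}")  # U+00C7
```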

(8) In section 3.4, Responsibility for normalization, item C303 includes an
Example that uses the notations "xf:concat" and "xf:substring".  In both
cases (because this document does not bind any namespace prefix to the
namespace name of the XPath/XQuery functions), the "xf" should be replaced
with "fn", which is the conventional prefix used for that namespace.

(9) In section 4, String identity matching, item C312, list item 1 includes
the statement "In accordance with section 3 Normalization
<http://www.w3.org/TR/2005/WD-charmod-norm-20051027/#sec-Normalization>, this
step MUST be performed by the producers of the strings to be compared."  But
section 3 does not make such a requirement (it did so
in earlier drafts, but has been changed in this draft).  At the very least,
that use of "MUST" must (pun intended) be replaced by
"SHOULD".  Furthermore, the requirement to use "Early uniform
normalization" might be correct because of the use of "as if" in the
preceding paragraph, but (as section 3 makes clear) late normalization will
produce identical results.
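The claim that late normalization produces the same results can be sketched with Python's standard unicodedata module. This is a simplification that assumes NFC and considers only the normalization step, ignoring any other steps C312 may list; the function names match_early and match_late are hypothetical. When producers have already normalized, a plain code-point comparison agrees with normalizing at comparison time.

```python
import unicodedata

def nfc(s):
    return unicodedata.normalize("NFC", s)

def match_early(a, b):
    # Assumes the producers already normalized a and b; plain code-point comparison.
    return a == b

def match_late(a, b):
    # Normalizes at comparison time, then compares code points.
    return nfc(a) == nfc(b)

decomposed = "C\u0327a"  # C + COMBINING CEDILLA + a
precomposed = "\u00C7a"  # LATIN CAPITAL LETTER C WITH CEDILLA + a

print(match_early(nfc(decomposed), nfc(precomposed)))  # True
print(match_late(decomposed, precomposed))             # True
print(match_early(decomposed, precomposed))            # False: inputs were not normalized
```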

(10) In appendix A, the reference to XQuery Operators includes an outdated
list of editors.  Jonathan Robie is no longer cited as an editor of that
specification.  Furthermore, the most recent edition is now dated 4
November, 2005, and is a Candidate Recommendation.  (Of course, because
Normalization was published earlier than that date, you could not have
known this fact; the next publication of Normalization should make this
change.)

(11) In Appendix B, the final NOTE says that certain characters may be
displayed as a blank or as a blank rectangle.  In some situations (e.g.,
Firefox 1.0.4 on my system, which has no font that covers Sinhala), a
question mark ("?") is displayed.  It might be appropriate to include that
possibility in this NOTE.


Hope this helps,
     Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
    Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================




Received on Thursday, 12 January 2006 03:12:18 GMT
