Re: Comments on Character Model for the World Wide Web 1.0: Normalization

Jim,

Thank you very much for your comments. The i18n Core WG is already on
"holiday", but we will discuss your comments at the beginning of next year.

Have a merry Christmas and a good new year.

Regards, Felix.

On Wed, 21 Dec 2005 08:57:44 +0900, Jim Melton <jim.melton@acm.org> wrote:

> Gentlepeople,
>
> I have found a few cycles to review the Working Draft of the Character
> Model for the World Wide Web 1.0: Normalization, (hereinafter
> "Normalization") dated 27 October, 2005.  These comments are personal, and
> do not necessarily represent the opinions of the XML Query Working Group,
> the XSL Working Group, or Oracle Corp.  If some or all of these comments
> are endorsed by any of those organizations, then you will receive them
> separately as comments from the appropriate organization.
>
> (1) In section 2, Conformance, the list of specification conformance
> criteria includes: "make it a conformance requirement for implementations
> to conform to this document", and "make it a conformance requirement for
> content to conform to this document".  Would you clarify (perhaps only as
> a response to this message) whether or not the XQuery 1.0, XPath 2.0, and
> XSLT 2.0 suite of specifications would be cited as non-conforming to this
> specification if (as I believe to be the case) they do not contain an
> explicit statement of those two criteria?
>
> (2) In section 3.2.3, Include-normalized text, bullet 2 uses the phrase
> "clause 1 above".  I believe that most readers will better understand your
> meaning if you replace that with "bullet 1 above" or "list item 1
> above".  To many readers, the word "clause" refers either to a major
> subdivision of a document (e.g., a chapter) or to a relatively short phrase
> such as a portion of a sentence (e.g., the noun clause).
>
> (3) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1
> says that a composing character is "the second character in the canonical
> decomposition mapping of some character".  If there are characters in
> Unicode that are made of a "base character" plus two or more composing
> characters (I cannot claim to be positive that such characters exist, but
> I think that Hangul characters are often decomposed into three or more
> Jamo; there may be other examples), then surely "a composing character"
> would be "each character after the first in the canonical decomposition
> mapping of some character".
>
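
The question in (3) can be checked against the Unicode Character Database. The following is only a sketch using Python's standard unicodedata module; the helper names mapping and full_decomposition are hypothetical, chosen for illustration. It contrasts the single-step decomposition mapping of a character with its full (recursively applied) canonical decomposition. For U+1E09 the single-step mapping has two entries, while the full decomposition has three code points; a Hangul LVT syllable such as U+AC01 likewise decomposes fully into three Jamo.

```python
import unicodedata

def mapping(ch):
    """Single-step canonical decomposition mapping from the Unicode database."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility mappings start with a tag such as "<compat>"; skip those.
    if not fields or fields[0].startswith("<"):
        return []
    return [chr(int(f, 16)) for f in fields]

def full_decomposition(ch):
    """Full canonical decomposition (NFD), i.e. the mapping applied recursively."""
    return list(unicodedata.normalize("NFD", ch))

def show(chars):
    """Format a list of characters as U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in chars)

c_cedilla_acute = "\u1e09"  # LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
hangul_gag = "\uac01"       # HANGUL SYLLABLE GAG (an LVT syllable)

print("mapping:", show(mapping(c_cedilla_acute)))
print("full:   ", show(full_decomposition(c_cedilla_acute)))
print("hangul: ", show(full_decomposition(hangul_gag)))
```

Whether the draft's definition is meant to apply to the single-step mapping or to the full decomposition is exactly what the comment asks; the sketch only shows that the two differ.
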
> (4) In section 3.2.4, Fully-normalized text, first numbered list, bullet 1
> refers to "some character that is not listed in the Composition Exclusion
> Table defined in [UTR #15]". However, following the link to the most recent
> version of UTR #15, the section of that document whose title is
> "Composition Exclusion Table" contains neither a table nor a list of
> characters.  While this is an apparent failure of UTR #15, the dependence
> on that section of UTR #15 cascades that failure into
> Normalization.  However, there is (in section 6 of UTR #15) a (not terribly
> obvious) reference to "the Composition Exclusion Table [Exclusions]".  The
> References entry with that name (Exclusions) contains pointers to several
> versions of such a table, the latest of which is available at
> <http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt>.
> It would have seemed a Very Good Idea for Normalization to point directly
> to this file, perhaps in addition to the reference directly to UTR #15
> section 6.
>
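
To make the role of the exclusion list in (4) concrete, here is a small sketch, again assuming Python's unicodedata module. It takes U+0958 (DEVANAGARI LETTER QA), which is listed in CompositionExclusions.txt, and shows that NFC does not recompose it after decomposition. The variable names are illustrative only.

```python
import unicodedata

qa = "\u0958"  # DEVANAGARI LETTER QA, listed in CompositionExclusions.txt

decomposed = unicodedata.normalize("NFD", qa)          # U+0915 U+093C
recomposed = unicodedata.normalize("NFC", decomposed)  # composition step

# Because U+0958 is excluded from composition, NFC keeps the two-character
# form instead of restoring the original precomposed character.
stays_decomposed = (recomposed == decomposed) and (recomposed != qa)
print(stays_decomposed)
```
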
> (5) In section 3.2.4, Fully-normalized text, second numbered list, bullet 2
> uses the phrase "clause 1 above".  I believe that most readers will better
> understand your meaning if you replace that with "bullet 1 above" or "list
> item 1 above".  To many readers, the word "clause" refers either to a major
> subdivision of a document (e.g., a chapter) or to a relatively short phrase
> such as a portion of a sentence (e.g., the noun clause).
>
> (6) In section 3.2.4, Fully-normalized text, the paragraph beginning
> "Identification of the constructs..." includes the statement that "it is
> the responsibility of the specification for a language to specify exactly
> what constitutes a relevant construct".  Could you please clarify whether
> or not the XQuery 1.0, XPath 2.0, and XSLT 2.0 suite of specifications
> would be cited as non-conforming to this specification if (as I believe to
> be the case) they do not contain any such explicit specification?
>
> (7) In section 3.2.7, Certified and suspect text, the NOTE begins with the
> statement "To normalize text, it is in general sufficient to store the last
> seen character...".  Perhaps I've missed something important earlier in
> this specification, but I have no idea what that statement means.  One way
> of explaining it is to use the example of text "C combining-cedilla".  When
> processing that text, I store the last seen character
> (combining-cedilla).  And, voilà, the text is normalized.  But that
> obviously is not the case.  So what does that statement mean?  Could it be
> expressed in a less ambiguous manner?
>
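
The "C combining-cedilla" example in (7) can be reproduced directly. This is a minimal sketch assuming Python's unicodedata module and using NFC as the target form; it is not a statement of what the NOTE in the draft intends. It shows that merely remembering the last seen character leaves the text unchanged, whereas normalizing composes the pair into U+00C7.

```python
import unicodedata

text = "C\u0327"       # "C" followed by COMBINING CEDILLA
last_seen = text[-1]   # storing the last seen character changes nothing

normalized = unicodedata.normalize("NFC", text)

print(f"U+{ord(last_seen):04X}")                     # the stored character
print(" ".join(f"U+{ord(c):04X}" for c in normalized))
print(text == normalized)                            # was the input already NFC?
```
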
> (8) In section 3.4, Responsibility for normalization, item C303 includes an
> Example that uses the notations "xf:concat" and "xf:substring".  In both
> cases (because this document does not define any namespace prefixes
> associated with the namespace name associated with XPath/XQuery functions),
> the "xf" should be replaced with "fn", which is the conventional prefix
> used for that namespace.
>
> (9) In section 4, String identity matching, item C312, list item 1 includes
> the statement "In accordance with section
> <http://www.w3.org/TR/2005/WD-charmod-norm-20051027/#sec-Normalization>3
> Normalization, this step MUST be performed by the producers of the strings
> to be compared."  But section 3 does not make such a requirement (it did so
> in earlier drafts, but has been changed in this draft).  At the very least,
> that use of "MUST" must (pun intended) be replaced by
> "SHOULD".  Furthermore, the requirement to use "Early uniform
> normalization" might be correct because of the use of "as if" in the
> preceding paragraph, but (as section 3 makes clear) late normalization will
> produce identical results.
>
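
The point in (9) that early and late normalization give the same matching result can be sketched as follows. This assumes Python's unicodedata module and uses plain NFC as a stand-in for the draft's normalization requirements, which are stricter; the function names match_late and match_early are hypothetical.

```python
import unicodedata

def nfc(s):
    """Normalize a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)

def match_late(a, b):
    """Late normalization: the consumer normalizes just before comparing."""
    return nfc(a) == nfc(b)

def match_early(a, b):
    """Early normalization: assumes producers already emitted NFC text,
    so a plain code point comparison suffices."""
    return a == b

producer_a = "C\u0327a"  # decomposed: C + COMBINING CEDILLA + a
producer_b = "\u00c7a"   # precomposed: LATIN CAPITAL LETTER C WITH CEDILLA + a

print(match_late(producer_a, producer_b))
print(match_early(nfc(producer_a), nfc(producer_b)))
print(match_early(producer_a, producer_b))  # neither side normalized
```

The first two comparisons agree; only the third, where nobody normalized, differs.
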
> (10) In appendix A, the reference to XQuery Operators includes an outdated
> list of editors.  Jonathan Robie is no longer cited as an editor of that
> specification.  Furthermore, the most recent edition is now dated 4
> November, 2005, and is a Candidate Recommendation.  (Of course, because
> Normalization was published earlier than that date, you could not have
> known this fact; the next publication of Normalization should make this
> change.)
>
> (11) In Appendix B, the final NOTE says that certain characters may be
> displayed as a blank or as a blank rectangle.  In some situations (e.g.,
> Firefox 1.0.4 on my system without any font that covers Sinhala), a
> question mark ("?") is displayed.  It might be appropriate to include that
> possibility in this NOTE.
>
>
> Hope this helps,
>     Jim
>
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
>    Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : +1.801.942.3345
> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
> 1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
> Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
> ========================================================================
> =  Facts are facts.   But any opinions expressed are the opinions      =
> =  only of myself and may or may not reflect the opinions of anybody   =
> =  else with whom I may or may not have discussed the issues at hand.  =
> ========================================================================

Received on Wednesday, 21 December 2005 06:02:36 UTC