RE: Comments on Character Model for the World Wide Web 1.0: Normalization

Felix,

I don't think it should take long to search for duplicates.  We must also add these comments to the last call table, so that we have a complete record of comments and responses.  This is important for the transition of the document to CR.

RI


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 

> -----Original Message-----
> From: public-i18n-core-request@w3.org 
> [mailto:public-i18n-core-request@w3.org] On Behalf Of Felix Sasaki
> Sent: 12 January 2006 03:12
> To: public-i18n-core@w3.org
> Subject: Fwd: Comments on Character Model for the World Wide 
> Web 1.0: Normalization
> 
> Hi all,
> 
> This is the approved version of the XQuery / XSL Working 
> Group comments.  
> If you have a look at them, you will see that they are very closely 
> related to the progress of the QT specifications. Most of 
> them are now at the Candidate Recommendation stage. If we still 
> want to have an influence on the specs before they become 
> Recommendations, we should reply quickly. Hence, I propose to postpone 
> François' action item to look for duplicates in the previous 
> last call, and to talk about the comments during the teleconferences 
> next week.
> 
> Felix
> 
> ------- Forwarded message -------
> From: "Jim Melton" <jim.melton@acm.org>
> To: www-i18n-comments@w3.org
> Cc: w3c-xsl-query@w3.org, "C. Michael Sperberg-McQueen" 
> <cmsmcq@acm.org>
> Subject: Comments on Character Model for the World Wide Web 1.0:    
> Normalization
> Date: Thu, 12 Jan 2006 05:32:12 +0900
> 
> Gentlepeople,
> 
> A joint teleconference of the XML Query Working Group and the 
> XSL Working Group has approved the following as their formal 
> comments on the document entitled Character Model for the 
> World Wide Web 1.0:
> Normalization.  (Please note that these comments are 
> substantially the same as the personal comments that I sent 
> to you in late December, 2005, the principal change being the 
> addition of an example in point (3) below.)
> 
> (1) In section 2, Conformance, the list of specification 
> conformance criteria includes: "make it a conformance 
> requirement for implementations to conform to this document", 
> and "make it a conformance requirement for content to conform 
> to this document".  Would you clarify (perhaps only as a 
> response to this message) whether or not the XQuery 1.0, 
> XPath 2.0, and XSLT 2.0 suite of specifications would be 
> cited as non-conforming to this specification if (as I 
> believe to be the case) they do not contain an explicit 
> statement of those two criteria?
> 
> (2) In section 3.2.3, Include-normalized text, bullet 2 uses 
> the phrase "clause 1 above".  I believe that most readers 
> will better understand your meaning if you replace that with 
> "bullet 1 above" or "list item 1 above".  To many readers, 
> the word "clause" refers either to a major subdivision of a 
> document (e.g., a chapter) or to a relatively short phrase 
> such as a portion of a sentence (e.g., the noun clause).
> 
> (3) In section 3.2.4, Fully-normalized text, first numbered 
> list, bullet 1 says that a composing character is "the second 
> character in the canonical decomposition mapping of some 
> character".  There are characters in Unicode that are made of 
> a "base character" plus two or more composing characters; 
> therefore, "a composing character" would be "each character 
> after the first in the canonical decomposition mapping of 
> some character".  One example of such a character would seem 
> to be U+1FA4 GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND 
> YPOGEGRAMMENI, the canonical decomposition of which is
> U+03C9 GREEK SMALL LETTER OMEGA + U+0313 COMBINING COMMA 
> ABOVE + U+0301
> COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK YPOGEGRAMMENI.
> 
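
An illustrative aside on point (3): the decomposition can be checked with a minimal Python sketch using the standard unicodedata module. The variable names are only for illustration, and the results depend on the Unicode data shipped with the interpreter. It appears that the single-step mapping recorded in the Unicode Character Database has just two code points, while the full (recursive) canonical decomposition has the four that Jim lists, which may be the source of the wording in bullet 1.

```python
import unicodedata

# U+1FA4 GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI
ch = "\u1fa4"

# Single-step canonical decomposition mapping, as stored in the
# Unicode Character Database (a space-separated string of hex code points).
mapping = unicodedata.decomposition(ch).split()

# Full canonical decomposition (NFD): the mappings applied recursively,
# followed by canonical reordering of the combining marks.
full = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

print("single-step mapping:", mapping)
print("full decomposition: ", full)
```

Under either reading, "each character after the first" covers every composing character, which is the substance of the suggested rewording.
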
> (4) In section 3.2.4, Fully-normalized text, first numbered 
> list, bullet 1 refers to "some character that is not listed 
> in the Composition Exclusion Table defined in [UTR #15]". 
> However, following the link to the most recent version of UTR 
> #15, the section of that document whose title is "Composition 
> Exclusion Table" contains neither a table nor a list of 
> characters.  While this is an apparent failure of UTR #15, 
> the dependence on that section of UTR #15 cascades that 
> failure into Normalization.  However, there is (in section 6 
> of UTR #15) a (not terribly obvious) reference to "the 
> Composition Exclusion Table [Exclusions]".  The References 
> entry with that name (Exclusions) contains pointers to several 
> versions of such a table, the latest of which is available at 
> <http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt>.
> It would have seemed a Very Good Idea for Normalization to 
> point directly to this file, perhaps in addition to the 
> reference directly to UTR #15 section 6.
> 
> (5) In section 3.2.4, Fully-normalized text, second numbered 
> list, bullet 2 uses the phrase "clause 1 above".  I believe 
> that most readers will better understand your meaning if you 
> replace that with "bullet 1 above" or "list item 1 above".  
> To many readers, the word "clause" refers either to a major 
> subdivision of a document (e.g., a chapter) or to a 
> relatively short phrase such as a portion of a sentence 
> (e.g., the noun clause).
> 
> (6) In section 3.2.4, Fully-normalized text, the paragraph 
> beginning "Identification of the constructs..." includes the 
> statement that "it is the responsibility of the specification 
> for a language to specify exactly what constitutes a relevant 
> construct".  Could you please clarify whether or not the 
> XQuery 1.0, XPath 2.0, and XSLT 2.0 suite of specifications 
> would be cited as non-conforming to this specification if (as 
> I believe to be the case) they do not contain any such 
> explicit specification?
> 
> (7) In section 3.2.7, Certified and suspect text, the NOTE 
> begins with the statement "To normalize text, it is in 
> general sufficient to store the last seen character...".  
> Perhaps I've missed something important earlier in this 
> specification, but I have no idea what that statement means.  
> One way of explaining it is to use the example of text "C 
> combining-cedilla".  When processing that text, I store the 
> last seen character (combining-cedilla).  And, voilà, the 
> text is normalized.  But that obviously is not the case.  So 
> what does that statement mean?  Could it be expressed in a 
> less ambiguous manner?
> 
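
An illustrative aside on point (7): the "C combining-cedilla" example can be made concrete with a short Python sketch, assuming NFC as the target normalization form (the variable names are hypothetical). It shows that normalizing this text means combining the stored base character with the mark that follows it; merely remembering a character does not change the text.

```python
import unicodedata

# LATIN CAPITAL LETTER C followed by COMBINING CEDILLA (two code points).
decomposed = "C\u0327"

# NFC composes the pair into the single precomposed character U+00C7.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```

One possible reading of the NOTE is that a streaming normalizer needs to buffer only the most recent base character while it waits to see whether a combining mark follows, but that is an interpretation, not something the draft states.
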
> (8) In section 3.4, Responsibility for normalization, item 
> C303 includes an Example that uses the notations "xf:concat" 
> and "xf:substring".  In both cases (because this document 
> does not define any namespace prefixes associated with the 
> namespace name associated with XPath/XQuery functions), the 
> "xf" should be replaced with "fn", which is the conventional 
> prefix used for that namespace.
> 
> (9) In section 4, String identity matching, item C312, list 
> item 1 includes the statement "In accordance with section
> <http://www.w3.org/TR/2005/WD-charmod-norm-20051027/#sec-Normalization>3
> Normalization, this step MUST be performed by the producers 
> of the strings to be compared."  But section 3 does not make 
> such a requirement (it did so in earlier drafts, but has been 
> changed in this draft).  At the very least, that use of 
> "MUST" must (pun intended) be replaced by "SHOULD".  
> Furthermore, the requirement to use "Early uniform 
> normalization" might be correct because of the use of "as if" 
> in the preceding paragraph, but (as section 3 makes clear) 
> late normalization will produce identical results.
> 
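
An illustrative aside on point (9): the claim that late normalization produces identical results can be sketched as below. This is only a rough model under stated assumptions: the function name nfc_equal is hypothetical, and plain NFC stands in for the draft's notion of normalization, which also covers include-normalized and fully-normalized text that this sketch does not attempt.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings for identity after normalizing both to NFC.

    This models "late" normalization: the consumer normalizes at
    comparison time instead of relying on the producers to have done so.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A producer that normalized early would already emit U+00C7; a late
# normalizer reaches the same comparison result from the decomposed form.
early = "\u00c7"
late = "C\u0327"

print(early == late)           # raw code point comparison
print(nfc_equal(early, late))  # comparison after normalization
```

Because NFC is idempotent, applying it again to already-normalized input changes nothing, so in this model the comparison gives the same answer whether the producer or the consumer normalizes.
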
> (10) In Appendix A, the reference to XQuery Operators 
> includes an outdated list of editors.  Jonathan Robie is no 
> longer cited as an editor of that specification.  
> Furthermore, the most recent edition is now dated 4 November, 
> 2005, and is a Candidate Recommendation.  (Of course, because 
> Normalization was published earlier than that date, you could 
> not have known this fact; the next publication of 
> Normalization should make this
> change.)
> 
> (11) In Appendix B, the final NOTE: says that certain 
> characters may be displayed as a blank or as a blank 
> rectangle.  In some situations (e.g., Firefox 1.0.4 on my 
> system without any font that covers Sinhala), a question mark 
> ("?") is displayed.  It might be appropriate to include that 
> possibility in this NOTE.
> 
> 
> Hope this helps,
>      Jim
> 
> ==============================================================
> ==========
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: 
> +1.801.942.0144
>     Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : 
> +1.801.942.3345
> Oracle Corporation        Oracle Email: jim dot melton at 
> oracle dot com
> 1930 Viscounti Drive      Standards email: jim dot melton at 
> acm dot org
> Sandy, UT 84093-1063 USA          Personal email: jim at 
> melton dot name
> ==============================================================
> ==========
> =  Facts are facts.   But any opinions expressed are the 
> opinions      =
> =  only of myself and may or may not reflect the opinions of 
> anybody   =
> =  else with whom I may or may not have discussed the issues 
> at hand.  = 
> ==============================================================
> ==========
> 
> 

Received on Friday, 13 January 2006 12:27:59 UTC