
Re: Comments on Character Model for the World Wide Web 1.0: Normalization

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 13 Jan 2006 22:43:06 +0900
To: public-i18n-core@w3.org
Message-ID: <op.s3bkd4mmx1753t@ibm-60d333fc0ec>

On Fri, 13 Jan 2006 21:27:43 +0900, Richard Ishida <ishida@w3.org> wrote:

> Felix,
>
> I don't think it should take long to search for duplicates.  We must  
> also add these comments to the last call table, so that we have a  
> complete record of comments and responses.  This is important for the  
> transition of the document to CR.

Just FYI: Richard and I just talked and decided that I will add the
comments of the QT people to the last call table, and at the same time we
will handle them first. It's possible to satisfy everybody :)

Regards, Felix.

> RI
>
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>
>> -----Original Message-----
>> From: public-i18n-core-request@w3.org
>> [mailto:public-i18n-core-request@w3.org] On Behalf Of Felix Sasaki
>> Sent: 12 January 2006 03:12
>> To: public-i18n-core@w3.org
>> Subject: Fwd: Comments on Character Model for the World Wide
>> Web 1.0: Normalization
>>
>> Hi all,
>>
>> This is the approved version of the XQuery / XSL Working
>> Group comments.
>> If you have a look at them, you see that they are very closely
>> related to the progress of the QT specifications. Most of
>> them are now at the Candidate Recommendation stage. If we still
>> want to have an influence on the specs before they become
>> recs, we should reply fast. Hence, I propose to postpone
>> Francois' action item to look for duplicates in the previous
>> last call, and talk about the comments during the teleconfs
>> in the next week.
>>
>> Felix
>>
>> ------- Forwarded message -------
>> From: "Jim Melton" <jim.melton@acm.org>
>> To: www-i18n-comments@w3.org
>> Cc: w3c-xsl-query@w3.org, "C. Michael Sperberg-McQueen"
>> <cmsmcq@acm.org>
>> Subject: Comments on Character Model for the World Wide Web 1.0:
>> Normalization
>> Date: Thu, 12 Jan 2006 05:32:12 +0900
>>
>> Gentlepeople,
>>
>> A joint teleconference of the XML Query Working Group and the
>> XSL Working Group has approved the following as their formal
>> comments on the document entitled Character Model for the
>> World Wide Web 1.0:
>> Normalization.  (Please note that these comments are
>> substantially the same as the personal comments that I sent
>> to you in late December, 2005, the principal change being the
>> addition of an example in point (3) below.)
>>
>> (1) In section 2, Conformance, the list of specification
>> conformance criteria includes: "make it a conformance
>> requirement for implementations to conform to this document",
>> and "make it a conformance requirement for content to conform
>> to this document".  Would you clarify (perhaps only as a
>> response to this message) whether or not the XQuery 1.0,
>> XPath 2.0, and XSLT 2.0 suite of specifications would be
>> cited as non-conforming to this specification if (as I
>> believe to be the case) they do not contain an explicit
>> statement of those two criteria?
>>
>> (2) In section 3.2.3, Include-normalized text, bullet 2 uses
>> the phrase "clause 1 above".  I believe that most readers
>> will better understand your meaning if you replace that with
>> "bullet 1 above" or "list item 1 above".  To many readers,
>> the word "clause" refers either to a major subdivision of a
>> document (e.g., a chapter) or to a relatively short phrase
>> such as a portion of a sentence (e.g., the noun clause).
>>
>> (3) In section 3.2.4, Fully-normalized text, first numbered
>> list, bullet 1 says that a composing character is "the second
>> character in the canonical decomposition mapping of some
>> character".  There are characters in Unicode that are made of
>> a "base character" plus two or more composing characters;
>> therefore, "a composing character" would be "each character
>> after the first in the canonical decomposition mapping of
>> some character".  One example of such a character would seem
>> to be U+1FA4 GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND
>> YPOGEGRAMMENI, the canonical decomposition of which is
>> U+03C9 GREEK SMALL LETTER OMEGA + U+0313 COMBINING COMMA
>> ABOVE + U+0301
>> COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK YPOGEGRAMMENI.
>>
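A minimal sketch of the example in point (3), assuming Python's standard
unicodedata module and whatever Unicode version ships with the interpreter.
It prints the full canonical decomposition (NFD) of U+1FA4, which yields one
base character followed by three combining characters. It also prints the
one-step decomposition mapping recorded in the Unicode Character Database,
which is pairwise and is applied recursively to reach the full form.

```python
import unicodedata

# U+1FA4 GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI
ch = "\u1FA4"

# Full canonical decomposition (NFD) of the single precomposed character.
nfd = unicodedata.normalize("NFD", ch)
code_points = [f"U+{ord(c):04X}" for c in nfd]

# One-step decomposition mapping, as a string of hex code points.
one_step = unicodedata.decomposition(ch)

print("full decomposition:", " + ".join(code_points))
print("one-step mapping:  ", one_step)

# Recomposing the fully decomposed sequence gives back the original character.
recomposed = unicodedata.normalize("NFC", nfd)
print("round trip ok:", recomposed == ch)
```

Every character after the first in the full decomposition is a combining
character, which is the situation the comment describes.
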
>> (4) In section 3.2.4, Fully-normalized text, first numbered
>> list, bullet 1 refers to "some character that is not listed
>> in the Composition Exclusion Table defined in [UTR #15]".
>> However, following the link to the most recent version of UTR
>> #15, the section of that document whose title is "Composition
>> Exclusion Table" contains neither a table nor a list of
>> characters.  While this is an apparent failure of UTR #15,
>> the dependence on that section of UTR #15 cascades that
>> failure into Normalization.  However, there is (in section 6
>> of UTR #15) a (not terribly obvious) reference to "the
>> Composition Exclusion Table [Exclusions]".  The References
>> entry with that name (Exclusions) contains pointers to
>> several versions of such a table, the latest of which is
>> available at
>> <http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt>.
>> It would have seemed a Very Good Idea for Normalization to
>> point directly to this file, perhaps in addition to the
>> reference directly to UTR #15 section 6.
>>
>> (5) In section 3.2.4, Fully-normalized text, second numbered
>> list, bullet 2 uses the phrase "clause 1 above".  I believe
>> that most readers will better understand your meaning if you
>> replace that with "bullet 1 above" or "list item 1 above".
>> To many readers, the word "clause" refers either to a major
>> subdivision of a document (e.g., a chapter) or to a
>> relatively short phrase such as a portion of a sentence
>> (e.g., the noun clause).
>>
>> (6) In section 3.2.4, Fully-normalized text, the paragraph
>> beginning "Identification of the constructs..." includes the
>> statement that "it is the responsibility of the specification
>> for a language to specify exactly what constitutes a relevant
>> construct".  Could you please clarify whether or not the
>> XQuery 1.0, XPath 2.0, and XSLT 2.0 suite of specifications
>> would be cited as non-conforming to this specification if (as
>> I believe to be the case) they do not contain any such
>> explicit specification?
>>
>> (7) In section 3.2.7, Certified and suspect text, the NOTE
>> begins with the statement "To normalize text, it is in
>> general sufficient to store the last seen character...".
>> Perhaps I've missed something important earlier in this
>> specification, but I have no idea what that statement means.
>> One way of explaining it is to use the example of text "C
>> combining-cedilla".  When processing that text, I store the
>> last seen character (combining-cedilla).  And, voilà, the
>> text is normalized.  But that obviously is not the case.  So
>> what does that statement mean?  Could it be expressed in a
>> less ambiguous manner?
>>
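A small sketch of the "C combining-cedilla" case from point (7), assuming
Python's standard unicodedata module and using NFC as a stand-in for the
normalization the NOTE has in mind. Normalizing each character on its own
leaves the text unchanged, while normalizing the pair together composes it,
so a character has to be held back until the next one has been seen.

```python
import unicodedata

# LATIN CAPITAL LETTER C followed by COMBINING CEDILLA
text = "C\u0327"

# Normalizing one character at a time: each character alone is already
# in NFC, so nothing is composed.
piecewise = "".join(unicodedata.normalize("NFC", c) for c in text)

# Normalizing the sequence as a whole: the base character and the
# combining mark compose into U+00C7.
whole = unicodedata.normalize("NFC", text)

print([f"U+{ord(c):04X}" for c in piecewise])
print([f"U+{ord(c):04X}" for c in whole])
```
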
>> (8) In section 3.4, Responsibility for normalization, item
>> C303 includes an Example that uses the notations "xf:concat"
>> and "xf:substring".  In both cases (because this document
>> does not define any namespace prefixes associated with the
>> namespace name associated with XPath/XQuery functions), the
>> "xf" should be replaced with "fn", which is the conventional
>> prefix used for that namespace.
>>
>> (9) In section 4, String identity matching, item C312, list
>> item 1 includes the statement "In accordance with section 3
>> Normalization
>> <http://www.w3.org/TR/2005/WD-charmod-norm-20051027/#sec-Normalization>,
>> this step MUST be performed by the producers
>> of the strings to be compared."  But section 3 does not make
>> such a requirement (it did so in earlier drafts, but has been
>> changed in this draft).  At the very least, that use of
>> "MUST" must (pun intended) be replaced by "SHOULD".
>> Furthermore, the requirement to use "Early uniform
>> normalization" might be correct because of the use of "as if"
>> in the preceding paragraph, but (as section 3 makes clear)
>> late normalization will produce identical results.
>>
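The remark in point (9) that early and late normalization give identical
comparison results can be illustrated with a short sketch. It assumes
Python's standard unicodedata module and uses NFC as a simplified stand-in
for the specification's notion of normalized text; the function name
identical_late is hypothetical.

```python
import unicodedata


def identical_late(a: str, b: str) -> bool:
    """Late normalization: the comparer normalizes both strings itself."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00C7"   # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = "C\u0327"   # C followed by COMBINING CEDILLA

# Early normalization: each producer normalizes before handing the string
# over, and the comparer then does a plain code point comparison.
early_a = unicodedata.normalize("NFC", precomposed)
early_b = unicodedata.normalize("NFC", decomposed)
early_result = early_a == early_b

# Late normalization: the raw strings arrive unnormalized.
late_result = identical_late(precomposed, decomposed)

# Without any normalization the two spellings do not match.
raw_result = precomposed == decomposed

print(early_result, late_result, raw_result)
```
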
>> (10) In appendix A, the reference to XQuery Operators
>> includes an outdated list of editors.  Jonathan Robie is no
>> longer cited as an editor of that specification.
>> Furthermore, the most recent edition is now dated 4 November,
>> 2005, and is a Candidate Recommendation.  (Of course, because
>> Normalization was published earlier than that date, you could
>> not have known this fact; the next publication of
>> Normalization should make this
>> change.)
>>
>> (11) In Appendix B, the final NOTE: says that certain
>> characters may be displayed as a blank or as a blank
>> rectangle.  In some situations (e.g., Firefox 1.0.4 on my
>> system without any font that covers Sinhala), a question mark
>> ("?") is displayed.  It might be appropriate to include that
>> possibility in this NOTE.
>>
>>
>> Hope this helps,
>>      Jim
>>
>> ========================================================================
>> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
>>     Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : +1.801.942.3345
>> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
>> 1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
>> Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
>> ========================================================================
>> =  Facts are facts.   But any opinions expressed are the opinions      =
>> =  only of myself and may or may not reflect the opinions of anybody   =
>> =  else with whom I may or may not have discussed the issues at hand.  =
>> ========================================================================
>>
>>
>
Received on Friday, 13 January 2006 13:43:21 GMT
