- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 17 Oct 2003 16:09:27 -0400
- To: "Ashok Malhotra" <ashokma@microsoft.com>, <w3c-i18n-ig@w3.org>, <w3c-xml-query-wg@w3.org>
- Cc: <public-qt-comments@w3.org>, msm@w3.org

Hello Ashok,

Many thanks to you and others for the clarification. Now that we know that you meant to write

>>>>
There was a feeling that Unicode 4.0 was too new and as yet not widely implemented.
>>>>

we can look at the actual issues.

The Character Model has a special section on how to reference Unicode/ISO 10646:
http://www.w3.org/TR/charmod/#sec-RefUnicode
[note that that document itself unfortunately still references Unicode 3 in the public version, but the editor's version is changed to Unicode 4]

The important point from there is the distinction between generic references and specific references.

Generic references are appropriate if the reference is to the coded character set, and the idea is that newly encoded additions will be usable, too. My understanding is that this is, and should be, the case for XML Query, with respect to the data it processes.

Specific references have to be used if it is important to lock down some things that are known to potentially change, such as some character properties. Examples include properties used for regular expressions in XML Schema, or for XML Query, this could be a specific version of the Unicode collation algorithm (which is actually a separate document and therefore not relevant here) or some specific character properties.

For the former (generic references), using the latest major version seems to be a very good idea. It is widely available in book form and on the Web. It would be strange to refer to an older book version; readers of your spec who go out to buy the version 3.0 book and then discover that there was a version 4.0 would get angry at you for the money they wasted. Where generic references are appropriate, there is absolutely no problem with referencing different versions in different specs (e.g. Unicode 2.0 in XML 1.0, Unicode 3.0 in XML Schema, Unicode 4.0 in XML Query) because all these references point to exactly the same thing, the open-ended coded character set, and are consistent even though the numbers are not the same.

For the latter (specific references), this has to be examined case by case. For example, for regular expressions, XML Schema refers to the Unicode Database version 3.1, although it dodges complete interoperability by encouraging support for newer versions. More than one reference may be necessary.

At 08:28 03/10/16 -0700, Ashok Malhotra wrote:

>In their review of the F&O document the I18N WG asked that the references
>to Unicode be changed to Unicode 4.0 and to change the reference to case
>mapping to Annexure #15 of Unicode 4.0 and the normalization forms defined
>in Unicode 4.0. See also
>http://lists.w3.org/Archives/Public/public-qt-comments/2003Jul/0105.html
>and
>http://lists.w3.org/Archives/Public/public-qt-comments/2003Jul/0106.html
>
>The XML Schema WG asked that the references to Unicode be consistent. See
>http://lists.w3.org/Archives/Public/public-qt-comments/2003Aug/0003.html

I have looked through this document. I have not found any such request. (I have copied Michael Sperberg-McQueen, maybe he can help.) The closest I have found is
http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e327
This refers to the fact that Unicode 2.0 and Unicode 3.0 do not clearly outlaw encoding of non-BMP characters in six bytes (using two surrogate codepoints). If the intent of the XML Schema WG was to request that XML Query should in any way tolerate such encoding, then this would not be appropriate.

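To make the six-byte case mentioned above concrete, here is a minimal Python sketch. It assumes Python 3's built-in codecs; the sample character U+10000 and the helper name `is_strict_utf8` are illustrative choices, not taken from any of the specs under discussion.

```python
# Illustrative sketch: U+10000 is the first code point outside the BMP.
ch = "\U00010000"

# Standard UTF-8 encodes it as a single four-byte sequence.
utf8 = ch.encode("utf-8")

# The six-byte form: split the character into its two UTF-16 surrogates,
# then encode each surrogate code point as its own three-byte sequence.
utf16 = ch.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")  # high surrogate, 0xD800
low = int.from_bytes(utf16[2:], "big")   # low surrogate, 0xDC00
six_byte = (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


def is_strict_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(utf8), is_strict_utf8(utf8))
print(len(six_byte), is_strict_utf8(six_byte))
```

In this sketch the strict decoder accepts the four-byte sequence and rejects the six-byte one, which is the behaviour a processor would show if it does not tolerate the surrogate-based encoding.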
XML 1.0 references RFC 2279 (see http://www.w3.org/TR/REC-xml#sec-external-ent), which already makes clear that non-BMP characters have to be encoded as four bytes. Tolerating such an encoding would also be a bad idea given that there are known security problems connected with overlong UTF-8 byte sequences.

>This was discussed in the joint WG meeting in Toronto. There was a
>feeling that Unicode was too new and as yet not widely implemented. There
>was also the argument for consistency. Thus, it was decided that the F&O
>would refer to the XML standard, since that is what the spec is based on,
>and be consistent with it.

Finally, back to the original statement:

>>>>
There was a feeling that Unicode 4.0 was too new and as yet not widely implemented.
>>>>

Some parts of Unicode 2.0 may in some sense not yet be very well implemented (e.g. Indic script rendering in certain environments). On the other hand, some parts of Unicode 4.0 are already implemented, because Unicode 4.0 went through various reviews and a beta period. So the above claim doesn't seem to make much sense to me.

Regards, Martin.

Received on Friday, 17 October 2003 16:11:35 UTC