Re: Unicode references from Martin Duerst on 2003-10-19 (public-qt-comments@w3.org from October 2003)

From: Martin Duerst <duerst@w3.org>
Date: Sun, 19 Oct 2003 10:26:19 -0400
To: "Ashok Malhotra" <ashokma@microsoft.com>, <w3c-i18n-ig@w3.org>, <w3c-xml-query-wg@w3.org>
Cc: <public-qt-comments@w3.org>, msm@w3.org
Message-Id: <4.2.0.58.J.20031019102221.0272fd60@localhost>

Some additions:

At 16:09 03/10/17 -0400, Martin Duerst wrote:

>At 08:28 03/10/16 -0700, Ashok Malhotra wrote:

>>The XML Schema WG asked that the references to Unicode be consistent. See 
>><http://lists.w3.org/Archives/Public/public-qt-comments/2003Aug/0003.html> 
>
>I have looked through this document. I have not found any such request.
>(I have copied Michael Sperberg-McQueen, maybe he can help)
>The closest I have found is
>http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e327
>This refers to the fact that Unicode 2.0 and Unicode 3.0 do not
>clearly outlaw encoding of non-BMP characters in six bytes (using
>two surrogate codepoints).
>
>If the intent of the XML Schema WG was to request that XML Query
>should in any way tolerate such encoding, then this would not
>be appropriate. XML 1.0 references RFC 2279 (see
>http://www.w3.org/TR/REC-xml#sec-external-ent), which is already
>clear that non-BMP characters have to be encoded as four bytes.
>This would also be a bad idea given that there are known security
>problems connected with overlong UTF-8 byte sequences,

Please note the erratum
http://www.w3.org/XML/xml-V10-2e-errata#E27, which also makes
clear that XML doesn't allow overlong/irregular encodings.
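
To make the distinction concrete, here is a minimal sketch in Python (my choice of language for illustration only; none of the cited specifications prescribes it). It compares the regular four-byte UTF-8 encoding of a non-BMP character with the six-byte form built from two surrogate code points, and with an overlong two-byte sequence. The helper name is_strict_utf8 is hypothetical, and the sketch assumes that Python 3's built-in "utf-8" codec rejects both irregular forms in its default strict mode.

```python
# Illustrative sketch only; is_strict_utf8 is a hypothetical helper name.

def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# U+10000 is the first character outside the BMP.
four_byte = "\U00010000".encode("utf-8")

# Six-byte form: each of the two surrogate code points (U+D800, U+DC00)
# is encoded separately as a three-byte sequence. "surrogatepass" is used
# here only to produce these bytes for the demonstration.
six_byte = "\ud800\udc00".encode("utf-8", "surrogatepass")

# Overlong two-byte sequence for "/" (U+002F), the classic security example.
overlong_slash = b"\xc0\xaf"

for label, data in [("four-byte", four_byte),
                    ("six-byte surrogate pair", six_byte),
                    ("overlong slash", overlong_slash)]:
    print(label, data.hex(), len(data), is_strict_utf8(data))
```

Only the four-byte form should be accepted; a processor that tolerates either of the other two is exposed to the kind of security problem mentioned above.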


Regards,   Martin.

Received on Sunday, 19 October 2003 10:42:30 UTC