Re: UTF-8 and BOM from Martin J. Duerst on 2000-08-23 (w3c-ietf-xmldsig@w3.org from July to September 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Wed, 23 Aug 2000 11:54:07 +0900
To: tgindin@us.ibm.com, "Joseph M. Reagle Jr." <reagle@w3.org>
Cc: "John Boyer" <jboyer@PureEdge.com>, "XML DSig" <w3c-ietf-xmldsig@w3.org>
Message-Id: <4.2.0.58.J.20000823115209.032804c0@sh.w3.mag.keio.ac.jp>

At 00/08/22 17:41 -0400, tgindin@us.ibm.com wrote:
>      Why do we warn people about BOM but not about surrogates, anyway?  One
>is no more appropriate than the other in canonicalized UTF-8.

The difference is that surrogate pairs encoded in UTF-8 are explicitly
disallowed by the relevant specs (ISO 10646, Unicode, RFC 2279), but
the BOM issue is not mentioned in RFC 2279 and is, as far as I remember,
explicitly allowed in ISO 10646 and Unicode.
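
To make the distinction concrete, here is a minimal sketch in Python of
how a checker might detect both cases at the byte level. The function
names has_bom and has_encoded_surrogate are hypothetical and chosen only
for illustration; they are not taken from any spec or from the
canonicalization draft.

```python
# U+FEFF (the BOM) encoded as UTF-8. This sequence is well-formed UTF-8.
BOM = b"\xef\xbb\xbf"


def has_bom(data: bytes) -> bool:
    """Return True if the byte stream starts with a UTF-8 encoded BOM."""
    return data.startswith(BOM)


def has_encoded_surrogate(data: bytes) -> bool:
    """Return True if the stream contains a surrogate code point
    (U+D800..U+DFFF) encoded directly as a three-byte sequence.

    Such a sequence has the form ED A0..BF 80..BF. The byte ED is never
    a continuation byte, so a plain scan is enough for this check.
    """
    for i in range(len(data) - 2):
        if (
            data[i] == 0xED
            and 0xA0 <= data[i + 1] <= 0xBF
            and 0x80 <= data[i + 2] <= 0xBF
        ):
            return True
    return False


def strict_decode_ok(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With this sketch, a leading BOM passes a strict UTF-8 decode (it simply
decodes to U+FEFF), while an encoded surrogate pair such as
ED A0 80 ED B0 80 is rejected by the decoder outright. That matches the
point above: one case is an error in the encoding itself, the other is
legal and therefore needs a separate warning.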

Regards,  Martin.

Received on Tuesday, 22 August 2000 22:52:32 UTC