
RE: Clarify `UTF-8'

From: Martin J. Duerst <duerst@w3.org>
Date: Tue, 20 Jun 2000 13:45:16 +0900
Message-Id: <4.2.0.58.J.20000620133427.035f1e60@sh.w3.mag.keio.ac.jp>
To: "John Boyer" <jboyer@PureEdge.com>, "TAMURA Kent" <kent@trl.ibm.co.jp>, <w3c-ietf-xmldsig@w3.org>
Hello Kent,

I concur with John. Please note that the page you cite says:

Note: The italicized names are not yet registered, but are useful for 
reference.

(and UTF-8N is italicized).

Please also note that with regard to UTF-8 and "A real ZWNBSP at the start of
a file requires a signature first", this is an interpretation by the author
of the page you cite. From RFC 2279 (IETF definition of UTF-8), it is not
even clear whether a signature is allowed or not.

Also, the descriptions are given more in terms of a receiver than in
terms of a producer. As a producer, writing out a BOM/signature is a
bad idea, among other reasons because:
- There are XML parsers out there that don't support it.
- A plain ASCII file turns into a non-ASCII file.
- UTF-8 can easily be detected if necessary.

Adding a sentence saying that the UTF-8 produced does not start
with a BOM may be a good idea for a clarification.
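
As a rough illustration of the points above (not taken from either spec),
here is a small Python sketch. The sample string and the helper names
is_ascii, looks_like_utf8 and strip_bom are made up for this example; the
"utf-8" and "utf-8-sig" codecs and codecs.BOM_UTF8 are from the Python
standard library.

```python
import codecs

# Hypothetical sample: a document that contains only ASCII characters.
text = "<doc>hello</doc>"

# Python's "utf-8" codec never writes a signature when encoding.
without_bom = text.encode("utf-8")
# The "utf-8-sig" codec prepends the three bytes EF BB BF.
with_bom = text.encode("utf-8-sig")


def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def looks_like_utf8(data: bytes) -> bool:
    """Detect UTF-8 without a signature by attempting a strict decode."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

With these definitions, is_ascii(without_bom) holds while is_ascii(with_bom)
does not, which is the "plain ASCII file turns into a non-ASCII file"
problem. A strict decode is only a heuristic for detection, but a byte
sequence in another 8-bit encoding rarely happens to be well-formed UTF-8.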

Regards,   Martin.

At 00/06/19 20:32 -0700, John Boyer wrote:
>Hi TAMURA-san,
>
>According to the table you cited, I have so far meant UTF-8N in every
>instance where UTF-8 is currently used.
>
>However, I have not before seen any document that prepends an encoding
>signature, nor have I ever seen a reference to UTF-8N.  Is there any support
>other than this table for the UTF-8N nomenclature, or am I just behind the
>curve on this one?
>
>To the contrary, the XML 1.0 specification clearly uses the encoding UTF-8
>(which is the default) to mean UTF-8N by the table you cited.  For example,
>Section 4.3.3 contains the following sentence:
>
>"Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not
>strictly need an encoding declaration."
>
>Thanks,
>***************************************
>John Boyer,
>Software Development Manager
>
>PureEdge Solutions (formerly UWI.Com)
>Creating Binding E-Commerce
>
>v:250-479-8334, ext. 143 f:250-479-3772
>1-888-517-2675  http://www.PureEdge.com
>***************************************
>
>
>
>-----Original Message-----
>From: w3c-ietf-xmldsig-request@w3.org
>[mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of TAMURA Kent
>Sent: Monday, June 19, 2000 5:52 PM
>To: w3c-ietf-xmldsig@w3.org
>Subject: Clarify `UTF-8'
>
>
>The XML Signature spec. and the Canonical XML spec. refer to 'UTF-8' many
>times.  Please clarify which one each reference means: 'UTF-8' (with the
>UTF-8 signature, EF BB BF) or 'UTF-8N' (without the UTF-8 signature).
>
>See Table 2 in
>http://www-4.ibm.com/software/developer/library/utfencodingforms/index.html
>
>--
>TAMURA Kent @ Tokyo Research Laboratory, IBM
Received on Tuesday, 20 June 2000 00:39:24 GMT
