W3C public mailing list archive: www-international@w3.org, April to June 2001

Re: UTF-8 signature in web and email

From: Martin Duerst <duerst@w3.org>
Date: Wed, 16 May 2001 10:55:37 +0900
To: Roozbeh Pournader <roozbeh@sharif.edu>, Unicode List <unicode@unicode.org>, <www-international@w3.org>
Hello Roozbeh

At 04:02 01/05/15 +0430, Roozbeh Pournader wrote:

>Well, I received a UTF-8 email from Microsoft's Dr International today. It
>was a "multipart/alternative", with both the "text/plain" and "text/html"
>in UTF-8. Well, nothing interesting yet, but the interesting point was
>that the HTML version had a UTF-8 signature, but the text version lacked
>it. So the HTML version declared its encoding three times: the MIME charset
>parameter (UTF-8), the UTF-8 signature, and the <meta> charset markup.

This is definitely overblown. There is perhaps 5% of a justification
for putting a 'signature' (U+FEFF encoded in UTF-8, i.e. the bytes
EF BB BF) on a standalone plain-text file: it is somewhat easier to
detect that the file is UTF-8 from the signature than to read through
the file and check the byte patterns, although checking the byte
patterns is an extremely good method for distinguishing UTF-8 from
everything else. For self-labeled data (HTML, XML, CSS) and in the
context of MIME (with the charset parameter), a UTF-8 signature makes
no sense at all.
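
To make the comparison concrete, here is a minimal Python sketch of the two detection approaches described above. The function names are hypothetical. The byte-pattern check simply relies on Python's strict UTF-8 decoder rejecting malformed sequences, which is one way (not the only one) to implement such a check.

```python
import codecs

# The UTF-8 'signature' is U+FEFF encoded in UTF-8: the three bytes EF BB BF.
UTF8_SIGNATURE = codecs.BOM_UTF8


def has_utf8_signature(data: bytes) -> bool:
    """Cheap check: inspect only the first three bytes."""
    return data.startswith(UTF8_SIGNATURE)


def looks_like_utf8(data: bytes) -> bool:
    """Byte-pattern check: scan the whole input and accept it only if every
    byte sequence is well-formed UTF-8 (Python's strict decoder raises
    UnicodeDecodeError on any malformed sequence)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(has_utf8_signature(b"\xef\xbb\xbfhello"))       # True
    print(looks_like_utf8("café".encode("utf-8")))        # True
    print(looks_like_utf8("café".encode("iso-8859-1")))   # False: lone 0xE9
```

Note that pure ASCII input also passes the byte-pattern check, which is correct, since ASCII is a subset of UTF-8.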

>1. What are the current recommendations for these?

- When producing UTF-8 files/documents, *never* produce a 'signature'.
   Quite a few receivers cannot deal with it, or deal with it by
   displaying stray characters (for example 'ï»¿' when the bytes are
   read as ISO-8859-1). And there are many other problems.

- When receiving UTF-8, you should probably check for a 'signature'
   and remove it. There are too many applications that send one out.
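
Both recommendations can be sketched in a few lines of Python. The function names are hypothetical, and dropping a leading U+FEFF from the string before encoding is an assumed design choice, not something the recommendations above prescribe. The sketch relies on two documented behaviors of Python's codecs: the plain "utf-8" codec never writes a signature, and decoding with "utf-8-sig" skips one leading signature if present.

```python
import codecs


def encode_utf8_for_sending(text: str) -> bytes:
    """Producer side: UTF-8 bytes with no leading signature."""
    # Assumption: if the string itself starts with U+FEFF (e.g. it was read
    # from a file that had a signature), drop it so it is not re-encoded
    # as EF BB BF.
    if text.startswith("\ufeff"):
        text = text[1:]
    # The plain "utf-8" codec never adds a signature by itself; the
    # "utf-8-sig" codec would prepend EF BB BF, so it is avoided here.
    return text.encode("utf-8")


def strip_utf8_signature(data: bytes) -> bytes:
    """Receiver side: remove one leading EF BB BF, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_received_utf8(data: bytes) -> str:
    """Receiver side: decode UTF-8 and ignore a leading signature.

    When decoding, the "utf-8-sig" codec skips one leading EF BB BF if it is
    there and otherwise behaves like the plain "utf-8" codec.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    print(encode_utf8_for_sending("\ufeff<p>café</p>"))  # b'<p>caf\xc3\xa9</p>'
    print(decode_received_utf8(b"\xef\xbb\xbf<p>caf\xc3\xa9</p>"))  # <p>café</p>
```

strip_utf8_signature is useful when the bytes are passed on unchanged (for example to a parser that chokes on the signature); decode_received_utf8 covers the common case where the text is decoded right away.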

>2. Most important of all, does W3C allow UTF-8 signatures before
>"<!DOCTYPE>"? And if yes, what should be done if they mismatch the
>charset as can be described in the <meta> tag?

For text/html, neither the HTML spec nor the IETF definition of UTF-8
(RFC 2279) says anything about a signature, as far as I know. The reason
is that nobody thought about a UTF-8 signature at that time.

For XML, the 'signature' is now listed in Appendix F.1 of the XML 1.0
specification. But that appendix is not normative, and the listing is
fairly recent, so you should never expect an XML processor to accept a
signature (except as a plain character in the file when there is no
XML declaration).
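
Appendix F.1 is essentially a lookup table on the first few bytes of an XML entity. The Python sketch below covers only a subset of that table (it omits, for example, EBCDIC and the unusual UCS-4 octet orders), and its short labels are simplified and hypothetical: several table rows really mean "a family of encodings; read the encoding declaration to decide". It illustrates the idea only and does not describe how any particular XML processor behaves.

```python
import codecs

# Checked in order: the UCS-4 little-endian signature (FF FE 00 00) begins
# with the UTF-16 little-endian one (FF FE), so it has to be tested first.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UCS-4BE"),   # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UCS-4LE"),   # FF FE 00 00
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
]

# No signature: guess from how the '<?' of the XML declaration is encoded.
_DECLARATION_STARTS = [
    (b"\x00<\x00?", "UTF-16BE"),        # 00 3C 00 3F
    (b"<\x00?\x00", "UTF-16LE"),        # 3C 00 3F 00
    (b"<?xm", "ASCII-compatible"),      # 3C 3F 78 6D: read encoding="..."
]


def guess_xml_encoding(data: bytes) -> str:
    """Hypothetical helper: partial first guess at an XML entity's encoding."""
    for prefix, guess in _SIGNATURES + _DECLARATION_STARTS:
        if data.startswith(prefix):
            return guess
    # Nothing recognized: without a signature or declaration, XML uses UTF-8.
    return "UTF-8 (default)"


if __name__ == "__main__":
    print(guess_xml_encoding(b"\xef\xbb\xbf<?xml version='1.0'?>"))  # UTF-8
    print(guess_xml_encoding(b"<?xml version='1.0'?>"))  # ASCII-compatible
    print(guess_xml_encoding(b"<root/>"))                # UTF-8 (default)
```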

Regards, Martin.
Received on Wednesday, 16 May 2001 02:40:58 UTC
