- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 16 May 2001 10:55:37 +0900
- To: Roozbeh Pournader <roozbeh@sharif.edu>, Unicode List <unicode@unicode.org>, <www-international@w3.org>
Hello Roozbeh

At 04:02 01/05/15 +0430, Roozbeh Pournader wrote:
>Well, I received a UTF-8 email from Microsoft's Dr International today. It
>was a "multipart/alternative", with both the "text/plain" and "text/html"
>in UTF-8. Well, nothing interesting yet, but the interesting point was
>that the HTML version had a UTF-8 signature, but the text version lacked
>it. So, the HTML version had it three times: mime charset as UTF-8,
>UTF-8 signature, and <meta> charset markup.

This is definitely overblown. There is about 5% of a justification for
having a 'signature' on a plain-text, standalone file: it is somewhat
easier to detect that the file is UTF-8 from the signature than to read
through the file and check the byte patterns (which is an extremely good
method to distinguish UTF-8 from everything else). For self-labeled data
(HTML, XML, CSS) and in the context of MIME (with the charset parameter),
a UTF-8 signature doesn't make sense at all.

>Questions:
>
>1. What are the current recommendations for these?

- When producing UTF-8 files/documents, *never* produce a 'signature'.
  There are quite a few receivers that cannot deal with it, or that deal
  with it by displaying something. And there are many other problems.

- When receiving UTF-8, you probably should check for a 'signature' and
  remove it. There are too many applications that send one out,
  unfortunately.

>2. Most important of all, does W3C allow UTF-8 signatures before
>"<!DOCTYPE>"? And if yes, what should be done if they mismatch the
>charset as can be described in the <meta> tag?

For text/html, neither the HTML spec nor the IETF definition of UTF-8
(RFC 2279) says anything, as far as I know. The reason is that nobody
thought about a UTF-8 signature at that time.

For XML, the 'signature' is now listed in App F.1
http://www.w3.org/TR/REC-xml#sec-guessing-no-ext-info
But this is not normative, and fairly recent, and so you should never
expect an XML processor to accept it (except as a plain character in the
file when there is no XML declaration).

Regards, Martin.
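Editorial illustration, not part of the original message: the two
recommendations above (never produce a signature, strip one on receipt)
and the byte-pattern check can be sketched roughly as follows in Python.
The function names are hypothetical and chosen only for this example; the
sketch assumes the data is already known or suspected to be UTF-8.

```python
import codecs

# The three-byte UTF-8 'signature' (EF BB BF), as defined by the standard library.
UTF8_SIGNATURE = codecs.BOM_UTF8


def strip_signature(data: bytes) -> bytes:
    """Receiving side: drop a leading UTF-8 signature if one is present."""
    if data.startswith(UTF8_SIGNATURE):
        return data[len(UTF8_SIGNATURE):]
    return data


def looks_like_utf8(data: bytes) -> bool:
    """Check the byte patterns: True if the data decodes as strict UTF-8.

    Note that pure ASCII also passes, since ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def receive(data: bytes) -> str:
    """Hypothetical receiver: remove any signature, then decode."""
    return strip_signature(data).decode("utf-8")


def produce(text: str) -> bytes:
    """Hypothetical producer: Python's "utf-8" codec does not write a
    signature (the separate "utf-8-sig" codec does)."""
    return text.encode("utf-8")
```

A receiver written this way accepts input both with and without the
signature, while a producer never emits one, which matches the asymmetric
advice given in the message.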
Received on Wednesday, 16 May 2001 02:40:58 UTC