- From: Francois Yergeau <FYergeau@alis.com>
- Date: Thu, 17 Oct 2002 21:39:42 -0400
- To: charsets <ietf-charsets@iana.org>
Markus Scherer wrote:
> Patrik Fältström wrote:
> > What I hear on this list is that the consensus is that BOM SHOULD NOT be
> > used. I would like it to be MUST NOT be used in Internet protocols,
> > which leads to tagged UTF-8 text be illegal if the BOM exists in the text.
>
> That would violate the Unicode standard....

It would also be unenforceable in many cases. How is an HTTP server supposed to know that a text file it is reading from disk is UTF-8 and, if so, that it has an initial BOM? We know that in practice HTTP servers usually do not know, and therefore cannot enforce a ban on the BOM. Deployed HTTP servers have been in violation of the spec with respect to charset for almost ten years; RFC 2279bis won't change that.

With an outright ban, we would therefore end up in a worse situation than by selectively allowing the BOM: it would be officially banned but still widespread, forcing implementations to cope with something that's not supposed to happen. Implementations that just go by the book would die in the market.

That's why I prefer to distinguish cases: let protocols ban the BOM where they can, and allow it otherwise.

-- François
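A minimal sketch, assuming Python and hypothetical helper names (`strip_utf8_bom`, `decode_utf8_tolerant`), of what "coping" with a BOM might look like on the receiving side. A UTF-8 BOM is U+FEFF encoded as the three bytes EF BB BF, so a tolerant reader can drop that prefix before decoding. Note that the check is only meaningful once the bytes are already known to be UTF-8; the same three bytes are valid text in other charsets, which is part of why a server reading arbitrary files cannot reliably enforce a ban.

```python
# U+FEFF (ZERO WIDTH NO-BREAK SPACE / BOM) encoded in UTF-8.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data with one leading UTF-8 BOM removed, if present (hypothetical helper)."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def decode_utf8_tolerant(data: bytes) -> str:
    """Decode bytes assumed to be UTF-8, accepting an optional initial BOM."""
    return strip_utf8_bom(data).decode("utf-8")
```

For example, `decode_utf8_tolerant(b"\xef\xbb\xbfhello")` and `decode_utf8_tolerant(b"hello")` both yield `"hello"`. Python's standard `utf-8-sig` codec performs the same stripping on decode, so in practice `data.decode("utf-8-sig")` is the shorter route.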
Received on Thursday, 17 October 2002 21:41:11 UTC