RE: Comments on draft-yergeau-rfc2279bis-00.txt from Francois Yergeau on 2002-10-18 (ietf-charsets@w3.org from October to December 2002)

From: Francois Yergeau <FYergeau@alis.com>
Date: Thu, 17 Oct 2002 21:39:42 -0400
To: charsets <ietf-charsets@iana.org>
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB73660EB2FE@alis-2k.alis.domain>

Markus Scherer wrote:
> Patrik Fältström wrote:
> > What I hear on this list is that the consensus is that BOM SHOULD NOT be
> > used. I would like it to be MUST NOT be used in Internet protocols,
> > which leads to tagged UTF-8 text be illegal if the BOM exists in the text.
> 
> That would violate the Unicode standard....

It would also be unenforceable in many cases.  How is an HTTP server
supposed to know that a text file it is reading from disk is UTF-8 and, if
so, that it has an initial BOM?  We know that in practice HTTP servers do
not know most of the time and therefore can't enforce a ban on the BOM.
Deployed HTTP servers have been in violation of the spec with respect to
charset for almost ten years; RFC 2279bis won't change that.
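
To make the point concrete: about the only way a server could tell is to
sniff the first three octets of the file for EF BB BF, and only after it has
somehow decided the file is UTF-8 at all. A minimal sketch in Python, where
the function names are made up for illustration and are not taken from any
server:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature if present; otherwise return data as is."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Note that this check says nothing about whether the rest of the file really
is UTF-8; a server that does not know the charset is no better off.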

With an outright ban, we would therefore end up with a worse situation than
selectively allowing the BOM: it would be officially banned but still
widespread, forcing implementations to cope with something that's not
supposed to happen.  Implementations that just go by the book would die in
the market.

That's why I prefer to distinguish cases and let protocols ban the BOM where
they can and allow it otherwise.
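
As a rough illustration of what "distinguishing cases" could look like on the
receiving side, here is a sketch of a decoder where the protocol decides
whether a leading BOM is tolerated. The function name and the allow_bom
parameter are hypothetical, purely for illustration; nothing like this is in
the draft:

```python
import codecs

def decode_utf8(data: bytes, allow_bom: bool) -> str:
    """Decode UTF-8 text under a per-protocol BOM policy (hypothetical)."""
    if data.startswith(codecs.BOM_UTF8):
        if not allow_bom:
            # A protocol that bans the BOM treats it as an error.
            raise ValueError("UTF-8 BOM not permitted by this protocol")
        # A protocol that allows the BOM strips it before decoding.
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")
```

A protocol that can enforce the ban would call this with allow_bom=False;
one that has to cope with files from disk would pass allow_bom=True.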

-- 
François

Received on Thursday, 17 October 2002 21:41:11 UTC