RE: Comments on draft-yergeau-rfc2279bis-00.txt from Francois Yergeau on 2002-10-03 (ietf-charsets@w3.org from October to December 2002)

From: Francois Yergeau <FYergeau@alis.com>
Date: Thu, 03 Oct 2002 14:11:39 -0400
To: ietf-charsets@iana.org
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB736680CA43@alis-2k.alis.domain>

Martin Duerst wrote:
> Therefore, senders SHOULD NOT use the BOM in larger, usually
> labeled, pieces of text (e.g. MIME entities), and MUST NOT
> use it in smaller protocol elements (usually with a fixed
> encoding). Receivers SHOULD recognize and remove the BOM
> in larger, usually labeled, pieces of text (e.g. MIME entities).

This is a far cry from banning the BOM outright, and the distinction between
larger pieces of text and smaller protocol elements seems like a useful one
(though perhaps not yet optimally worded).

Some thoughts:

- Perhaps the distinction is less between larger and smaller pieces of text
than between payloads and protocol elements proper.

- I think it would be better for *this* RFC to refrain from telling senders
and receivers what to do with the BOM, and instead to offer advice to protocol
designers.  The specifications of individual protocols are better placed to
decide where the BOM should be banned or allowed.

- There seems to be some confusion over what stripping the BOM means in
practice. 'Stripping' should be more like 'ignoring at appropriate times'.
Example: my web browser gets a BOM-bearing UTF-8 page through HTTP.  Whether
or not it uses the BOM to determine that the page is in UTF-8, the browser
should ignore it when displaying the page to me, but it should certainly not
strip it out when I ask it to save the page to my disk (which is exactly the
point where the BOM becomes useful, as my file system will not preserve as
metadata the fact that this page is in UTF-8).
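
To make the last point concrete, here is a minimal sketch in Python of
'ignoring at appropriate times'.  It is an illustration only, not text from
any draft: the function names are hypothetical, and it assumes the payload is
already known (from a label or from the BOM itself) to be UTF-8.

```python
import codecs

# The UTF-8 encoding of U+FEFF, i.e. the bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(payload: bytes) -> bool:
    """Report whether the payload starts with the UTF-8 BOM."""
    return payload.startswith(UTF8_BOM)


def text_for_display(payload: bytes) -> str:
    """Decode the payload for display, ignoring one leading BOM if present.

    The standard 'utf-8-sig' codec skips a single BOM at the start and
    otherwise behaves like 'utf-8'.
    """
    return payload.decode("utf-8-sig")


def bytes_for_saving(payload: bytes) -> bytes:
    """Return the bytes to write to disk: the payload unchanged.

    The BOM (if any) is kept, since the file system will not record the
    encoding as metadata.
    """
    return payload
```

A browser-like receiver would call text_for_display() when rendering the
page and bytes_for_saving() when the user saves it, so the BOM is never
shown but is never lost either.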

-- 
Francois

Received on Thursday, 3 October 2002 14:12:55 UTC