- From: Francois Yergeau <FYergeau@alis.com>
- Date: Thu, 03 Oct 2002 14:11:39 -0400
- To: ietf-charsets@iana.org
Martin Duerst wrote:

> Therefore, senders SHOULD NOT use the BOM in larger, usually
> labeled, pieces of text (e.g. MIME entities), and MUST NOT
> use it in smaller protocol elements (usually with a fixed
> encoding). Receivers SHOULD recognize and remove the BOM
> in larger, usually labeled, pieces of text (e.g. MIME entities).

This is a far cry from banning the BOM outright and the distinction between larger pieces of text and smaller protocol elements seems like a useful one (but perhaps not worded optimally yet).

Some thoughts:

- Perhaps the distinction is less between larger and smaller pieces of text than between payloads and protocol elements proper.

- I think it would be better for *this* RFC to refrain from telling senders and receivers what to do with the BOM, but to offer advice to protocol designers. It is specific protocols that should know better where the BOM should be banned or allowed.

- There seems to be some confusion over what stripping the BOM means in practice. 'Stripping' should be more like 'ignoring at appropriate times'. Example: my web browser gets a BOM-bearing UTF-8 page through HTTP. Whether or not it uses the BOM to determine that the page is in UTF-8, the browser should ignore it when displaying the page to me, but it should certainly not strip it out when I ask it to save the page to my disk (which is exactly the point where the BOM becomes useful, as my file system will not preserve as metadata the fact that this page is in UTF-8).

-- 
Francois
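
The browser example above ('ignoring at appropriate times' rather than stripping) can be sketched in a few lines of Python. This is only an illustration, not part of the original message: the function names `text_for_display` and `bytes_for_saving` are hypothetical, and the sketch assumes the payload is already known to be UTF-8.

```python
import codecs

# The UTF-8 encoded form of U+FEFF: b'\xef\xbb\xbf'
UTF8_BOM = codecs.BOM_UTF8


def text_for_display(payload: bytes) -> str:
    """Decode a UTF-8 payload for display (hypothetical helper).

    The 'utf-8-sig' codec skips one leading BOM if present and
    otherwise behaves like plain 'utf-8', so the BOM is ignored
    here without the stored bytes being modified.
    """
    return payload.decode("utf-8-sig")


def bytes_for_saving(payload: bytes) -> bytes:
    """Return the bytes to write to disk (hypothetical helper).

    The payload is passed through untouched, so a BOM that arrived
    with the page is still there to identify the file as UTF-8.
    """
    return payload


# A BOM-bearing UTF-8 page, as it might arrive over HTTP.
page = UTF8_BOM + "héllo".encode("utf-8")

shown = text_for_display(page)   # BOM ignored for display
saved = bytes_for_saving(page)   # BOM preserved for the file
```

The design point is that the received bytes are never rewritten: the BOM is skipped only in the decoding step that feeds the display, which is one way to read 'ignoring' as opposed to 'stripping'.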
Received on Thursday, 3 October 2002 14:12:55 UTC