- From: Francois Yergeau <yergeau@alis.com>
- Date: Tue, 02 Feb 1999 15:34:14 -0500
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: "Martin J. Duerst" <duerst@w3.org>, Paul Hoffman / IMC <phoffman@imc.org>, MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, ietf-charsets@iana.org
At 12:10 02/02/99 -0800, Larry Masinter wrote:
>I think this is the only position consistent with having
>three different charset registrations: "BOM should not
>be sent with UTF-16BE or UTF-16LE, only with UTF-16."

Labelling UTF-16BE (or LE) and then sending a BOM is not inconsistent, it's only redundant. And this redundancy can be useful. The explicit label lets the recipient of a MIME object know the endianness without looking inside, which is good. But if the object is then moved elsewhere by a non-MIME protocol (FTP, disk copy, etc.), there is a BOM that the recipient can look at.

Since the problem with BOMs is their ambiguity -- is it a real BOM or an intended ZWNBSP? -- I currently lean toward a "SHOULD NOT put a BOM" unless it's mandatory (such as in XML), in which case it is also unambiguous.

Martin Dürst:
>> We wouldn't have to change XML, only to add a clarification to
>> say that "UTF-16" in the XML spec means only the case
>> charset="UTF-16", and not the others.

That doesn't work. The producer of an XML entity is not necessarily the MIME processor that will tag it, and may not know whether the entity will be tagged UTF-16 or UTF-16(BE|LE). Does it put a BOM?

And further, I happen to think that all XML entities (in UTF-16) having a BOM is a Good Thing. The XML spec is designed such that one can always determine the character encoding without external info; let's keep it that way.

--
François
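
To make the label-versus-BOM redundancy discussed above concrete, here is a minimal Python sketch of how a recipient might determine byte order. The function name `utf16_byte_order` and the rule that an explicit label takes precedence over a BOM are assumptions made for illustration only; they are not taken from any specification or from the message itself.

```python
import codecs
from typing import Optional


def utf16_byte_order(data: bytes, label: Optional[str] = None) -> Optional[str]:
    """Return "BE", "LE", or None when the byte order cannot be determined.

    Hypothetical helper: `label` stands for a MIME charset parameter, which
    may be missing if the object travelled over a non-MIME path (FTP, disk copy).
    """
    name = (label or "").strip().upper()

    # An explicit UTF-16BE / UTF-16LE label answers the question
    # without looking inside the data (assumed here to take precedence).
    if name == "UTF-16BE":
        return "BE"
    if name == "UTF-16LE":
        return "LE"

    # No endianness in the label (plain "UTF-16" or no label at all):
    # fall back on a BOM, if the sender included one.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "LE"
    return None


sample = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

# With the label, the byte order is known without inspecting the bytes.
print(utf16_byte_order(sample, "UTF-16BE"))
# Without the label, the BOM still identifies the byte order.
print(utf16_byte_order(sample))

# The ambiguity: decoded as UTF-16BE, the leading U+FEFF stays in the text
# (as a ZWNBSP); decoded as plain UTF-16, it is consumed as a BOM.
print(repr(sample.decode("utf-16-be")))
print(repr(sample.decode("utf-16")))
```

The last two lines show why a BOM under an explicit BE/LE label is ambiguous: the same two leading bytes are either a signature or a character, depending on which label the decoder is told to use.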
Received on Tuesday, 2 February 1999 15:39:08 UTC