- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Wed, 02 Oct 2002 18:45:42 -0700
- To: "'Mark Davis'" <mark.davis@us.ibm.com>, "McDonald, Ira" <imcdonald@sharplabs.com>
- Cc: Bert Wijnen <bwijnen@lucent.com>, Francois Yergeau <FYergeau@alis.com>, ietf-charsets@iana.org, 'Patrik Fältström' <paf@cisco.com>
Hi Mark and Keld,

Thanks for speaking up. I think we need to carefully distinguish that while Unicode 3.2 and ISO 10646:2000 allow (and seem to encourage) leading BOM in UTF-8, an IETF 'standards track' RFC that describes UTF-8 usage _for_Internet_protocols_ should preferably say:

1) Historically, leading BOM usage in the UTF-8 encoding has been allowed by ISO 10646.

2) All Internet protocols SHOULD NOT specify or encourage leading BOM usage in the UTF-8 encoding.

(the above wording obviously can be improved - Martin probably said it better already - if I could only find his note...)

Cheers,
- Ira McDonald (co-editor of Printer MIB v2)
  High North Inc

-----Original Message-----
From: Mark Davis [mailto:mark.davis@us.ibm.com]
Sent: Wednesday, October 02, 2002 8:17 PM
To: McDonald, Ira
Cc: Bert Wijnen; Francois Yergeau; ietf-charsets@iana.org; 'Patrik Fältström'
Subject: RE: Comments on draft-yergeau-rfc2279bis-00.txt

I agree that it should not be encouraged, but it should be recognized. The BOM is also not necessary in a 16-bit UTF either; one can explicitly use UTF-16BE or UTF-16LE; and of course it complicates things. So ideally BOM would not be used there either. However, BOM in either case is in widespread usage, and is allowed in UTF-8.

From my perspective, what *would* be very useful would be to have two distinct tags for UTF-8 data. One that allowed the BOM and one (like UTF-16BE) that specifically did not. (Of course, whenever you say 'does not allow the BOM', that really means that an initial U+FEFF is interpreted as a real character as part of the contents, and not stripped).

Mark
___
mark.davis@us.ibm.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

"McDonald, Ira" <imcdonald@sharplabs.com>
2002.10.02 14:55

To: 'Patrik Fältström' <paf@cisco.com>, Francois Yergeau <FYergeau@alis.com>
cc: ietf-charsets@iana.org, Bert Wijnen <bwijnen@lucent.com>
Subject: RE: Comments on draft-yergeau-rfc2279bis-00.txt

Hi,

I can't find Martin Duerst's suggested revisions but...

This IETF standard should NOT encourage the use of leading BOM in streams of UTF-8 text.

The optional use of leading BOM in UTF-8 (as I know Martin said) destroys the crucial property that US-ASCII is a perfect subset of UTF-8 and that US-ASCII can pass _without harm_ through UTF-8 handling software libraries.

Specifically, in the printer industry, the optional presence of leading BOM in UTF-8 attribute string values sent over-the-wire in the Internet Printing Protocol/1.1 (IPP/1.1, RFC 2910) has caused bugs, but has _never_ provided any utility.

The use of detection of leading BOM by software that guesses the charset encoding of arbitrary text is pernicious and dangerous. UTF-8 never needs a 'byte-order' signature.

The concatenation and substring extraction bugs inherent in allowing/encouraging leading BOM in UTF-8 are serious issues.

Cheers,
- Ira McDonald (co-editor of Printer MIB v2)
  High North Inc

-----Original Message-----
From: Patrik Fältström [mailto:paf@cisco.com]
Sent: Wednesday, October 02, 2002 5:35 PM
To: Francois Yergeau
Cc: ietf-charsets@iana.org; Bert Wijnen
Subject: Re: Comments on draft-yergeau-rfc2279bis-00.txt

On Thursday, September 19, 2002, at 06:49 AM, Francois Yergeau wrote:

> I think I have covered most outstanding comments, with the notable
> exception of the BOM issue raised by Martin Dürst. This one is neither
> trivial nor uncontroversial, and I have not seen anything resembling a
> consensus, so it remains open (no changes to the draft).

[2 weeks have passed again, and I have not seen any comments on this list on this]

If anyone agrees with Martin's changes and thinks text about the BOM issue _IS_ needed, let me know no later than one week from now (i.e. October 9).

If I don't see anyone screaming, I declare consensus for this draft, and I'll take over from here.

Thanks to all of you for all the help!

   paf

Received on Wednesday, 2 October 2002 21:47:58 UTC
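
The two behaviours discussed in the thread (an initial U+FEFF kept as a real character versus stripped as a signature) and the concatenation problem Ira describes can be illustrated with a small sketch. It relies on Python's standard `utf-8` and `utf-8-sig` codec names, which are Python's own labels and not the charset tags proposed in the thread. The helper names and the `job-name` sample value are hypothetical, chosen only for illustration.

```python
import codecs

# The three bytes EF BB BF: U+FEFF encoded in UTF-8.
BOM = codecs.BOM_UTF8


def decode_keep_bom(data: bytes) -> str:
    """Plain UTF-8: an initial U+FEFF is kept as part of the contents."""
    return data.decode("utf-8")


def decode_strip_bom(data: bytes) -> str:
    """UTF-8 with signature: a single leading U+FEFF is dropped."""
    return data.decode("utf-8-sig")


def is_ascii(data: bytes) -> bool:
    """True when every byte is in the US-ASCII range."""
    return all(b < 0x80 for b in data)


# Hypothetical attribute value, pure US-ASCII.
ascii_part = "job-name".encode("ascii")
with_bom = BOM + ascii_part

# Pure ASCII bytes are already valid UTF-8; the BOM-prefixed form is not ASCII.
ascii_ok = is_ascii(ascii_part)      # True
bom_is_ascii = is_ascii(with_bom)    # False

kept = decode_keep_bom(with_bom)       # '\ufeffjob-name'
stripped = decode_strip_bom(with_bom)  # 'job-name'

# Concatenating two BOM-prefixed values leaves a U+FEFF in the middle,
# which a signature-stripping decoder does not remove.
joined = decode_strip_bom(with_bom + with_bom)  # 'job-name\ufeffjob-name'
```

In this sketch the second U+FEFF survives decoding because only a leading signature is removed, which is one way the concatenation issue mentioned above can show up in practice.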