- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Wed, 02 Oct 2002 18:45:42 -0700
- To: "'Mark Davis'" <mark.davis@us.ibm.com>, "McDonald, Ira" <imcdonald@sharplabs.com>
- Cc: Bert Wijnen <bwijnen@lucent.com>, Francois Yergeau <FYergeau@alis.com>, ietf-charsets@iana.org, 'Patrik Fältström' <paf@cisco.com>
Hi Mark and Keld,
Thanks for speaking up.
I think we need to carefully distinguish that while Unicode 3.2
and ISO 10646:2000 allow (and seem to encourage) leading BOM
in UTF-8, an IETF 'standards track' RFC that describes UTF-8
usage _for_Internet_protocols_ should preferably say:
1) Historically, leading BOM usage in the UTF-8 encoding
has been allowed by ISO 10646.
2) All Internet protocols SHOULD NOT specify or encourage
leading BOM usage in the UTF-8 encoding.
(the above wording obviously can be improved - Martin probably
said it better already - if I could only find his note...)
Cheers,
- Ira McDonald (co-editor of Printer MIB v2)
High North Inc
-----Original Message-----
From: Mark Davis [mailto:mark.davis@us.ibm.com]
Sent: Wednesday, October 02, 2002 8:17 PM
To: McDonald, Ira
Cc: Bert Wijnen; Francois Yergeau; ietf-charsets@iana.org; 'Patrik Fältström'
Subject: RE: Comments on draft-yergeau-rfc2279bis-00.txt
I agree that it should not be encouraged, but it should be recognized.
The BOM is not necessary in a 16-bit UTF either; one can explicitly
use UTF-16BE or UTF-16LE; and of course it complicates things. So ideally
BOM would not be used there either. However, BOM in either case is in
widespread usage, and is allowed in UTF-8.
From my perspective, what *would* be very useful would be to have two
distinct tags for UTF-8 data. One that allowed the BOM and one (like
UTF-16BE) that specifically did not. (Of course, whenever you say 'does not
allow the BOM', that really means that an initial U+FEFF is interpreted as
a real character as part of the contents, and not stripped).
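
To illustrate the two interpretations described above, here is a minimal Python sketch. It is only an analogy: Python's standard `utf-8-sig` codec happens to strip one leading U+FEFF, while its plain `utf-8` codec keeps it as content. Mapping these two codecs onto two distinct charset tags is an assumption made for illustration, not something any registry defines.

```python
import codecs

# A byte stream that starts with the UTF-8 encoding of U+FEFF (EF BB BF).
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# "BOM allowed" reading: one leading U+FEFF is treated as a signature
# and stripped before the text is handed to the application.
stripped = data.decode("utf-8-sig")

# "BOM not allowed" reading: an initial U+FEFF is a real character
# that is part of the contents.
kept = data.decode("utf-8")

print(len(stripped), len(kept))  # the second string is one code point longer
```

The same bytes yield two different strings, which is why a receiver would need to know which of the two tags applies.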
Mark
___
mark.davis@us.ibm.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799
"McDonald, Ira" <imcdonald@sharplabs.com>
2002.10.02 14:55
To: 'Patrik Fältström' <paf@cisco.com>, Francois Yergeau <FYergeau@alis.com>
cc: ietf-charsets@iana.org, Bert Wijnen <bwijnen@lucent.com>
Subject: RE: Comments on draft-yergeau-rfc2279bis-00.txt
Hi,
I can't find Martin Duerst's suggested revisions but...
This IETF standard should NOT encourage the use of leading BOM in
streams of UTF-8 text. The optional use of leading BOM in UTF-8 (as
I know Martin said) destroys the crucial property that US-ASCII
is a perfect subset of UTF-8 and that US-ASCII can pass _without
harm_ through UTF-8 handling software libraries.
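
The subset property mentioned above can be checked with a small Python sketch. The helper `is_us_ascii` and the sample header line are hypothetical, introduced only for this illustration; `utf-8-sig` is Python's standard codec that writes a leading BOM on encoding.

```python
import codecs

def is_us_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit US-ASCII range."""
    return all(b < 0x80 for b in data)

ascii_text = "Content-Type: text/plain"  # hypothetical sample text

plain = ascii_text.encode("utf-8")         # UTF-8 without a BOM
with_bom = ascii_text.encode("utf-8-sig")  # UTF-8 with a leading BOM

# Without a BOM, the UTF-8 bytes are identical to the US-ASCII bytes.
print(plain == ascii_text.encode("ascii"))

# With a BOM, three non-ASCII bytes precede the text.
print(is_us_ascii(plain), is_us_ascii(with_bom))
```

A strict US-ASCII consumer accepts the first byte string unchanged but would have to reject or special-case the second.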
Specifically, in the printer industry, the optional presence of
leading BOM in UTF-8 attribute string values sent over-the-wire
in the Internet Printing Protocol/1.1 (IPP/1.1, RFC 2910)
has caused bugs, but has _never_ provided any utility.
Detection of a leading BOM by software that guesses the
charset encoding of arbitrary text is pernicious and dangerous.
UTF-8 never needs a 'byte-order' signature. The concatenation and
substring extraction bugs inherent in allowing/encouraging leading
BOM in UTF-8 are serious issues.
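
A rough Python sketch of the kind of concatenation and substring problems meant here. It does not reproduce any specific IPP bug; the helper `with_bom` and the sample values are hypothetical.

```python
import codecs

def with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM (hypothetical sender behaviour)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")

part_a = with_bom("abc")
part_b = with_bom("def")

# Concatenation: even a BOM-aware decoder strips only the first BOM,
# so a stray U+FEFF is left in the middle of the joined text.
joined = (part_a + part_b).decode("utf-8-sig")
print(joined == "abcdef")   # False
print("\ufeff" in joined)   # True

# Substring extraction: byte offsets are shifted by the three BOM bytes,
# so the "first three bytes" hold only the signature and no content.
head = part_a[:3].decode("utf-8-sig")
print(repr(head))
print(part_a.find(b"abc"))
```

Without the BOM, both operations are plain byte operations that need no knowledge of the encoding.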
Cheers,
- Ira McDonald (co-editor of Printer MIB v2)
High North Inc
-----Original Message-----
From: Patrik Fältström [mailto:paf@cisco.com]
Sent: Wednesday, October 02, 2002 5:35 PM
To: Francois Yergeau
Cc: ietf-charsets@iana.org; Bert Wijnen
Subject: Re: Comments on draft-yergeau-rfc2279bis-00.txt
On Thursday, September 19, 2002, at 06:49 AM, Francois Yergeau wrote:
> I think I have covered most outstanding comments, with the notable
> exception of the BOM issue raised by Martin Dürst. This one is neither
> trivial nor uncontroversial, and I have not seen anything resembling a
> consensus, so it remains open (no changes to the draft).
[2 weeks have passed again, and I have not seen any comments on this
list on this]
If anyone agrees with Martin that changes and text about the BOM issue _ARE_
needed, let me know no later than one week from now (i.e. October 9).
If I don't see anyone screaming, I declare consensus for this draft,
and I'll take over from here.
Thanks to all of you for all your help!
paf
Received on Wednesday, 2 October 2002 21:47:58 UTC