Re: UTF-8 signature in web and email

From: Mark Davis <markdavis34@home.com>
Date: Wed, 16 May 2001 07:46:27 -0700
Message-ID: <007501c0de16$fce92b00$0c680b41@c1340594a>
To: Keld Jørn Simonsen <keld@dkuug.dk>, <duerst@w3.org>
Cc: <www-international@w3.org>, "Unicode" <unicode@unicode.org>

1. Look at http://www.unicode.net/unicode/uni2book/ch13.pdf, Section 13.6,
Byte Order Mark. There are two purposes for the BOM, not just to mark the
endianness. This has been documented for ages.

2. "A BOM is superfluous and will be ignored." A strong statement, Keld.
Have you gone out and determined that all software in the world properly
ignores an initial BOM? If so, you must be quite busy.

Now in an ideal world, the BOM would not be necessary; it is not needed if
data is properly tagged (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE). I
was personally not in favor of it. However, there was a legitimate need for
some systems to have it in the early days of Unicode deployment; and who's
to say that we (including both Unicode and 10646) would have been as
successful as we have been without it?
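
To make the two purposes in point 1 concrete (byte order and encoding
signature), here is a minimal Python sketch of how a receiver might check
for a leading BOM when data arrives untagged. The function name sniff_bom
and the lookup table are hypothetical, chosen only for illustration; the
byte sequences come from Python's standard codecs module. It is a sketch
under those assumptions, not a description of what any particular product
does.

```python
import codecs
from typing import Optional, Tuple

# Known signatures, checked longest first: the UTF-32LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16LE BOM (FF FE).
# This table is an illustrative assumption, not an exhaustive registry.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding implied by a leading BOM, data with the BOM removed).

    If no known signature is present, return (None, data) unchanged.
    """
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


if __name__ == "__main__":
    encoding, payload = sniff_bom(b"\xef\xbb\xbfabc")
    print(encoding, payload)
```

A plain UTF-8 decoder does not drop the signature by itself: in Python,
decoding the bytes EF BB BF 61 62 63 as "utf-8" yields U+FEFF followed by
"abc", whereas the separate "utf-8-sig" codec removes it. Software that
"ignores" the BOM has to do so deliberately.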

Mark

----- Original Message -----
From: "Keld Jørn Simonsen" <keld@dkuug.dk>
To: <duerst@w3.org>
Cc: <www-international@w3.org>
Sent: Wednesday, May 16, 2001 00:57
Subject: Re: UTF-8 signature in web and email


> For UTF-8 there is no need to have a BOM, as there is only one
> way of serializing octets in UTF-8. There is no little-endian
> or big-endian. A BOM is superfluous and will be ignored.
>
> Kind regards
> Keld
>
> On Wed, May 16, 2001 at 10:47:37AM +0900, Roozbeh Pournader wrote:
> >
> > On Tue, 15 May 2001, Richard, Francois M wrote:
> >
> >  > UTF-8 is considered as a character encoding form as any other...
> >  > For UTF-16 only, the BOM is recommended.
> >  > See http://www.w3.org/TR/REC-html40/charset.html#h-5.2.1
> >
> > So BOM for UTF-8 HTML is neither recommended nor discouraged? Does anyone
> > agree with me that it should be discouraged somewhere?
> >
> >  > 1- An HTTP "charset" parameter in a "Content-Type" field.
> >  > 2- A META declaration with "http-equiv" set to "Content-Type" and a value
> >  > set for "charset".
> >  > 3- The charset attribute set on an element that designates an external
> >  > resource.
> >
> > So a BOM will be ignored anyway?
> >
> > --roozbeh
> >
> >
>
>
Received on Wednesday, 16 May 2001 10:46:04 GMT
