- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Mon, 23 Oct 2000 13:41:44 +0900
- To: www-validator@w3.org
Terje Bless <link@tss.no> wrote:
> IOW: SP has not yet been updated to recognize the BOM as it was only really
> standardized, lesse, two weeks ago.
Well, not really two weeks ago - the XML 1.0 Second Edition is
supposed to be the same as the XML 1.0 First Edition as corrected
by the XML 1.0 Specification Errata.
cf. http://www.w3.org/XML/xml-19980210-errata
BOM in UTF-8 was first mentioned in E44 (which was superseded by E105),
dated 2000-01-06. So it's been there for about 9 months. But anyway,
yes, SP has not yet been updated to recognize the BOM in UTF-8.
cf. http://www.w3.org/XML/xml-19980210-errata#E44
http://www.w3.org/XML/xml-19980210-errata#E105
> And since this is still version 1.0 of
> XML it's impossible to tell if the document is written for "XML 1.0 First
> Edition" or "XML 1.0 Second Edition" so you have to try sniffing for the
> BOM for all XML 1.0 documents and -- until SP is updated (if it's ever
> updated) -- manually suppress the error?
We are planning to enhance support for various character encodings,
by converting them to UTF-8 before validation. Similarly, BOM in
UTF-8 could be removed before validation so that SP won't be barfing
on it.
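
As a rough illustration of that pre-processing step, here is a minimal
Python sketch that drops a leading UTF-8 BOM (the bytes EF BB BF) from
a document before it is handed to the parser. The function name
strip_utf8_bom is hypothetical, and this is not how the validator is
actually implemented; it only shows the idea.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration only; not part of the validator.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Remove only the single leading signature; later U+FEFF
        # characters in the document are left untouched.
        return data[len(codecs.BOM_UTF8):]
    return data
```

The sketch works on raw bytes, so it can run before any decoding or
transcoding takes place.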
BTW, back to one of the original questions,
Christian Ottosson <christian.ottosson@kurir.net> wrote:
> Do you
> recommend the use of the BOM, as a UTF-8 signature, or should it be
> omitted?
*Personally* I would recommend NOT using the BOM in UTF-8 whenever
character encoding information can be provided by other means. And
in XML, detecting that an XML entity is encoded in UTF-8 can be done
without the BOM.
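
To make that last point concrete, here is a simplified Python sketch
of how a processor might guess the encoding of an XML entity when no
external information (such as an HTTP charset parameter) is available.
It is loosely modelled on the autodetection described in Appendix F of
the XML 1.0 specification, but it only covers the UTF-16, UTF-8 and
ASCII-compatible cases. The function name guess_xml_encoding and the
regular expression are my own assumptions, not anything SP or the
validator uses.

```python
import codecs
import re

# Matches an encoding declaration inside an XML declaration at the very
# start of the entity, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
# (simplified; assumes an ASCII-compatible encoding).
_ENCODING_DECL = re.compile(
    rb"""^<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes.

    Hypothetical, simplified sketch: UTF-32, EBCDIC and BOM-less UTF-16
    are not handled here.
    """
    # A UTF-16 entity is required to start with a BOM.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    # A UTF-8 BOM, if present, identifies UTF-8, but it is optional.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    # Otherwise honour an explicit encoding declaration.
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no encoding declaration: XML defaults to UTF-8.
    return "UTF-8"
```

The final branch is the relevant one: an entity with neither a BOM nor
an encoding declaration is taken to be UTF-8, so the signature adds no
information in that case.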
Regards,
--
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
Received on Monday, 23 October 2000 00:41:42 UTC