- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Mon, 23 Oct 2000 13:41:44 +0900
- To: www-validator@w3.org
Terje Bless <link@tss.no> wrote:

> IOW: SP has not yet been updated to recognize the BOM as it was only really
> standardized, lesse, two weeks ago.

Well, not really two weeks ago - the XML 1.0 Second Edition is supposed to be the same as the XML 1.0 First Edition as corrected by the XML 1.0 Specification Errata.

cf. http://www.w3.org/XML/xml-19980210-errata

The BOM in UTF-8 was first mentioned in E44 (which was superseded by E105), dated 2000-01-06. So it's been there for about 9 months. But anyway, yes, SP has not yet been updated to recognize the BOM in UTF-8.

cf. http://www.w3.org/XML/xml-19980210-errata#E44
    http://www.w3.org/XML/xml-19980210-errata#E105

> And since this is still version 1.0 of
> XML it's impossible to tell if the document is written for "XML 1.0 First
> Edition" or "XML 1.0 Second Edition" so you have to try sniffing for the
> BOM for all XML 1.0 documents and -- until SP is updated (if it's ever
> updated) -- manually suppress the error?

We are planning to enhance support for various character encodings by converting them to UTF-8 before validation. Similarly, the BOM in UTF-8 could be removed before validation so that SP won't barf on it.

BTW, back to one of the original questions,

Christian Ottosson <christian.ottosson@kurir.net> wrote:

> Do you
> recommend the use of the BOM, as a UTF-8 signature, or should it be
> omitted?

*Personally* I would recommend NOT using the BOM in UTF-8 whenever character encoding information can be provided by other means. And in XML, detecting that an XML entity is encoded in UTF-8 can be done without the BOM.

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
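
As a rough illustration of removing the UTF-8 BOM before validation, as suggested above, here is a minimal sketch in Python. The function name strip_utf8_bom and the sample document are assumptions made for illustration only; this is not the validator's actual code, and nothing here is part of SP.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper: input that does not start with the BOM is
    returned unchanged, so it is safe to call on every document.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample input (illustrative): an XML document saved with a UTF-8 signature.
doc = codecs.BOM_UTF8 + b'<?xml version="1.0"?><root/>'
cleaned = strip_utf8_bom(doc)
```

Only a BOM at the very start of the byte stream is removed; the same three bytes appearing later in the document are ordinary content (U+FEFF) and are left alone.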
Received on Monday, 23 October 2000 00:41:42 UTC