- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Wed, 18 Oct 2000 02:47:31 +0900
- To: link@tss.no
- Cc: christian.ottosson@kurir.net, plh@w3.org, www-validator@w3.org
Terje Bless <link@tss.no> wrote:

> >That's why the validator correctly reports errors (apart from BOM).
>
> So there isn't any reason that it should be barfing on the BOM?

Actually this is a "crack" between the First Edition (REC-xml-19980210) and the Second Edition (REC-xml-20001006) of XML 1.0, IMHO.

"F. Autodetection of Character Encodings" of REC-xml-19980210, though non-normative, provided an autodetection algorithm of character encoding. There was no mention of the BOM in UTF-8, so it would not be unreasonable to report the byte sequences of EF BB BF at the beginning of an XML entity as an error. I looked at the source code of SP 1.3.4 as well as 1.3, and it seems the XMLDecoder class is based on the appendix F of REC-xml-19980210.

cf. http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing

Appendix F of REC-xml-20001006, however, does mention the case when the BOM is used in UTF-8. Appendix F was completely rewritten in REC-xml-20001006, and I think this is the most significant change between REC-xml-19980210 and REC-xml-20001006.

cf. http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing

So, according to the Second Edition of XML 1.0, the validator should not be barfing on the BOM.

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
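
As a rough illustration of the difference between the two editions, here is a minimal Python sketch of BOM sniffing at the start of an XML entity. It is not the SP XMLDecoder code and not a full implementation of Appendix F: the function name guess_encoding and the flag accept_utf8_bom are made up for this example, and the UCS-4 and EBCDIC cases are left out. With accept_utf8_bom=False it behaves like a decoder that only knows the First Edition appendix and rejects EF BB BF; with accept_utf8_bom=True it follows the Second Edition and treats those bytes as a UTF-8 BOM.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def guess_encoding(data: bytes, accept_utf8_bom: bool = True):
    """Return (provisional_encoding, bom_length) for the start of an XML entity.

    Hypothetical helper for illustration only. It ignores the UCS-4 and
    EBCDIC byte patterns, and the result is provisional because an encoding
    declaration may still refine it.
    """
    # UTF-16 byte order marks (simplification: FF FE 00 00 would really
    # indicate little-endian UCS-4, which this sketch does not handle).
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE", 2
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE", 2

    if data.startswith(UTF8_BOM):
        if accept_utf8_bom:
            # Second Edition behaviour: EF BB BF is a UTF-8 BOM.
            return "UTF-8", 3
        # First Edition style behaviour: the appendix did not mention a
        # UTF-8 BOM, so these bytes look like garbage before the document.
        raise ValueError("unexpected bytes EF BB BF at start of entity")

    # No BOM: fall back to UTF-8, the default in the absence of other
    # information.
    return "UTF-8", 0


if __name__ == "__main__":
    doc = UTF8_BOM + b'<?xml version="1.0"?><doc/>'
    encoding, skip = guess_encoding(doc)
    print(encoding, doc[skip:].decode(encoding))
```

A caller would skip the returned number of bytes before handing the rest of the entity to the parser, which is what the validator would need to do to stop complaining about the BOM.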
Received on Tuesday, 17 October 2000 13:48:10 UTC