- From: Terje Bless <link@pobox.com>
- Date: Sat, 26 Oct 2002 19:42:58 +0200
- To: W3C Validator <www-validator@w3.org>
- cc: "Christopher R. Maden" <crism@maden.org>
Christopher R. Maden <crism@maden.org> wrote:

>When unable to detect an encoding, the new validator should use the
>prescribed defaults, which I believe still means ISO8859-1 for text/html
>over HTTP, and UTF-8 or UTF-16 for XHTML documents uploaded directly.

The HTTP specification does indeed specify ISO-8859-1 as the default value
in the absence of a "charset" parameter in the Content-Type header.
However, HTTP and HTML 4.01 are in direct conflict here, as the latter
proscribes any assumption about a default character encoding.

And since a file upload is still an HTTP transaction, although we do not
normally think of it that way, the same applies to any file upload with a
text/html media type.

>With the simple interface, validating <URL: http://crism.maden.org/ >
>reports that it is unable to detect the encoding, including using
>Appendix F of XML 1.0. Using Appendix F is inappropriate for a document
>delivered over HTTP, since the HTTP headers take precedence (and thus it
>should be interpreted as ISO8859-1), but even so, using the Appendix F
>algorithm should result in a determination of UTF-8. Either way, since
>this page is 7-bit ASCII, the validation ought to work.

The algorithm in Appendix F of the XML Recommendation describes ways to
attempt to automatically detect the character encoding in use in the
absence of information from a higher-level protocol. Since the HTTP
transaction contained no encoding information, we attempted the Appendix F
algorithm.

That algorithm, however, is intended for XML, and as such it requires
either the presence of a Unicode Byte Order Mark or an XML Declaration. In
particular, if there is no BOM, we look for the bit patterns that represent
the characters "<?xml" in various encodings. A document with neither a BOM
nor an XML Declaration therefore yields no determination.

>The new service looks great, though. Thanks.

:-)

-- 
Interviewer: "In what language do you write your algorithms?"
Abigail: English.
Interviewer: "What would you do if, say, Telnet didn't work?"
Abigail: Look at the error message.
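
For illustration, the Appendix F detection described in the message above
can be sketched roughly as follows. This is a minimal sketch in Python, not
the validator's actual code; the names sniff_encoding, BOMS and
XML_DECL_PATTERNS are hypothetical. It assumes only the leading bytes of
the document are inspected, and it returns None when neither a BOM nor the
start of an XML Declaration is found.

```python
import codecs

# Byte Order Marks, with UTF-32 listed before UTF-16 because the
# UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

# Leading bytes of "<?xml" in several encoding families (no BOM present).
XML_DECL_PATTERNS = [
    (b"\x00\x00\x00<", "UTF-32BE"),
    (b"<\x00\x00\x00", "UTF-32LE"),
    (b"\x00<\x00?", "UTF-16BE"),
    (b"<\x00?\x00", "UTF-16LE"),
    # UTF-8, ISO 8859-x and similar: the declaration itself must be
    # read to learn the exact encoding.
    (b"<?xm", "ASCII-compatible"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),
]


def sniff_encoding(data: bytes):
    """Return an encoding family guessed from the first bytes, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for pattern, name in XML_DECL_PATTERNS:
        if data.startswith(pattern):
            return name
    # Neither a BOM nor an XML Declaration: nothing can be determined.
    return None
```

Under these assumptions, a plain HTML page that starts with a DOCTYPE and
carries no BOM gives None, which matches the "unable to detect" report
discussed above.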
Received on Saturday, 26 October 2002 13:43:12 UTC