Re: Autodetection failure from Terje Bless on 2002-12-09 (www-validator@w3.org from December 2002)

From: Terje Bless <link@pobox.com>
Date: Mon, 9 Dec 2002 07:54:07 +0100
To: W3C Validator <www-validator@w3.org>
cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>, Martin Dürst <duerst@w3.org>
Message-ID: <a01060007-1022-032B635C0B4311D7B76600039300CF5C@[193.157.66.10]>

Elliotte Rusty Harold <elharo@metalab.unc.edu> wrote:

>When attempting to validate a document which I identified as XHTML 1.1
>using the pop-up menu, I received the following message:
>
>I was not able to extract a character encoding labeling from any of 
>the valid sources for such information. Without encoding information it
>is impossible to validate the document. [...]
>
>I believe that in this case for XHTML, the fallback should be UTF-8. It
>certainly is for XML, and I don't think there's any reason XHTML should
>be different. If everything else fails, assume UTF-8.

Hmmm. I must admit I'm somewhat fuzzy on the details here, but IIRC, for XML
to be transported without explicit encoding information it must contain an
XML Declaration. At least, the autodetection algorithm in Appendix F of the
XML 1.0 Recommendation relies on there being either an XML Declaration or a
Unicode Byte-Order Mark in the absence of encoding information from a
higher-level protocol (e.g. HTTP).

The Content-Type the document was served or uploaded as will also affect the
character encoding determination, since the different types have different
defaults and defaulting behaviour in this regard.
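To illustrate, here is a rough sketch of how the Content-Type changes the
starting point, assuming the rules of RFC 2616 (section 3.7.1) and RFC 3023
as they stood at the time. The function name `default_charset` is
hypothetical, the media type is assumed to be passed without parameters, and
later RFCs (7231, 7303) have since revised both of these defaults.

```python
from typing import Optional


def default_charset(media_type: str, charset_param: Optional[str]) -> Optional[str]:
    """Return the charset to assume for a response (hypothetical helper).

    None means "no protocol-level default applies; fall through to XML's own
    autodetection (Byte-Order Mark / XML Declaration, else UTF-8)".
    """
    if charset_param:
        return charset_param  # an explicit charset parameter always wins
    mt = media_type.strip().lower()
    if mt == "text/xml":
        return "US-ASCII"  # RFC 3023: text/xml without charset is us-ascii
    if mt == "application/xml" or mt.endswith("+xml"):
        return None  # RFC 3023: defer to the XML document itself
    if mt.startswith("text/"):
        return "ISO-8859-1"  # RFC 2616 default for text/* over HTTP
    return None
```

Under these rules the same bytes served as text/html, text/xml and
application/xhtml+xml can end up with three different starting assumptions.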

What was the document you tried to validate? Was it served from a web
server or uploaded using the file upload function?

-- 
Ladies and gentlemen, you must resist those all-too-human feelings and decide
this case on the evidence.    And the evidence plainly shows that Mr. Landa's
injuries,   disfiguring as they are,  are nowhere near as important to a free
society as the fundamental right to make smart-ass remarks.   -- Katie @ AtAT

Received on Monday, 9 December 2002 01:54:30 UTC