Re: HTML5 validator - charset.

On Thursday 12 February 2009, Henri Sivonen wrote:
> On Feb 10, 2009, at 23:25, Thorkil Konnerup wrote:
> > I have tried to use the HTML5 validator.
> >
> > In my pages I use the charset tag:
> >
> > and know that I am using specific Danish 8859-1 characters in my
> > pages.
>
> Have you checked your HTTP headers, too? (It's generally a good idea
> to post the relevant URI when posting to this list.)
>
> > The validator returns the error text:
> > "Internal encoding declaration iso-8859-1 disagrees with the actual
> > encoding of the document (utf-8).."
>
> This means that your file either had the UTF-8 BOM or was declared as
> UTF-8 on the HTTP layer.
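
To check the two causes Henri mentions, something like the following rough
Python sketch could be used.  The function name and the order of the checks
are my own assumptions for illustration; this is not code from either
validator.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def external_encoding(body: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Return the encoding implied by a UTF-8 BOM or by the charset
    parameter of an HTTP Content-Type header, or None if neither is present.

    Hypothetical helper: checking the BOM first is an assumption of this sketch.
    """
    if body.startswith(UTF8_BOM):
        return "utf-8"
    if content_type:
        # Pull the charset parameter out of e.g. "text/html; charset=UTF-8".
        match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.IGNORECASE)
        if match:
            return match.group(1).lower()
    return None
```

If this returns "utf-8" for a page whose meta element says iso-8859-1, that
is the kind of disagreement the error message describes.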

In some cases, the W3C validator will also cause these problems.  If a
doctype or charset override is in effect, what it passes to the HTML5
validator is not the original document but a version that has been modified
according to the user-specified overrides.

Currently, a side effect of these overrides is that the document is also
internally transcoded to UTF-8.  Unfortunately, the transcoding process is
only a raw charset conversion: it does not update the encodings named in XML
declarations or meta elements in the transcoded document, so the HTML5
validator sees a mismatch between the internal encoding declaration and the
actual encoding.
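
The effect of a raw conversion can be reproduced with a small Python sketch.
This is only an illustration under my own assumptions (the helper names and
the sample markup are made up); it is not the W3C validator's actual code.

```python
import re
from typing import Optional


def raw_transcode(body: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert the bytes from one charset to another and nothing else."""
    return body.decode(source).encode(target)


def declared_charset(body: bytes) -> Optional[str]:
    """Return the first charset named inside the document, if any."""
    match = re.search(rb'charset\s*=\s*["\']?([\w.:-]+)', body, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None


# Sample document (made up) saved as ISO-8859-1 with a matching declaration.
original = '<meta charset="iso-8859-1"><p>Rødgrød med fløde</p>'.encode("iso-8859-1")
transcoded = raw_transcode(original, "iso-8859-1")

# The bytes are now valid UTF-8, but the declaration was carried over untouched.
print(declared_charset(transcoded))
```

The transcoded bytes decode cleanly as UTF-8 while the declaration inside
them still names iso-8859-1, which is the disagreement the HTML5 validator
then reports.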

I thought I had reported this already, but it seems I forgot.  Done now,
thanks for the reminder.  http://www.w3.org/Bugs/Public/show_bug.cgi?id=6567

Received on Thursday, 12 February 2009 21:36:52 UTC