Re: continue message from Jukka K. Korpela on 2022-01-03 (www-validator@w3.org from January 2022)

From: Jukka K. Korpela <jukkakk@gmail.com>
Date: Mon, 3 Jan 2022 14:13:01 +0200
To: Daniel Lamberti <daniellamberti@gmail.com>
Cc: W3C WWW Validator <www-validator@w3.org>
Message-ID: <CAGHxYa6q=nP+LSR-H_AqDoWS_q1NSEDQPr0AX7cD4qCKd+zhPw@mail.gmail.com>

Daniel Lamberti (daniellamberti@gmail.com) wrote:

>
>
> ---------- Forwarded message ---------
> From: Daniel Lamberti <daniellamberti@gmail.com>
> Date: Sat, Jan 1, 2022 at 2:27 PM
> Subject: issue with encoding
> To: <www-validator@w3.org>
>

It looks like you sent the message January 1st but somehow it did not get
through. Anyway, in the list archives
https://lists.w3.org/Archives/Public/www-validator/2022Jan/
that message is not present, but this one is.

> No matter what I try I always get same result, for some reason W3C
> validator does not recognize utf-8 encoding, and point out all the time
> that the document is under windows-1252 encoding.
> I save the index.html with and without BOM, I tried to save the
> document on a text editor making sure that is utf-8 encoding, but never get
> success.
> I tried just to save another .html file, completely empty (just with
> regular head, body with a h1 "hello") and I got same result.
>

The validator gives priority to the encoding declared in HTTP headers, over
<meta> tags. And the response headers for

> www.daniellamberti.com
>

include

Content-Type: text/html; charset=ISO-8859-1

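as you can see e.g. using https://websniffer.cc/ or
https://www.rexswain.com/httpview.html . The latter also shows (when using
“Hex” for “Display option”) that there is no BOM at the start. HTML
processing treats the label ISO-8859-1 as windows-1252, which is why the
validator's message names windows-1252.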
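
If you would rather check this locally than through a web tool, the two
checks above can be sketched in Python roughly as follows. This is only an
illustration using the standard library; the function names
charset_from_content_type and has_utf8_bom are made up for this example and
are not part of the validator or of any tool mentioned here.

```python
import codecs
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None.

    The result is lowercased, e.g. "iso-8859-1".
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def has_utf8_bom(first_bytes):
    """Return True if the document bytes start with the UTF-8 BOM (EF BB BF)."""
    return first_bytes.startswith(codecs.BOM_UTF8)


# The header value reported for the site in this thread:
print(charset_from_content_type("text/html; charset=ISO-8859-1"))
# A document saved without a BOM:
print(has_utf8_bom(b"<!DOCTYPE html>"))
```

To apply it to a live page, pass in the Content-Type response header and the
first few bytes of the body as returned by whatever HTTP client you use.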

I have no idea what you might need to do with the server to make it send a
Content-Type header specifying charset=utf-8 (or no charset at all).

By the way, the <html> tag declares lang=en, even though the content is
Spanish. It should be lang=es. The strange thing here is that if I remember
correctly, the validator used to run a heuristic check of the actual text
content against the lang declaration and report suspected mismatches. In
this case, I would very much expect any reasonable heuristics to detect and
report the problem.

Yucca, https://jkorpela.fi

Received on Monday, 3 January 2022 12:13:26 UTC