[Bug 22223] Latin-1 characters (æ, þ etc.) are rejected as errors by validator

From: <bugzilla@jessica.w3.org>
Date: Thu, 20 Jun 2013 04:09:58 +0000
To: www-validator-cvs@w3.org
Message-ID: <bug-22223-169-QojZm0RZjG@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=22223

Michael[tm] Smith <mike@w3.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #3 from Michael[tm] Smith <mike@w3.org> ---
(In reply to comment #2)
> I think what they mean by "Authors must use the utf-8 encoding" is that
> authors must declare UTF-8 as 'charset'. Am I right?

No. It means exactly what it says: The contents must be encoded in utf-8.

> I *want* to follow standards. The problem is when I declare UTF-8, meaning
> use it for encoding,

Declaring it by putting a meta charset element in a file does not magically set
the actual encoding to utf-8. You have to actually encode the contents in
utf-8, meaning the bytes saved on disk have to be utf-8.
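
To make the difference between declaring and encoding concrete, here is a
minimal Python sketch. The helper name is_valid_utf8 and the sample markup are
made up for illustration; this is not how the validator itself checks
documents.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Hypothetical sample: the declaration says utf-8, the text contains æ and þ.
markup = '<meta charset="utf-8"><p>\u00e6 \u00fe</p>'

latin1_bytes = markup.encode("iso-8859-1")  # what a Latin-1 editor would save
utf8_bytes = markup.encode("utf-8")         # what the declaration promises

print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

The declaration is identical in both byte strings; only the bytes used for the
non-ASCII characters differ.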

> the browser shows the place-holders for the codepoints
> and the Validator says, "Error found while checking this document as HTML5!"
> 
> Please plug in the following page to the Validator (at validator.w3.org):
> http://ahangama.com/charset-utf-8.htm

That file is not encoded in utf-8. It's encoded in iso-8859-1, which is
a very different encoding. The <meta http-equiv="content-type"
content="text/html; charset=utf-8" /> element you have in there does not change
the encoding of the file; all it does is make a browser try to process the file
as utf-8 even though it's actually encoded in iso-8859-1. So the browser ends
up displaying replacement characters for some of the code points instead of
showing the correct glyphs.

If you manually switch the encoding setting for that page in your browser to
iso-8859-1, the characters will display in your browser as expected. 
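
The same effect can be reproduced outside a browser. This is a rough Python
sketch using a made-up two-character sample rather than the actual page; a
browser's decoder may differ in detail, but the outcome is the same kind of
replacement characters.

```python
# Bytes as an iso-8859-1 editor would store "æ þ".
raw = "\u00e6 \u00fe".encode("iso-8859-1")

# Processing them as utf-8 (what the declaration asks for) yields
# U+FFFD REPLACEMENT CHARACTER where the bytes are not valid utf-8.
as_utf8 = raw.decode("utf-8", errors="replace")

# Processing them as iso-8859-1 (the real encoding) recovers the text.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```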

> Clearly, Validator is wrong and has to be fixed.

There's nothing wrong with the validator. The problem is that you don't
actually have that file encoded in utf-8. You need to figure out how to
actually encode it in utf-8 in whatever editor you're using, and then try
again.
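
If your editor has no option to save as utf-8, a small script can do the
conversion. The following is only a sketch: the function name
reencode_to_utf8 and the file names are hypothetical, and it assumes the
source really is iso-8859-1.

```python
import tempfile
from pathlib import Path

def reencode_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Decode src using source_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))

# Demo on a throwaway file (hypothetical names), not the page from the report.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "page-latin1.htm"
    dst = Path(tmp) / "page-utf8.htm"
    src.write_bytes("<p>\u00e6 \u00fe</p>".encode("iso-8859-1"))
    reencode_to_utf8(src, dst)
    converted = dst.read_bytes()

# Each non-ASCII character is now a multi-byte UTF-8 sequence.
print(converted)
```

Working on bytes rather than text-mode files avoids any newline translation,
so only the character encoding changes.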

Received on Thursday, 20 June 2013 04:09:59 UTC