[Bug 22223] Latin-1 characters (æ, þ etc.) are rejected as errors by validator

From: <bugzilla@jessica.w3.org>
Date: Thu, 20 Jun 2013 03:33:39 +0000
To: www-validator-cvs@w3.org
Message-ID: <bug-22223-169-ji57ziIvSs@http.www.w3.org/Bugs/Public/>

jc ahágama <ahangama@gmail.com> changed:

           What    |Removed                     |Added
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |---

--- Comment #2 from jc ahágama <ahangama@gmail.com> ---
Mr. Smith,
Thank you for the reply. I read the page you gave.

Someone closed the issue without testing it.

I think what they mean by "Authors must use the utf-8 encoding" is that authors
must declare UTF-8 as the 'charset'. Am I right? (I have been writing web pages
since the '90s, and I believe I understand the technical background here.)

I *want* to follow standards. The problem is that when I declare UTF-8, meaning
that I use it as the encoding, the browser shows placeholders for the code
points and the Validator says, "Error found while checking this document as HTML5!"

Please plug in the following page to the Validator (at validator.w3.org):

The error is explained thus:
"Sorry, I am unable to validate this document because on line 20 it contained
one or more bytes that I cannot interpret as utf-8 (in other words, the bytes
found are not valid values in the specified Character Encoding). Please check
both the content of the file and the character encoding indication.

The error was: utf8 "\xE6" does not map to Unicode"
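
To make the quoted message concrete, here is a minimal Python sketch of what
"\xE6 does not map to Unicode" means at the byte level. It is only an
illustration, not the Validator's own code; the names ASH and decodes_as_utf8
are hypothetical and used here for demonstration only. It compares the bytes
produced for U+00E6 under Latin-1 and under UTF-8, then checks which of them a
strict UTF-8 decoder accepts.

```python
# Illustration only: how U+00E6 (æ) is represented in two encodings.
ASH = "\u00e6"  # U+00E6 LATIN SMALL LETTER AE

latin1_bytes = ASH.encode("latin-1")  # a single byte, 0xE6
utf8_bytes = ASH.encode("utf-8")      # two bytes, 0xC3 0xA6


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if `data` is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes.hex(), decodes_as_utf8(latin1_bytes))
print(utf8_bytes.hex(), decodes_as_utf8(utf8_bytes))
```

In UTF-8, a lone 0xE6 byte is the start of a three-byte sequence, so a strict
decoder rejects it unless the expected continuation bytes follow. The same
character is accepted when it is stored as the two bytes 0xC3 0xA6.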

U+00E6 is the Old English letter Ash (æ). It is found in the following Unicode

Clearly, the Validator is wrong and has to be fixed.

You are receiving this mail because:
You are the QA Contact for the bug.
Received on Thursday, 20 June 2013 03:33:41 UTC