[Bug 22223] Latin-1 characters (æ, þ etc.) are rejected as errors by validator

https://www.w3.org/Bugs/Public/show_bug.cgi?id=22223

jc ahágama <ahangama@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |---

--- Comment #2 from jc ahágama <ahangama@gmail.com> ---
Mr. Smith,
Thank you for the reply. I read the page you gave.

Someone closed the issue without testing it.

I think what they mean by "Authors must use the utf-8 encoding" is that authors
must declare UTF-8 as the 'charset'. Am I right? (I have been writing web pages
since the '90s, and I believe that I can understand the technical background
here.)

I *want* to follow standards. The problem is that when I declare UTF-8, meaning
use it as the encoding, the browser shows placeholders for the code points and
the Validator says, "Error found while checking this document as HTML5!"

Please submit the following page to the Validator (at validator.w3.org):
http://ahangama.com/charset-utf-8.htm

The error is explained thus:
"Sorry, I am unable to validate this document because on line 20 it contained
one or more bytes that I cannot interpret as utf-8 (in other words, the bytes
found are not valid values in the specified Character Encoding). Please check
both the content of the file and the character encoding indication.

The error was: utf8 "\xE6" does not map to Unicode"
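
The byte named in that message can be checked outside the Validator. The
following is only a minimal Python sketch, assuming that line 20 of the page
contains a lone 0xE6 byte, as the message reports. The helper name
decodes_as_utf8 is made up for illustration and is not part of the Validator.
It shows how the code point U+00E6 is written in Latin-1 and in UTF-8, and
whether each byte sequence decodes as UTF-8:

```python
# Sketch: compare the Latin-1 and UTF-8 byte forms of U+00E6 (æ).
# Assumption: the failing byte in the page is a lone 0xE6, per the error text.

ash = "\u00e6"  # the character æ

latin1_bytes = ash.encode("latin-1")  # one byte: E6
utf8_bytes = ash.encode("utf-8")      # two bytes: C3 A6


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes.hex(), decodes_as_utf8(latin1_bytes))
print(utf8_bytes.hex(), decodes_as_utf8(utf8_bytes))
```

The same check could be run on the raw bytes of the saved file to see which
of the two forms it actually contains.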

U+00E6 is the Old English letter ash (æ). It is found in the following Unicode
block:
http://www.unicode.org/charts/PDF/U0080.pdf

Clearly, the Validator is wrong and has to be fixed.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

Received on Thursday, 20 June 2013 03:33:41 UTC