[Bug 22223] New: Latin-1 characters (æ, þ etc.) are rejected as errors by validator

From: <bugzilla@jessica.w3.org>
Date: Fri, 31 May 2013 05:17:39 +0000
To: www-validator-cvs@w3.org
Message-ID: <bug-22223-169@http.www.w3.org/Bugs/Public/>

            Bug ID: 22223
           Summary: Latin-1 characters (æ, þ etc.) are rejected as errors
                    by validator
    Classification: Unclassified
           Product: Validator (Nu)
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: General
          Assignee: mike+validator@w3.org
          Reporter: ahangama@gmail.com
        QA Contact: www-validator-cvs@w3.org

An HTML page that uses any characters from the ISO-8859-1 character set used to
validate correctly when it was written and tested as HTML 4.01. Then, when such a
page was written for HTML5 and tested, the validator advised using windows-1252
instead of ISO-8859-1. If there was no charset declaration, the page was assumed
to be windows-1252 and passed.
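The short Python sketch below shows why the two labels are treated as near-equivalent. It assumes Python's built-in "latin-1" codec matches ISO-8859-1 and its "cp1252" codec matches windows-1252. Bytes such as 0xE6 (æ) and 0xFE (þ) decode to the same character under both encodings; they differ only in the 0x80 to 0x9F range, where windows-1252 has printable characters such as € and curly quotes. The helper name `decode_both` is hypothetical.

```python
# Compare how single bytes decode under ISO-8859-1 ("latin-1" codec)
# and windows-1252 ("cp1252" codec).
samples = [0xE6, 0xFE, 0x80, 0x93]

def decode_both(byte_value):
    """Return (latin-1 result, cp1252 result) for a single byte (hypothetical helper)."""
    raw = bytes([byte_value])
    latin1 = raw.decode("latin-1")   # every byte maps to U+0000..U+00FF
    cp1252 = raw.decode("cp1252")    # 0x80..0x9F mostly map to printable characters
    return latin1, cp1252

for b in samples:
    latin1, cp1252 = decode_both(b)
    print(f"0x{b:02X}: latin-1 -> U+{ord(latin1):04X}, "
          f"cp1252 -> U+{ord(cp1252):04X} {cp1252!r}")
```

For æ and þ the two columns agree. Only bytes like 0x80 (a C1 control in ISO-8859-1, € in windows-1252) come out differently.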

Until recently, UTF-8 was encouraged as the charset declaration, but
windows-1252 was accepted. Now the rule is enforced by issuing errors and
warnings like these:
1. Using windows-1252 instead of the declared encoding iso-8859-1.
2. Legacy encoding windows-1252 used. Documents should use UTF-8.
3. utf8 "\xE6" does not map to Unicode.

What does item 3 above mean? This is a catch-22: if you declare UTF-8, it is an
error because æ, þ etc. are outside Unicode. I thought we were talking about the
UTF-8 encoding of characters. How does Unicode factor in here?
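One possible reading of message 3, assuming the page's bytes were saved as ISO-8859-1 or windows-1252: æ and þ are themselves Unicode characters (U+00E6 and U+00FE), but in such a file æ is stored as the single byte 0xE6. On its own, that byte is not a well-formed UTF-8 sequence, so a decoder reading the file as UTF-8 cannot map it to any character. The sketch below checks this with Python's standard codecs; `is_valid_utf8` is a hypothetical helper name.

```python
# How æ and þ are stored in a legacy single-byte file vs. a UTF-8 file.
latin1_bytes = "æþ".encode("latin-1")   # one byte per character
utf8_bytes = "æþ".encode("utf-8")       # two bytes per character

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(hex(ord("æ")), hex(ord("þ")))     # both are Unicode code points
print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

The same characters pass as UTF-8 only when the bytes on disk are the UTF-8 forms (C3 A6, C3 BE), not the single-byte forms.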

RFC 3629 is very clear about how to encode ASCII and Latin-1 (SBCS) characters
in UTF-8. It appears that ASCII is accepted while the Latin-1 extension is
rejected for some unpublished reason.
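As a minimal sketch of the RFC 3629 encoding table (not the validator's code), the function below encodes one code point by hand. ASCII (U+0000 to U+007F) stays a single byte, while the Latin-1 range U+0080 to U+00FF falls in the two-byte row. So æ becomes C3 A6 rather than the single byte E6. The name `utf8_encode` is hypothetical; its results can be checked against Python's built-in encoder.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8, following the RFC 3629 table."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{code_point:04X}")
    if code_point <= 0x7F:    # 0xxxxxxx: ASCII, one byte
        return bytes([code_point])
    if code_point <= 0x7FF:   # 110xxxxx 10xxxxxx: includes Latin-1 U+0080..U+00FF
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for ch in "Aæþ€":
    print(ch, f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "))
```

The first byte of each multi-byte sequence signals its length. That is why a lone 0xE6, whose bit pattern is 1110xxxx, is read as the start of a three-byte sequence and is rejected when the expected continuation bytes are missing.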

Please check these pages to understand the problem.

Thank you.

Received on Friday, 31 May 2013 05:17:40 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 23:03:18 UTC