[Bug 13771] New: Encodings 'misinterpreted for compatibility' should risk fatal error in XHTML

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13771

           Summary: Encodings 'misinterpreted for compatibility' should
                    risk fatal error in XHTML
           Product: HTML WG
           Version: unspecified
          Platform: All
               URL: http://www.w3.org/TR/html5/parsing#table-encoding-overrides
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P3
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org


REQUEST:

State that HTML5's table of encoding overrides is not to be followed by XHTML
parsers, and that, unless they deviate from the encoding overrides table, XHTML
parsers effectively do not support those encodings and are therefore required
to emit a fatal error whenever they encounter such labels.

BACKGROUND:

HTML5 keeps a table of encoding labels that are to be 'misinterpreted for
compatibility'; see
<http://www.w3.org/TR/html5/parsing#table-encoding-overrides>. For instance,
both 'US-ASCII' and 'ISO-8859-1' are to be treated as 'windows-1252'.
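To make the override concrete, here is a minimal Python sketch. The names
ENCODING_OVERRIDES and effective_encoding() are hypothetical, and the table is
deliberately partial: it holds only the two rows mentioned above, not the full
HTML5 table. It shows why the override matters: the same byte decodes to a
different character depending on whether the label or its override is used.

```python
# Partial, illustrative subset of HTML5's encoding-override table: only the two
# rows named in this bug report are included (the real table has more rows).
ENCODING_OVERRIDES = {
    "us-ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
}


def effective_encoding(label: str) -> str:
    """Return the encoding an HTML5-style parser would actually use for a label."""
    normalized = label.strip().lower()
    return ENCODING_OVERRIDES.get(normalized, normalized)


# Bytes 0x93/0x94 are curly double quotes in windows-1252, C1 control characters
# in ISO-8859-1, and not valid US-ASCII bytes at all.
data = b"\x93quoted\x94"

print(effective_encoding("ISO-8859-1"))              # windows-1252
print(data.decode(effective_encoding("ISO-8859-1")))  # “quoted”
print(repr(data.decode("iso-8859-1")))               # '\x93quoted\x94'

try:
    data.decode("us-ascii")
except UnicodeDecodeError:
    print("not valid US-ASCII")
```

The override therefore does not merely rename the encoding: for bytes 0x80 to
0x9F it changes which characters the document contains.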

By contrast, XML 1.0 has the following rule:

   ]] It is a fatal error when an XML processor encounters 
      an entity with an encoding that it is unable to process [[ 
      <http://www.w3.org/TR/xml/#charencoding>

Thus, an XML parser is required to know, before it parses the page, which
encodings it supports. Hence, if the XHTML parser in a Web browser always
misinterprets US-ASCII and ISO-8859-1 as WINDOWS-1252, then that parser, by
its own design, supports neither US-ASCII nor ISO-8859-1.
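The argument above can be sketched in Python under stated assumptions: DECODERS
and ROUTING are hypothetical stand-ins for a parser's internals, and FatalError
merely models the XML 1.0 'fatal error' condition; no real browser API is
implied. A label counts as supported only if the parser decodes it as that same
encoding, so labels routed to windows-1252 end in a fatal error.

```python
class FatalError(Exception):
    """Models the XML 1.0 'fatal error' condition (hypothetical)."""


# Hypothetical parser internals: the decoders it really has...
DECODERS = {"utf-8", "utf-16", "windows-1252"}
# ...and the labels it silently routes to a different decoder (HTML5-style).
ROUTING = {"us-ascii": "windows-1252", "iso-8859-1": "windows-1252"}


def is_supported(label: str) -> bool:
    name = label.strip().lower()
    decoder = ROUTING.get(name, name)
    # A label is supported only if it is decoded as itself, not as a substitute.
    return decoder == name and decoder in DECODERS


def open_entity(label: str) -> str:
    """Return the decoder name to use, or raise a fatal error per XML 1.0."""
    if not is_supported(label):
        raise FatalError(f"encoding {label!r} is not supported by this processor")
    return label.strip().lower()
```

With these assumptions, open_entity("UTF-8") returns "utf-8", while
open_entity("ISO-8859-1") raises FatalError instead of quietly decoding the
entity as windows-1252.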

WEB BROWSERS REALITY:

The fact is that Firefox, WebKit and Opera (I don't know about IE9) currently
fail to emit a fatal error when an XHTML page is labelled as US-ASCII or
ISO-8859-1 but contains characters that are legal in WINDOWS-1252, typed
directly rather than as character references. Thus, their XML parsers do not
currently support the US-ASCII or ISO-8859-1 encodings. (And according to
XML 1.0, they are not required to support them either!) Under XML 1.0, such
an unsupported encoding must therefore lead to a fatal error.
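The gap between what the browsers do and what XML 1.0 requires can be
illustrated with a sketch that processes a document like the US-ASCII test page
(UTF-8 bytes behind a US-ASCII label). The helper names and the regex for the
XML declaration are assumptions, and the regex is simplified (it ignores BOMs,
UTF-16 and other edge cases). decode_lenient() models the override behaviour
described above; decode_strict() models the XML 1.0 fatal error.

```python
import re


class FatalError(Exception):
    """Models the XML 1.0 'fatal error' condition (hypothetical)."""


# Same illustrative two-row override table as in the report's examples.
OVERRIDES = {"us-ascii": "windows-1252", "iso-8859-1": "windows-1252"}

# Simplified: only reads an encoding declaration at the very start of the bytes.
_DECL = re.compile(rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def declared_encoding(doc: bytes) -> str:
    m = _DECL.match(doc)
    # XML's default without a declaration (and without a BOM) is UTF-8.
    return m.group(1).decode("ascii").lower() if m else "utf-8"


def decode_lenient(doc: bytes) -> str:
    # Browser behaviour as reported: apply the override table, never fail.
    label = declared_encoding(doc)
    return doc.decode(OVERRIDES.get(label, label))


def decode_strict(doc: bytes) -> str:
    # XML 1.0 reading: bytes the declared encoding cannot process are fatal.
    label = declared_encoding(doc)
    try:
        return doc.decode(label)
    except (UnicodeDecodeError, LookupError) as exc:
        raise FatalError(f"cannot process entity as {label}: {exc}") from exc


doc = '<?xml version="1.0" encoding="US-ASCII"?><p>målform</p>'.encode("utf-8")
print(decode_lenient(doc))  # ...<p>mÃ¥lform</p> (mojibake, but no error)
try:
    decode_strict(doc)
except FatalError as err:
    print("fatal error:", err)
```

Note that decode_strict() would accept the ISO-8859-1 test page, since every
byte value is defined in ISO-8859-1; that case rests on the misinterpretation
argument above rather than on undecodable bytes.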

TEST PAGES:

Test page labelled US-ASCII (originally UTF-8 encoded), containing illegal
characters:
     http://malform.no/testing/html5/bom/normal-XML-ascii-encoding
Test page labelled ISO-8859-1, containing Windows-1252 characters:
     http://malform.no/testing/html5/bom/normal-XML-iso88591


JUSTIFICATION:

Since Web browsers fail to adhere to this XML 1.0 rule, and because HTML5
claims to cover both XHTML and HTML, HTML5 should specify that the encoding
override rules apply only to HTML parsers.


BENEFITS:

If the XHTML parsers inside Web browsers start emitting the required fatal
errors, this will further strengthen the trend towards UTF-8.


Received on Saturday, 13 August 2011 17:59:05 UTC