- From: <bugzilla@jessica.w3.org>
- Date: Sat, 13 Aug 2011 17:59:00 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=13771

           Summary: Encodings 'misinterpreted for compatibility' should risk
                    fatal error in XHTML
           Product: HTML WG
           Version: unspecified
          Platform: All
               URL: http://www.w3.org/TR/html5/parsing#table-encoding-overrides
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P3
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org

REQUEST: State that HTML5's table of encoding overrides is not to be adhered to by XHTML parsers, and that, unless they deviate from the encoding overrides table, they effectively do not support those encodings and are thus required to emit a fatal error whenever they encounter such labels.

BACKGROUND: HTML5 keeps a table of encoding labels that should be 'misinterpreted for compatibility'; see <http://www.w3.org/TR/html5/parsing#table-encoding-overrides>. For instance, 'US-ASCII' as well as 'ISO-8859-1' should be treated as 'windows-1252'. By contrast, XML 1.0 operates with the following rule:

]] It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process [[
<http://www.w3.org/TR/xml/#charencoding>

Thus: An XML parser is required to know, before it parses the page, which encodings it supports. Hence, if an XHTML parser in a Web browser knows that it does not support US-ASCII or ISO-8859-1 (because it always misinterprets each of them as WINDOWS-1252 instead), then that parser supports neither US-ASCII nor ISO-8859-1.

WEB BROWSERS REALITY: The fact is that Firefox, WebKit and Opera (I don't know about IE9) currently fail to emit a fatal error when an XHTML page is labelled as US-ASCII or ISO-8859-1 but contains directly typed characters that are legal only in WINDOWS-1252. Thus, their XML parsers currently do not support the US-ASCII or ISO-8859-1 encodings. (And according to XML 1.0 they are also not required to support them!) This violation of XML 1.0 must therefore lead to a fatal error.

TEST PAGES:

Test page, US-ASCII labelled (originally UTF-8 encoded) page with illegal characters:
http://malform.no/testing/html5/bom/normal-XML-ascii-encoding

Test page, ISO-8859-1 labelled with Windows-1252 characters:
http://malform.no/testing/html5/bom/normal-XML-iso88591

JUSTIFICATION: Since Web browsers fail to adhere to this XML 1.0 rule, and because HTML5 claims to cover both XHTML and HTML, HTML5 should specify that the encoding override rules in fact apply only to HTML parsers.

BENEFITS: If the XHTML parsers inside Web browsers start to emit the required fatal errors, then it will further strengthen the trend towards UTF-8.
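The difference between honouring a label and applying the override can be illustrated with a small sketch. It uses Python's built-in codecs as a stand-in for a parser's decoder, which is an assumption for illustration only and not how any browser's XML parser is implemented. The names `OVERRIDES`, `decode_strict` and `decode_with_override` are hypothetical, and the table holds only the two overrides named in this report.

```python
# Hypothetical override table: only the two entries mentioned in this report.
OVERRIDES = {
    "us-ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
}

# Byte 0x80 is the euro sign in windows-1252, a C1 control character in
# ISO-8859-1, and not a valid US-ASCII byte at all.
RAW = b"price: \x80 5"


def decode_strict(data: bytes, label: str) -> str:
    """Decode with exactly the labelled encoding (no compatibility override)."""
    return data.decode(label)


def decode_with_override(data: bytes, label: str) -> str:
    """Decode after replacing the label according to the override table."""
    effective = OVERRIDES.get(label.lower(), label)
    return data.decode(effective)


if __name__ == "__main__":
    # With the override, both labels silently yield the euro sign.
    print(repr(decode_with_override(RAW, "ISO-8859-1")))
    print(repr(decode_with_override(RAW, "US-ASCII")))

    # Without the override, ISO-8859-1 yields the C1 control U+0080 ...
    print(repr(decode_strict(RAW, "ISO-8859-1")))

    # ... and US-ASCII cannot decode the byte at all.
    try:
        decode_strict(RAW, "US-ASCII")
    except UnicodeDecodeError as exc:
        print("strict US-ASCII decoding failed:", exc.reason)
```

In this sketch the strict US-ASCII path is the only one that reports an error; the override path accepts the same bytes under either label, which is the behaviour the report describes for the tested browsers.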
Received on Saturday, 13 August 2011 17:59:05 UTC