- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 25 Jul 2012 17:14:42 +0300
- To: "www-international@w3.org" <www-international@w3.org>
Maybe this is of interest to www-international@: today I published
a report on how HTML parsers and XML parsers determine the character
encoding.
http://målform.no/blog/white-spots-in-html5-s-encoding-sniffing-algorithm
The report concentrates on which encoding signals carry the most
weight for browsers: user override vs. BOM vs. HTTP vs. <meta> vs. XML
encoding declaration vs. character detection vs. language default vs.
locale default vs. parent browsing context default, and perhaps some
things I forgot. The data could be relevant for settling a few issues
ahead.
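To make the "which signal wins" question concrete, here is a minimal Python sketch, not taken from the report. It detects a BOM and then resolves the encoding by checking signals in a caller-supplied order. The names `sniff_bom`, `resolve` and `Signal` are hypothetical, and the ranking in the usage lines is an assumed example; the actual ranking per browser is what the report measures.

```python
from typing import Callable, Optional, Sequence, Tuple

# Byte order marks and the encodings they indicate.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)

def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading byte order mark, if any."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None

# A signal is a (name, lookup) pair; lookup returns an encoding label or None.
Signal = Tuple[str, Callable[[], Optional[str]]]

def resolve(signals: Sequence[Signal], default: str) -> Tuple[str, str]:
    """Return (encoding, signal name) for the first signal that yields a label.

    The ranking is simply the order the caller passes in; it is not
    hard-coded because real parsers differ on it.
    """
    for name, lookup in signals:
        label = lookup()
        if label is not None:
            return label, name
    return default, "default"

# Hypothetical ranking, assumed for illustration only.
body = b"\xef\xbb\xbf<!DOCTYPE html><meta charset=windows-1252>"
signals = [
    ("user override", lambda: None),
    ("BOM", lambda: sniff_bom(body)),
    ("HTTP", lambda: "iso-8859-1"),
    ("<meta>", lambda: "windows-1252"),
]
print(resolve(signals, default="windows-1252"))  # ('utf-8', 'BOM')
```

Reordering the `signals` list is all it takes to model a parser that, say, lets HTTP beat the BOM.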
Based on those data, I also filed 4 bugs against HTML5:
#1 Encoding Sniffing Algorithm:
parent browsing context defines encoding default
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18394
PROPOSAL: Add a new, second-to-last step, like so:
#. If the document lives in a 'nested browsing context',
then return the encoding of the 'parent browsing context'
as a default encoding inherited from the parent browsing
context, and abort these steps.
[nested browsing context = iframe, etc.]
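A rough Python sketch of where the proposed step would sit, assuming a heavily simplified sniffing algorithm: the hypothetical `Doc` model only tracks HTTP, `<meta>` and a "likely encoding" from detection, and "windows-1252" stands in for the locale default. The new step is the second-to-last one, just before the default.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Doc:
    # Hypothetical model: each field holds the label found by one signal,
    # or None when that signal is absent.
    http_charset: Optional[str] = None
    meta_charset: Optional[str] = None
    likely_encoding: Optional[str] = None  # e.g. from character detection
    parent: Optional["Doc"] = None  # set when the document is in an iframe etc.
    encoding: Optional[str] = None  # filled in by sniff()

def sniff(doc: Doc, locale_default: str = "windows-1252") -> str:
    # Existing steps, simplified: explicit signals, then "likely encoding".
    for label in (doc.http_charset, doc.meta_charset, doc.likely_encoding):
        if label is not None:
            doc.encoding = label
            return label
    # Proposed second-to-last step (bug 18394): a nested document
    # defaults to the encoding of its parent browsing context.
    if doc.parent is not None and doc.parent.encoding is not None:
        doc.encoding = doc.parent.encoding
        return doc.encoding
    # Last step: implementation-defined or locale-dependent default.
    doc.encoding = locale_default
    return doc.encoding

parent = Doc(http_charset="utf-8")
sniff(parent)
child = Doc(parent=parent)  # iframe content without any encoding signal
print(sniff(child))  # utf-8
```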
#2 Encoding Sniffing Algorithm:
Overrides apply to nested browsing contexts
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18395
PROPOSAL: Add a new step after the current first step (about
user overriding), like so:
#. If the current document lives in the 'nested browsing
context'[2] of a document in a 'parent browsing context'
whose encoding has been overridden at the request of the
user, then return the encoding of the parent browsing
context, and abort these steps.
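A similar sketch for this proposal, again with a hypothetical, simplified `Doc` model. A user override on the top-level document is inherited by a nested document, even when that nested document declares its own encoding. Propagating the flag further down to grandchildren is an assumption on my part; the proposal only mentions the parent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Doc:
    # Hypothetical model of one document and the signals it carries.
    user_override: Optional[str] = None  # encoding chosen via a browser menu
    http_charset: Optional[str] = None
    meta_charset: Optional[str] = None
    parent: Optional["Doc"] = None
    encoding: Optional[str] = None
    overridden: bool = False  # True when the encoding stems from an override

def sniff(doc: Doc, default: str = "windows-1252") -> str:
    # Step 1: the user's explicit override wins.
    if doc.user_override is not None:
        doc.encoding, doc.overridden = doc.user_override, True
        return doc.encoding
    # Proposed step 2 (bug 18395): a nested document inherits an overridden
    # parent encoding. Assumption: marking the child as overridden as well
    # lets the override reach grandchildren too.
    if doc.parent is not None and doc.parent.overridden:
        doc.encoding, doc.overridden = doc.parent.encoding, True
        return doc.encoding
    # Remaining steps, simplified to HTTP, <meta>, then a default.
    for label in (doc.http_charset, doc.meta_charset):
        if label is not None:
            doc.encoding = label
            return label
    doc.encoding = default
    return default

top = Doc(http_charset="utf-8", user_override="koi8-r")
frame = Doc(meta_charset="utf-8", parent=top)
sniff(top)
print(sniff(frame))  # koi8-r
```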
#3 Encoding Sniffing Algorithm:
Add an XML check as a step zero
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18396
PROPOSAL: Add this step as a step zero:
#. If the document is an XML document, abort these steps.
[Purpose: to avoid the (or an) HTML encoding sniffing
algorithm sometimes being applied to XML.]
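A small Python sketch of such a step zero. It assumes, as a simplification, that "XML document" can be approximated by the Content-Type: text/xml, application/xml or any type ending in "+xml". Both function names are hypothetical.

```python
def is_xml_mime_type(content_type: str) -> bool:
    """True for text/xml, application/xml and any */*+xml type."""
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")

def use_html_sniffing(content_type: str) -> bool:
    # Proposed step zero (bug 18396): if the document is an XML document,
    # abort, leaving encoding detection to the XML rules instead.
    return not is_xml_mime_type(content_type)

print(use_html_sniffing("text/html; charset=utf-8"))  # True
print(use_html_sniffing("application/xhtml+xml"))  # False
```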
#4 Encoding Sniffing Algorithm:
Clarify what "information on the likely encoding" covers
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18397
* E.g., in an HTML document, is determining the encoding by
reading the XML encoding declaration covered by this step?
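To show what "reading the XML encoding declaration" in an HTML document would involve, here is a minimal Python sketch. It follows the XML 1.0 grammar for the version and encoding pseudo-attributes (EncName = [A-Za-z] ([A-Za-z0-9._] | '-')*). It is a simplification: it assumes ASCII-compatible bytes and ignores UTF-16-encoded declarations. Whether HTML parsers should consult this at all is exactly the open question. The function name is hypothetical.

```python
import re
from typing import Optional

# <?xml version="..." encoding="..."  at the very start of the bytes.
_XML_DECL = re.compile(
    rb"""^<\?xml\s+version\s*=\s*(["'])[^"']*\1"""
    rb"""\s+encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\2"""
)

def xml_declaration_encoding(data: bytes) -> Optional[str]:
    """Return the lowercased encoding label from an XML declaration, or None."""
    match = _XML_DECL.match(data)
    return match.group(3).decode("ascii").lower() if match else None

print(xml_declaration_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<html>'))
# iso-8859-1
```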
--
Leif Halvard Silli
Received on Wednesday, 25 July 2012 14:15:34 UTC