- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 25 Jul 2012 17:14:42 +0300
- To: "www-international@w3.org" <www-international@w3.org>
Maybe this is of interest to www-international@: I have today published a report on how HTML parsers and XML parsers determine the character encoding.

http://målform.no/blog/white-spots-in-html5-s-encoding-sniffing-algorithm

The report concentrates on which encoding signals carry the most weight for browsers: user override vs BOM vs HTTP vs <meta> vs XML encoding declaration vs character detection vs language default vs locale default vs parent browsing context default. And perhaps some things I forgot. The data could be relevant for resolving a few upcoming issues.

Based on those data, I also filed 4 bugs against HTML5:

#1 Encoding Sniffing Algorithm: parent browsing context defines encoding default
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18394
PROPOSAL: Add a new, second-to-last step, like so:
#. If the document lives in a 'nested browsing context', then return the encoding of the 'parent browsing context', as a default encoding dictated by the parent browsing context, and abort these steps.
[nested browsing context = iframe etc.]

#2 Encoding Sniffing Algorithm: Overrides apply to nested browsing contexts
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18395
PROPOSAL: Add a new step after the current first step (about user overriding), like so:
#. If the current document lives in the 'nested browsing context'[2] of a document in a 'parent browsing context' whose encoding has been overridden at the request of the user, then return the encoding of the parent browsing context, and abort these steps.

#3 Encoding Sniffing Algorithm: Add an XML check as a step zero
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18396
PROPOSAL: Add this step as a step zero:
#. If the document is an XML document, abort these steps.
[Purpose: to prevent an HTML encoding sniffing algorithm from (sometimes) being applied to XML.]

#4 Encoding Sniffing Algorithm: Clarify what "information on the likely encoding" covers
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18397
* E.g., is determining the encoding by reading the XML encoding declaration in an HTML document covered by this step?

--
Leif Halvard Silli
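
As a rough illustration of where proposals #1 to #3 would slot into the sniffing order, here is a minimal Python sketch. It is not the HTML5 algorithm itself. The `Doc` fields and the `sniff` function are hypothetical names invented for this example, and the single `declared` field is an assumed stand-in for all the intermediate signals (BOM, HTTP, <meta>, detection and so on), whose relative order is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Doc:
    """Hypothetical container for the encoding signals of one document."""
    is_xml: bool = False
    user_override: Optional[str] = None    # user override on this document
    parent_encoding: Optional[str] = None  # None when not in a nested browsing context
    parent_overridden: bool = False        # did the user override the parent's encoding?
    declared: Optional[str] = None         # stand-in for BOM / HTTP / <meta> / detection
    locale_default: str = "windows-1252"   # assumed final fallback


def sniff(doc: Doc) -> Optional[Tuple[str, str]]:
    """Return (encoding, reason), or None when the steps are aborted for XML."""
    # Proposal #3: step zero, do not apply HTML sniffing to XML documents.
    if doc.is_xml:
        return None
    # Existing first step: user override on the document itself.
    if doc.user_override:
        return doc.user_override, "user override"
    # Proposal #2: an override on the parent applies to nested contexts.
    if doc.parent_encoding and doc.parent_overridden:
        return doc.parent_encoding, "parent override"
    # Intermediate steps, collapsed into one signal for this sketch.
    if doc.declared:
        return doc.declared, "declared"
    # Proposal #1: second-to-last step, the parent's encoding as default.
    if doc.parent_encoding:
        return doc.parent_encoding, "parent default"
    # Last step: locale (or otherwise configured) default.
    return doc.locale_default, "locale default"
```

For example, under these assumptions `sniff(Doc(parent_encoding="koi8-r", declared="utf-8"))` picks the document's own declaration, while `sniff(Doc(parent_encoding="koi8-r"))` falls back to the parent's encoding rather than the locale default.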
Received on Wednesday, 25 July 2012 14:15:34 UTC