[Bug 18396] New: Encoding Sniffing Algorithm: Add an XML check as a step zero

https://www.w3.org/Bugs/Public/show_bug.cgi?id=18396

           Summary: Encoding Sniffing Algorithm: Add an XML check as a
                    step zero
           Product: HTML WG
           Version: unspecified
          Platform: PC
               URL: http://dev.w3.org/html5/spec/Overview#encoding-sniffing-algorithm
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec
        AssignedTo: ian@hixie.ch
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org


Proposal: Extend the encoding sniffing algorithm by adding a new,
          explicit step zero, like so:

     0. If the document is an XML document, abort these steps.

Justification.

    By extending the algorithm this way, there would be an *explicit* 
step to 'jump out of the algorithm if XML', for which it would also be 
possible to write test cases.
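
To illustrate what such an explicit step and a test case for it might 
look like, here is a minimal Python sketch. It is not the spec's 
algorithm: the function names, the MIME-type check used to decide "is 
an XML document", and the simplified later steps (transport charset, 
parent encoding, a windows-1252 fallback) are all assumptions made for 
illustration only.

```python
from typing import Optional

# Assumption: a document counts as XML if its MIME type is an XML type.
XML_MIME_TYPES = {"text/xml", "application/xml"}


def is_xml_document(mime_type: str) -> bool:
    """Hypothetical helper: decide XML-ness from the MIME type alone."""
    essence = mime_type.split(";", 1)[0].strip().lower()
    return essence in XML_MIME_TYPES or essence.endswith("+xml")


def sniff_html_encoding(
    mime_type: str,
    transport_charset: Optional[str] = None,
    parent_encoding: Optional[str] = None,
) -> Optional[str]:
    """Return an encoding, or None if the algorithm was aborted."""
    # Step 0 (proposed): if the document is an XML document, abort.
    if is_xml_document(mime_type):
        return None

    # The steps below are simplified stand-ins for the real algorithm.
    if transport_charset:
        return transport_charset
    if parent_encoding:
        return parent_encoding
    return "windows-1252"  # assumed fallback; real defaults vary by locale


# A test case of the kind step zero makes possible: an XML document in a
# nested browsing context must not pick up the parent's encoding here.
assert sniff_html_encoding("application/xhtml+xml",
                           parent_encoding="iso-8859-1") is None
assert sniff_html_encoding("text/html",
                           parent_encoding="iso-8859-1") == "iso-8859-1"
```

Returning None here simply signals that the HTML sniffing steps were 
not run, so the XML rules for determining the encoding apply instead.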

    Currently, and especially if the XML document lives in a 'nested 
browsing context'[1], some browsers (unless there is a BOM) let the XML 
document default to the encoding of the 'parent browsing context' 
instead of letting it default to the default encoding of the XML format 
(UTF-8). WebKit/Chromium/Opera have this error. Firefox does not have 
this error. I have not tested IE9/10 yet, but suspect they are more on 
Firefox's side. Regarding defaulting to the encoding of the parent 
browsing context, [see bug #foo and see bug #bar]
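
The expected defaulting for an XML document can be sketched roughly as 
follows. This is only an illustration in Python, under stated 
assumptions: the function name and its parameters are hypothetical, only 
UTF-8 and UTF-16 BOMs are checked, and other signals (an XML encoding 
declaration, a charset from the transport layer) are left out entirely.

```python
import codecs
from typing import Optional


def xml_fallback_encoding(data: bytes,
                          parent_encoding: Optional[str] = None) -> str:
    """Hypothetical helper: pick an encoding for an XML byte stream."""
    # A BOM, if present, decides the encoding.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"

    # parent_encoding is deliberately ignored: without a BOM, the XML
    # document falls back to the XML default, not to the encoding of
    # the parent browsing context.
    return "utf-8"


# An XML document framed by a page in ISO-8859-1 still defaults to UTF-8.
assert xml_fallback_encoding(b"<root/>", parent_encoding="iso-8859-1") == "utf-8"
```

The erroneous behaviour described above would correspond to returning 
parent_encoding in the final step instead of "utf-8".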

More data in my related blog post.[2]

[1] http://dev.w3.org/html5/spec/Overview#nested-browsing-context
[2] http://målform.no/blog/white-spots-in-html5-s-encoding-sniffing-algorithm

Received on Wednesday, 25 July 2012 12:31:06 UTC