Re: Should the UTF-8 BOM trump overriding via HTTP or by users? from John Cowan on 2011-06-08 (www-international@w3.org from April to June 2011)

From: John Cowan <cowan@mercury.ccil.org>
Date: Wed, 8 Jun 2011 13:36:01 -0400
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
Message-ID: <20110608173601.GF14459@mercury.ccil.org>

Leif Halvard Silli scripsit:

> So, that algorithm effectively plays the role of an external encoding
> information. Because otherwise the XML parser is not permitted to
> interpret the document differently from the XML encoding declaration.

Not quite.  Rather, the algorithm allows the parser to interpret the
encoding declaration, which is not self-interpreting in the case of
UTF-16*, UTF-32*, EBCDIC-*, and other non-ASCII-compatible encodings.
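
To make "not self-interpreting" concrete, here is a small Python sketch
that prints the first four bytes of the same XML declaration under a few
encodings. The helper name first_bytes is made up for this example, and
cp037 is only one assumed representative of the EBCDIC family.

```python
def first_bytes(codec: str, text: str = '<?xml version="1.0"?>') -> str:
    """Return a hex dump of the first four bytes of `text` in `codec`."""
    return text.encode(codec)[:4].hex(" ")


# The same four characters "<?xm" come out as different byte patterns,
# so a parser cannot read the declaration before guessing the family.
for codec in ("ascii", "utf-16-be", "utf-16-le", "utf-32-be", "cp037"):
    print(f"{codec:<10} {first_bytes(codec)}")
```

A parser that assumed ASCII would find no "<?xml" at all in the UTF-16,
UTF-32, or EBCDIC byte streams, which is why some detection step has to
run before the declaration can be read.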

It remains the responsibility of the parser to check the encoding
returned by the sniffer against the encoding in the declaration, if any.
If they don't match, boom.  So in that sense only, the sniffer plays
the role of an external encoding.  But unlike HTTP headers, it cannot
*override* the encoding declaration.
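
A minimal Python sketch of that division of labour follows. It is an
illustration under stated assumptions, not any real parser's code: the
names SIGNATURES, sniff, consistent, and check_encoding are invented
here, the byte signatures follow the non-normative autodetection
appendix of XML 1.0, cp037 stands in for the whole EBCDIC family, and
the consistency rule is deliberately loose.

```python
import re

# (byte prefix, Python codec used to read the declaration, family label).
# Order matters: 4-byte UTF-32 BOMs must be tried before 2-byte UTF-16 BOMs.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be", "UTF-32"),
    (b"\xff\xfe\x00\x00", "utf-32-le", "UTF-32"),
    (b"\xfe\xff", "utf-16-be", "UTF-16"),
    (b"\xff\xfe", "utf-16-le", "UTF-16"),
    (b"\xef\xbb\xbf", "utf-8", "UTF-8"),
    (b"\x00\x00\x00<", "utf-32-be", "UTF-32"),
    (b"<\x00\x00\x00", "utf-32-le", "UTF-32"),
    (b"\x00<\x00?", "utf-16-be", "UTF-16"),
    (b"<\x00?\x00", "utf-16-le", "UTF-16"),
    (b"\x4c\x6f\xa7\x94", "cp037", "EBCDIC"),  # "<?xm" in EBCDIC (assumed cp037)
]

DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)
WIDE = ("UTF-16", "UTF-32", "UCS-2", "UCS-4")


def sniff(data: bytes) -> tuple[str, str]:
    """Guess a codec good enough to read the declaration, plus its family."""
    for prefix, codec, family in SIGNATURES:
        if data.startswith(prefix):
            return codec, family
    # No signature: assume an ASCII-compatible encoding for now.
    return "latin-1", "ASCII-compatible"


def consistent(family: str, declared: str) -> bool:
    """Loose check that the declared name belongs to the sniffed family."""
    if family == "UTF-8":
        return declared == "UTF-8"
    if family == "UTF-16":
        return declared.startswith(("UTF-16", "UCS-2"))
    if family == "UTF-32":
        return declared.startswith(("UTF-32", "UCS-4"))
    # ASCII-compatible or EBCDIC: the declaration picks the exact member,
    # but it cannot name a wide encoding the bytes plainly are not in.
    return not declared.startswith(WIDE)


def check_encoding(data: bytes) -> str:
    """Return the encoding to use, or raise if sniffer and declaration clash."""
    codec, family = sniff(data)
    head = data[:512].decode(codec, errors="replace")
    match = DECL.search(head)
    if match is None:
        # Nothing to contradict the sniffer; XML defaults to UTF-8.
        return "UTF-8" if family == "ASCII-compatible" else family
    declared = match.group(1).upper()
    if not consistent(family, declared):
        raise ValueError(f"sniffed {family}, but declaration says {declared}")
    return declared
```

Note that the sniffer's result is used only to read the declaration and
to veto it. It never silently replaces what the declaration says; a
disagreement is an error, not an override.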

-- 
The experiences of the past show                John Cowan
that there has always been a discrepancy        cowan@ccil.org
between plans and performance.                  http://www.ccil.org/~cowan
        --Emperor Hirohito, August 1945

Received on Wednesday, 8 June 2011 17:36:37 UTC