
Re: Should the UTF-8 BOM trump overriding via HTTP or by users?

From: John Cowan <cowan@mercury.ccil.org>
Date: Wed, 8 Jun 2011 11:54:30 -0400
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
Message-ID: <20110608155430.GD14459@mercury.ccil.org>
Leif Halvard Silli scripsit:

> So, you mean: it is a step before the doc is fed to the parser?

Yes.

> I don't see how this is different from what Safari and IE do when they 
> override the HTTP header whenever they see the UTF-8 BOM.

My algorithm is a "file" algorithm; it doesn't know anything about HTTP
headers.
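Here is a minimal sketch of what a BOM-only "file" step might look like, run before any parser sees the bytes. The names `sniff_bom` and `decode_file` are hypothetical, and the actual algorithm under discussion is not reproduced in this message. UTF-32 BOMs are deliberately left out: the UTF-32LE BOM begins with the same two bytes as the UTF-16LE one, so it would have to be tested first.

```python
# Hypothetical BOM table; UTF-32 is omitted on purpose (its LE BOM also starts with FF FE).
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xfe\xff", "UTF-16BE"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if there is none.
    Only the bytes themselves are consulted; no HTTP headers or other metadata."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_file(data: bytes, fallback: str = "UTF-8") -> str:
    """Strip the BOM (if any) and decode, before the text is handed to a parser."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)
```

Because the function receives only bytes, there is nothing an HTTP header could override at this stage.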

> (Firefox
> behaves like your algorithm for the file:// protocol.)

Good.  :-)

> >> XML describes normative "fatal error" situations related to encoding:
> >> 
> >> 1. When external encoding info is absent: a) A processor fed with an
> >> entity whose encoding differs from the info in the XML declaration.
> > 
> > This is not actually testable: bad encoding will at best produce an
> > error related to 4 below.
> 
> There is not necessarily any need for a test (that is, step 4 is not always
> needed). It can be a matter of comparing the encoding labels, because the
> parser first determines the encoding. If it uses the BOM to determine the
> encoding and thereafter discovers that the XML encoding declaration says
> "KOI8-R", then we have the "fatal error" situation.

True.
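Leif's point that no decoding test is needed can be sketched as a pure label comparison. The sketch makes several assumptions. It handles only a UTF-8 BOM. Its regular expression copes only with a well-formed XML declaration at the start of the entity. It uses Python's codec registry as a stand-in for proper charset-label matching. The names `check_bom_vs_declaration` and `EncodingFatalError` are hypothetical.

```python
import codecs
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Encoding pseudo-attribute of an XML declaration at the very start of the entity.
# EncName per XML 1.0: [A-Za-z] ([A-Za-z0-9._] | '-')*
DECL_RE = re.compile(rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


class EncodingFatalError(Exception):
    """Stands in for an XML 'fatal error' about encoding."""


def declared_encoding(body: bytes):
    """Return the declared encoding label, or None if there is no such declaration."""
    m = DECL_RE.match(body)
    return m.group(1).decode("ascii") if m else None


def check_bom_vs_declaration(data: bytes):
    """Compare a UTF-8 BOM with the XML declaration's label; nothing is decoded."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data
    label = declared_encoding(body)
    if not has_bom:
        return label  # nothing to compare against; may be None
    if label is None:
        return "utf-8"
    try:
        name = codecs.lookup(label).name  # normalises aliases such as "UTF8"
    except LookupError:
        raise EncodingFatalError(f"unknown encoding label {label!r}") from None
    if name != "utf-8":
        raise EncodingFatalError(f"BOM says UTF-8 but declaration says {label!r}")
    return name
```

A UTF-8 BOM followed by `encoding="KOI8-R"` raises the error as soon as the two labels disagree, before any character is decoded.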

> But does any XML parser obey this rule? At least WebKit, Opera and Firefox
> do not. They instead accept the BOM and ignore the XML encoding
> declaration. (Exception: if the encoding in the declaration is an
> unknown encoding, then WebKit shows a fatal error, but this is actually
> case 3; see below.)

Those are XML parsers inside browsers, which I know little about.
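For contrast, the following might model the lenient behavior Leif reports for browser XML parsers. The BOM is accepted and the declaration ignored, except that WebKit rejects an unknown label. This is an assumed model built from that description, not code taken from any browser. `browser_like_encoding` is a hypothetical name, and whether a label is recognised is again approximated with Python's codec registry.

```python
import codecs
from typing import Optional


class EncodingFatalError(Exception):
    """Stands in for the fatal error WebKit is reported to show."""


def browser_like_encoding(bom_encoding: Optional[str], declared_label: Optional[str]) -> str:
    """Assumed model: a BOM silently wins over the declaration,
    but an unrecognised declared label is still fatal (WebKit-like)."""
    if declared_label is not None:
        try:
            codecs.lookup(declared_label)
        except LookupError:
            raise EncodingFatalError(f"unknown encoding {declared_label!r}") from None
    if bom_encoding is not None:
        return bom_encoding  # declaration ignored, even if it disagrees
    return declared_label or "UTF-8"
```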

> I take your silence as agreement. The "some reason" could be an HTTP
> header.

Quite so.
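If the "some reason" is an HTTP charset parameter, then external encoding information is present. The fatal-error case in item 1 above is stated only for its absence, which leaves a choice of precedence. The hypothetical `choose_encoding` below sketches both options under discussion. `bom_trumps_http=True` models what the thread reports for Safari and IE, and `False` lets the HTTP header win. The final UTF-8 fallback assumes XML's default for an entity with neither a BOM nor an encoding declaration.

```python
from typing import Optional


def choose_encoding(bom_encoding: Optional[str],
                    http_charset: Optional[str],
                    declared_label: Optional[str],
                    bom_trumps_http: bool) -> str:
    """Hypothetical precedence chain for picking a document's encoding."""
    if bom_trumps_http and bom_encoding:
        return bom_encoding      # Safari/IE-style: the BOM overrides the header
    if http_charset:
        return http_charset      # external (transport) information wins
    if bom_encoding:
        return bom_encoding
    return declared_label or "UTF-8"  # assumed XML default
```

The only difference between the two policies is whether the BOM check runs before or after the header check.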

-- 
Where the wombat has walked,            John Cowan <cowan@ccil.org>
it will inevitably walk again.          http://www.ccil.org/~cowan
Received on Wednesday, 8 June 2011 15:54:54 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 8 June 2011 15:54:56 GMT