Subject: Re: Bug handling utf-16 in w3ctestlib

From: Geoffrey Sneddon <me@gsnedders.com>
Date: Thu, 1 Oct 2015 14:57:04 +0100
To: Ms2ger <ms2ger@gmail.com>
Cc: Peter Linss <peter@linss.com>, www-archive <www-archive@w3.org>
Message-ID: <CAHKdfMgKH-Xp9JQ3F510SFx_WBHgU-H3sPcSu-5gXTejGQJ8KQ@mail.gmail.com>

On Tue, Sep 22, 2015 at 12:13 PM, Ms2ger <ms2ger@gmail.com> wrote:

> As HTMLBinaryInputStream.__init__ already calls detectEncoding(), the
> UTF-16 BOM is no longer in the stream when HTMLSource.parse calls
> detectEncoding() manually. This causes detectEncoding() not to find
> anything interesting, and return windows-1252. Attached is a patch to
> remove the manual handling, instead depending on HTMLParser.parse to
> handle the encoding detection itself.
>
> Could you apply the patch to <https://hg.csswg.org/dev/w3ctestlib>? I
> don't believe I have push access myself.

FWIW, detectEncoding should never be called manually; the input stream
calls it itself when no encoding is specified. Somebody (likely me) really
needs to do something about the html5lib docs…
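
To make the failure mode in the quoted report concrete, here is a minimal
stdlib-only sketch. It is not html5lib's actual code: detect_encoding and
ToyInputStream are hypothetical stand-ins, and the assumption is only that
detection consumes the BOM and falls back to windows-1252 when it finds none.

```python
import codecs
import io

# Hypothetical stand-in for a BOM-sniffing detector; NOT html5lib's real code.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]
FALLBACK = "windows-1252"


def detect_encoding(stream):
    """Sniff a BOM at the current position and consume it if one is found."""
    start = stream.tell()
    head = stream.read(3)
    for bom, name in BOMS:
        if head.startswith(bom):
            # Leave the stream positioned just past the BOM.
            stream.seek(start + len(bom))
            return name
    # No BOM: rewind and report the fallback encoding.
    stream.seek(start)
    return FALLBACK


class ToyInputStream:
    """Toy wrapper that runs detection once, at construction time."""

    def __init__(self, raw):
        self.raw = raw
        self.encoding = detect_encoding(raw)


data = codecs.BOM_UTF16_LE + "<p>hi</p>".encode("utf-16-le")
stream = ToyInputStream(io.BytesIO(data))

first = stream.encoding              # detection done by the constructor
second = detect_encoding(stream.raw)  # manual second call: BOM already consumed

print(first, second)
```

The second call sees no BOM because the first one already moved past it, so
it reports the fallback. Reading the encoding the stream recorded for itself
(stream.encoding in this sketch) avoids the problem.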

/g

Received on Sunday, 4 October 2015 16:44:23 UTC