
Using HTML::Encoding/Encode in check

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 18 Aug 2005 21:42:52 +0200
To: public-qa-dev@w3.org
Message-ID: <4ko9g19melk3qqsnnu650au9jp7b0j3ans@hive.bjoern.hoehrmann.de>


  As you might have noticed, I've replaced much of the encoding code
with HTML::Encoding and Encode calls, so we no longer depend on modules
such as Text::Iconv and Set::IntSpan. It seems quite stable to me;
I'm not aware of a document where the new code behaves differently
from the old code /and/ incorrectly. Please give it some testing; it's
on http://qa-dev.w3.org/wmvs/HEAD/ as usual.

I've dropped the few special cases where the Validator suggests
using e.g. "macintosh" instead of "x-mac-roman" (see my other mail).
It seems 0.7.0 already no longer supports show source once encoding
errors have been encountered (with 0.6.x you'd get the line replaced
with something like ### encoding errors here ###), and I've not
restored that. Reporting offsets in the EARL/XML output is dropped;
I am not sure how it should work exactly and I don't think the
benefit outweighs the implementation and testing cost here.

Error reporting for encoding errors is currently a bit poor: it will
always complain about errors in line 0. I'll fix that soon. There
are some subtle bugs I've mentioned on this list that are fixed now
(though not reported in Bugzilla or elsewhere) simply through using
my HTML::Encoding module instead. One change in behavior I can think
of is that for HTML documents with multiple <meta> encodings it will
now pick the first one, whereas the old code picked the last.
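
To make the first-versus-last change concrete, here is a minimal
sketch in Python. It is not the validator's Perl code and not the
algorithm HTML::Encoding uses; the regular expression and the names
declared_charsets() and pick_charset() are assumptions made up for
this illustration.

```python
import re
from typing import List, Optional

# Simplified, hypothetical pattern: finds charset=... inside a <meta> tag.
# Real encoding detection is more involved; this only shows the ordering issue.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\x22\x27]?([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)

def declared_charsets(octets: bytes) -> List[str]:
    """Return every charset named in a <meta> element, in document order."""
    return [m.group(1).decode("ascii").lower()
            for m in META_CHARSET.finditer(octets)]

def pick_charset(octets: bytes, prefer_first: bool = True) -> Optional[str]:
    """Pick the first declaration (new behavior) or the last (old behavior)."""
    names = declared_charsets(octets)
    if not names:
        return None
    return names[0] if prefer_first else names[-1]

doc = (
    b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">\n'
    b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
)
print(pick_charset(doc, prefer_first=True))   # iso-8859-1
print(pick_charset(doc, prefer_first=False))  # utf-8
```

For a document with conflicting declarations like the one above, the
two policies give different answers, so such documents may now be
decoded differently than before.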

The switch also fixed the various bugs we had for the source code
excerpts where the <strong> pointer was inserted incorrectly due to
differences in characters vs. octets. Pretty much all strings (except
for the config files, etc.) should now be utf8_on, so substr() etc.
now work with Unicode semantics. Please keep that in mind when using
these functions or symbols like \s and \w in the code.
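
The characters vs. octets difference can be shown with a small
Python analogy. This is not the validator's code: Python's str and
bytes stand in here for Perl character strings and byte strings, and
the mark() helper and the sample line are assumptions made up for
the example.

```python
import re

line = "<p>Björn <b>bold</p>"      # character string
octets = line.encode("utf-8")      # the same line as UTF-8 octets

# Offset of the offending tag, counted in characters.
column = line.index("<b>")

def mark(text, start, length, open_tag, close_tag):
    """Wrap text[start:start+length] in the given marker tags."""
    end = start + length
    return text[:start] + open_tag + text[start:end] + close_tag + text[end:]

# Character offset applied to the character string: marker lands on <b>.
good = mark(line, column, 3, "<strong>", "</strong>")

# Same offset applied to the octets: "ö" takes two octets in UTF-8,
# so the marker ends up one position too early.
bad = mark(octets, column, 3, b"<strong>", b"</strong>").decode("utf-8")

print(good)  # <p>Björn <strong><b></strong>bold</p>
print(bad)   # <p>Björn<strong> <b</strong>>bold</p>

# \w behaves differently as well: Unicode-aware on character strings,
# ASCII-only on octets.
words_chars = re.findall(r"\w+", "Björn")
words_octets = re.findall(rb"\w+", "Björn".encode("utf-8"))
print(words_chars)   # ['Björn']
print(words_octets)  # [b'Bj', b'rn']
```

The same caution applies to any offset or pattern in the code that
was written with octets in mind.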

Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Thursday, 18 August 2005 19:42:38 GMT
