
Re: persistent xml-decl vs. char-encoding

From: Piotr Banski <bansp@venus.ci.uw.edu.pl>
Date: Mon, 3 Feb 2003 00:07:35 +0100 (CET)
To: Charles Reitzel <creitzel@rcn.com>
cc: html-tidy@w3.org
Message-ID: <Pine.LNX.4.21.0302022349320.31608-100000@venus.ci.uw.edu.pl>

Thanks, Charles --

On Sun, 2 Feb 2003, Charles Reitzel wrote:
> First, I would recommend that you replace the entire XML declaration with 
> sed.  Yes, we might fix the bug (not respecting --xml-decl no w/ RAW 
> encoding), but probably not in time to meet your needs.  The format is 
> highly regular and shouldn't be a problem w/ sed or awk in a shell script.
> You are fighting an uphill battle using RAW in the first place.

Given this, I will perhaps begin by transcoding all the files into
Unicode, as I guess this should eliminate both problems (the need for the
"encoding" attribute in the XML declaration and the use of raw encoding).
And if for some reason that won't work satisfactorily, I'll go for sed, as
you say.
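
A minimal sketch of that plan in Python, covering both the transcoding step and the declaration rewrite that sed would otherwise do. It assumes the source files are ISO-8859-2 and that UTF-8 is the Unicode target; the function name `transcode` and the regular expression are illustrative, not anything Tidy provides.

```python
import re

# Matches an XML declaration at the very start of the document,
# plus the line break that follows it (if any).
XML_DECL = re.compile(r'^<\?xml\s[^>]*\?>[ \t]*\r?\n?')


def transcode(data: bytes, source_encoding: str = "iso-8859-2") -> bytes:
    """Re-encode a document as UTF-8 and normalise its XML declaration.

    Assumption: the input really is in `source_encoding`; nothing is
    auto-detected here.
    """
    text = data.decode(source_encoding)
    # Drop any existing declaration, then prepend one without the
    # "encoding" pseudo-attribute, which UTF-8 documents do not require.
    body = XML_DECL.sub("", text, count=1)
    return ('<?xml version="1.0"?>\n' + body).encode("utf-8")
```

Run over each file before handing it to Tidy, this would let Tidy read the input as UTF-8 rather than raw.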

> The next major item on Tidy's agenda is to support pluggable character
> encodings a la Expat or LibXml.  That said, have a look at the recent
> changes to support ISO-8859-15.  It might be easier all around to do a
> patch of your own to support 8859-2 along the same lines.  Just
> thinking out loud here.

I'm afraid I'm not up to this yet :-)

> Second, about the segfault, I found and fixed one in the new diagnostics 
> code.  If you are using a Compile Farm executable, it should be there 
> tomorrow.  If you are using Windows, I thought I put it up after I fixed that 
> problem, but let me know, and I'll make sure to put up a fresh build.  If 
> the problem remains, please send a sample config and input file.  Thanks.

I've already uploaded a test case with my bug report -- will see tomorrow
how the new version fares against it (I'm using Linux).

Thanks again :-)

Received on Sunday, 2 February 2003 18:08:40 UTC
