persistent xml-decl vs. char-encoding

From: Piotr Banski <bansp@venus.ci.uw.edu.pl>
Date: Sun, 2 Feb 2003 23:02:39 +0100 (CET)
To: html-tidy@w3.org
Message-ID: <Pine.LNX.4.21.0302022256160.29801-100000@venus.ci.uw.edu.pl>

I'm trying to prevent Tidy from outputting the XML declaration, because I
want it to read <?xml version="1.0" encoding="iso-8859-2"?>, and as far as
I can see, Tidy won't let me specify this encoding, so I supply the whole
line from a shell script. And, of course, setting add-xml-decl to "no"
does the job *if* I don't also specify char-encoding as "raw". (I specify
it as "raw" to prevent Tidy from mangling Latin-2 characters in the files
I process.)

So, if I use command-line arguments, I can suppress the declaration when I
do e.g.:

tidy --output-xml yes --add-xml-decl no --tidy-mark no $1 >> $1.xml

but it stops working if I do:

tidy --output-xml yes --add-xml-decl no --char-encoding raw $1 >> $1.xml

To make things even more interesting, let me add that if I specify
char-encoding as "ascii", it works as it should...
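A possible stopgap for the case above, sketched under the assumption that
the unwanted declaration sits alone on the first line of Tidy's output:
filter that line out and emit the desired declaration instead. The function
names strip_xml_decl and with_latin2_decl are made up for illustration and
are not part of Tidy.

```shell
#!/bin/sh
# Hypothetical helper: drop the first line of stdin if (and only if) it is
# an XML declaration on a line of its own. LC_ALL=C keeps sed from choking
# on Latin-2 bytes in a UTF-8 locale.
strip_xml_decl() {
    LC_ALL=C sed '1{/^<?xml[^>]*?>[[:space:]]*$/d;}'
}

# Hypothetical helper: write the wanted declaration, then the filtered body.
with_latin2_decl() {
    printf '%s\n' '<?xml version="1.0" encoding="iso-8859-2"?>'
    strip_xml_decl
}

# Intended use (assumes tidy is on PATH; options are the ones quoted above):
#   tidy --output-xml yes --add-xml-decl no --char-encoding raw "$1" \
#       | with_latin2_decl > "$1.xml"
```

Because the filter only touches the first line, it is harmless when Tidy
does honour add-xml-decl and emits no declaration at all.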

I get the same behaviour for the versions of 1 Jan and 1 Feb.  
Additionally, the Jan version won't read my config file, apparently, and
the Feb version segfaults on the files I need to process (bug report
already posted), so I'm somewhat stuck and will gratefully accept some
advice :-) I mean, if I have to, I will transcode my files before feeding
them to Tidy, but maybe there's something about config options that I've
missed, or some upcoming fix only days (hours? ;-) ) away?
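
The transcoding fallback mentioned above could look roughly like this. It is
only a sketch: it assumes an iconv utility is installed, that the input
really is ISO-8859-2, and that the Tidy build in use accepts "utf8" as a
char-encoding value. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers wrapping iconv; they read stdin and write stdout.
latin2_to_utf8() {
    iconv -f ISO-8859-2 -t UTF-8
}

utf8_to_latin2() {
    iconv -f UTF-8 -t ISO-8859-2
}

# Intended use (assumption: this Tidy build accepts --char-encoding utf8):
#   latin2_to_utf8 < "$1" \
#       | tidy --output-xml yes --add-xml-decl no --char-encoding utf8 \
#       | utf8_to_latin2 > "$1.body"
# The iso-8859-2 declaration would still have to be written separately,
# since Tidy is not producing it here.
```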

Thanks,

   Piotr
Received on Sunday, 2 February 2003 17:05:16 GMT