- From: Charles Reitzel <creitzel@rcn.com>
- Date: Sun, 02 Feb 2003 17:52:04 -0500
- To: Piotr Banski <bansp@venus.ci.uw.edu.pl>
- Cc: html-tidy@w3.org
Hi Piotr,

First, I would recommend that you replace the entire XML declaration with sed. Yes, we might fix the bug (--add-xml-decl no is not respected with RAW encoding), but probably not in time to meet your needs. The format is highly regular and shouldn't be a problem with sed or awk in a shell script.

You are fighting an uphill battle using RAW in the first place. The next major item on Tidy's agenda is to support pluggable character encodings a la Expat or LibXml. That said, have a look at the recent changes to support ISO-8859-15. It might be easier all around to do a patch of your own to support 8859-2 along the same lines. Just thinking out loud here.

Second, about the segfault: I found and fixed one in the new diagnostics code. If you are using a Compile Farm executable, the fix should be there tomorrow. If you are using Windows, I thought I put a build up after I fixed that problem, but let me know, and I'll make sure to put up a fresh build. If the problem remains, please send a sample config and input file. Thanks.

take it easy,
Charlie

At 11:02 PM 2/2/2003 +0100, Piotr Banski wrote:
>I'm trying to prevent Tidy from outputting the xml declaration, because I
>want it to read <?xml version="1.0" encoding="iso-8859-2"?>, and as far as
>I can see, Tidy won't let me specify this encoding, so I supply the whole
>line from a shell script. And, of course, setting add-xml-decl to "no"
>does the job *if* I don't also specify char-encoding as "raw". (I specify
>it as "raw", to prevent Tidy from mangling Latin-2 characters in the files
>I process.)
>
>So, if I use cmdline arguments, I can suppress the declaration when I do e.g.:
>
>tidy --output-xml yes --add-xml-decl no --tidy-mark no $1 >> $1.xml
>
>but it stops working if I do:
>
>tidy --output-xml yes --add-xml-decl no --char-encoding raw $1 >> $1.xml
>
>To make things even more interesting, let me add that if I specify
>char-encoding as "ascii", it works as it should...
>
>I get the same behaviour for the versions of 1 Jan and 1 Feb.
>Additionally, the Jan version won't read my config file, apparently, and
>the Feb version segfaults on the files I need to process (bug report
>already posted), so I'm somewhat stuck and will gratefully accept some
>advice :-) I mean, if I have to, I will transcode my files before feeding
>them to Tidy, but maybe there's something about config options that I've
>missed, or some upcoming fix only days (hours? ;-) ) away?
>
>Thanks,
>
> Piotr
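The sed workaround suggested in the reply could look roughly like the sketch below. It assumes that Tidy writes the XML declaration by itself on the first line of its output; `replace_xml_decl` is a hypothetical helper name for illustration, not part of Tidy.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tidy): reads a document on stdin,
# drops an XML declaration if one sits alone on the first line, and
# writes the document to stdout with an ISO-8859-2 declaration on top.
replace_xml_decl() {
    # Emit the declaration we actually want.
    printf '%s\n' '<?xml version="1.0" encoding="iso-8859-2"?>'
    # LC_ALL=C makes sed treat the input as plain bytes, so Latin-2
    # characters are passed through untouched.
    # In a basic regular expression '?' is a literal character, so
    # '<?xml' and '?>' match the declaration delimiters as written.
    LC_ALL=C sed '1{/^<?xml[^>]*?>[[:space:]]*$/d;}'
}
```

With this in place, the pipeline from the original message might become `tidy --output-xml yes --char-encoding raw "$1" | replace_xml_decl > "$1.xml"`, which works whether or not Tidy honours add-xml-decl for raw encoding.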
Received on Sunday, 2 February 2003 17:44:03 UTC