RE: Possible bug

It's just a hack, but it may be safest to set both XmlOut=yes and xHTML=yes
in response to the -asxml option.  At config parse time, Tidy may not know
yet if it is dealing w/ pure XML or HTML input.  With both options set, the
output will always be XML of one kind or the other.  Also, it should provide
compatible behavior for most (maybe all) existing scripts.

my $0.02 worth,
Charlie


-----Original Message-----
From: Terry Teague [mailto:terry_teague@users.sourceforge.net]
Sent: Thursday, September 20, 2001 4:44 AM
To: html-tidy@w3.org
Subject: Re: Possible bug


At 12:41 PM +0400 9/18/01, TheCroll [mail.ru] wrote:
>I have downloaded tidy.win32.v1.9beta.exe from tidy.sourceforge.net
>and launched it with -h | more. In the processing directives 
>list it is said that -asxml and -asxhtml are both for to 
>convert HTML to well-formed XHTML. Is it correct (I mean 
>for -asxml)?

It is a little confusing - I asked myself similar questions when I was
modifying the source code, and meant to raise the issue, but forgot (thanks
for bringing the issue up).

Basically the "-asxml" option has been there for a long time (at least a few
years), and while it may have originally been used to convert HTML to
well-formed XML, the code was changed a couple of years ago, so that the
"-asxml" option was used to create well-formed XHTML output (the code in
"tidy.c" sets xHTML = yes). The help msg in "localize.c" has always said :

"-asxml          to convert html to wellformed xml"

This issue was sort of raised before by Douglas Cook on 18 Aug 1999 :

>The last thing I wanted to mention was a minor bug in the command
>line parser for Tidy.  The option -asxml is supposed to make tidy
>output XML, but it is actually parsed into the "output XHTML" 
>option instead.  Change line 713 from "xHTML = yes;" to "XmlOut =
>yes;" to fix the problem.  The distinction is minor, but this keeps
>it consistent with the config file's options.

The above suggested fix was never made.

>As far as that goes, it may make sense to add a separate command line
option
>for xHTML, adding
>
>            else if (strcmp(arg, "asxhtml") == 0)
>                xHTML = yes;
>
>right below line 713.  Of course since the distinction between 
>the XML and XHTML outputs is minimal (Tidy outputs the original
>DOCTYPE in XML vs. a generated DOCTYPE in XHTML, and adds the 
>xmlns attribute to the <html> tag in XHTML, and possibly a few 
>other minor differences that I didn't notice), this may be a moot
>point.

The "-asxhtml" option was added later, I imagine as an alternative spelling
(it is more logical). The code in "tidy.c" also sets xHTML = yes.

What was missing was the help msg in "localize.c" for "-asxhtml", which I
added, and then changed the help msg in "localize.c" for "-asxml" to match
what the option actually did. This change was made recently, and that is why
you noticed the possibility of a bug.

At this time, the config file option "output-xml: yes" is inconsistent in
behaviour with respect to the command line option "-asxml".

I personally tend to side with the other people who say it is a bug, but I
would like to hear more authorative opinions (particularly Dave Raggett,
since I believe this is code he touched a long time ago), before changing
the behaviour/code. It is possible it is also a moot point as described by
Douglas Cook above.

Regards, Terry

Received on Thursday, 20 September 2001 14:58:06 UTC