RE: xml versus xhtml

Hi Matt,

I don't doubt that there are some good reasons to "tidy" up sloppy XML
independently of HTML.  I may or may not understand your particular
application but, hey, there are lots of ways to skin a cat (poor cat).  I
have seen buggy web services that occasionally emit bad (not well-formed)
XML containing otherwise good data.

My suggestion would be to submit patches and config options to the
tidy-develop list over at SourceForge.  If you can't convince the group,
you can always apply the patches locally.

Also, the W3C license is very flexible.  You could use the current source
base as the starting point of a pure XML tool.  

The best course depends on how the tools evolve over time.  I'd think an XML
tool would want to be schema aware (DTD, XML Schema, Schematron, TREX,
RELAX, RDF, etc.), whereas schemas do not capture all of the nuances of HTML
that Tidy needs to handle.

Finally, I think Paul's question is more about the differences between HTML
and XHTML and not about generic XML at all.  I believe the answer is that,
according to HTML 3.2, headings and paragraphs are both "block level"
elements.  Block level elements cause "paragraph breaks" and, thus, a
heading and a paragraph may not be nested inside one another (both have the
element content model %text in the DTD, which admits only text-level
markup).
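
To make the distinction concrete, here is a minimal Python sketch (it is
not anything Tidy itself does).  It checks Paul's snippet for XML
well-formedness and then against a heading content model.  TEXT_LEVEL is my
own abbreviated stand-in for %text, and the function names is_well_formed
and heading_violations are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative subset of the text-level elements that the
# HTML 3.2 DTD groups under %text (assumption: not the full list).
TEXT_LEVEL = {"em", "strong", "code", "b", "i", "tt", "a", "img", "br", "font"}

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# The snippet from Paul's message, copied as-is.
PAUL_INPUT = """<h1>start of  heading
  <p>paragraph within</p>
        end of heading
</h1>"""


def is_well_formed(markup):
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def heading_violations(markup):
    """List (heading, child) tag pairs where a direct child is not text-level."""
    root = ET.fromstring(markup)
    problems = []
    for element in root.iter():
        if element.tag.lower() in HEADINGS:
            for child in element:
                if child.tag.lower() not in TEXT_LEVEL:
                    problems.append((element.tag, child.tag))
    return problems


if __name__ == "__main__":
    print(is_well_formed(PAUL_INPUT))
    print(heading_violations(PAUL_INPUT))
```

Run on Paul's input, the first check passes and the second flags the <p>
inside the <h1>, which is the gap between well-formed XML and valid HTML.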

take it easy,
Charlie

-----Original Message-----
From: Matt G [mailto:mattg@vguild.com]
Sent: Sunday, September 02, 2001 5:48 PM
To: Paul; html-tidy@w3.org
Subject: Re: xml versus xhtml


That was exactly the question I asked a few days ago, though phrased
differently.

Tidy is designed to fix HTML, not fix XML. I do think there would be some
value for an option within Tidy, or a different version of the tool
(TidyXML?), that would only fix XML, ignoring HTML compliance. Fixing XML
should be infinitely faster, as you can bypass all the logic of what is good
HTML.

And yes, I too thought that output-xml would accomplish that, but it does
not.

    Matt

----- Original Message -----
From: "Paul" <valen@nic.com>
To: <html-tidy@w3.org>
Sent: Sunday, September 02, 2001 3:35 PM
Subject: xml versus xhtml


Hi All,

I am beginning to believe that when there is a discrepancy between me and
Tidy, Tidy is usually right.  That being said, I'll ask anyway:

When I tidy -

<h1>start of  heading
  <p>paragraph within</p>
        end of heading
</h1>

tidy returns

<h1>start of heading</h1>
<p>paragraph within</p>
<p>end of heading</p>

This seems consistent with earlier observed tidy behavior, namely that
xhtml1.0 dtd disallows <p> within <h1>. So tidy closes the <h1>, etc.  But I
didn't specify output-xhtml.  I specified output-xml.  Isn't the input to
tidy valid h1 'xml'?  If so, why does tidy seem to force compliance with the
xhtml dtd?

Thanks for your help.

Cordially,

Paul

Received on Monday, 10 September 2001 13:46:33 UTC