
Re: JTidy and Error in Dom Construction

From: Scott Dossett <sdossett@metaphoria.net>
Date: Wed, 12 Jul 2000 11:49:14 -0400
Message-ID: <002001bfec18$bbc6a030$046f10ac@duesenberg>
To: "Gary L Peskin" <garyp@firstech.com>
Cc: "HTMLTidy" <html-tidy@w3.org>
> If you look at the META tag, it doesn't have a close
> tag (</META> or <META />), so Tidy parses the next tag (<SCRIPT>) as the
> child of the META tag.

If this is the situation, how is Tidy able to deal with other "singleton"
tags, such as <IMG...>?


> This node model seems to work for Tidy's purposes

Perhaps I'm unclear on Tidy's purposes. I had gotten the idea from the
documentation that it was supposed to create a valid DOM structure from an
HTML document and allow you to traverse it.


> As a
> workaround, I just write out the JTidy output as XHTML and then read it
> back in to a real DOM parser.

This was my first thought; however, there are a couple of problems for what
I'm working on:

1.)  It takes a good deal of processing time to run Tidy once to get an XHTML
doc, then pass that through an XML parser (I use Sun's ProjectX - jaxp), and
finally pass it through an XSLT processor (jclark's XT in my case).

2.)  I have also run into issues where XML parsers (XT and ProjectX-jaxp)
have not been able to deal with entities like '' returned by Tidy.
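
For reference, the last two stages of the workaround pipeline described above (XHTML text into an XML parser, then into an XSLT processor) might look roughly like the sketch below. It uses only the JAXP classes bundled with a current JDK, not ProjectX or XT, and the Tidy step is replaced by a hard-coded XHTML string. The class name, the sample markup, and the stylesheet are all made up for illustration. It also shows that once META is written as an empty element, a conforming DOM parser makes SCRIPT its sibling rather than its child.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XhtmlPipelineSketch {
    // Hypothetical stand-in for Tidy's XHTML output; not real Tidy output.
    static final String XHTML =
        "<html><head><meta name=\"a\" content=\"b\" />"
      + "<script type=\"text/javascript\"></script>"
      + "<title>Example</title></head><body><p>hi</p></body></html>";

    // Hypothetical stylesheet that just extracts the page title as text.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"/html/head/title\"/>"
      + "</xsl:template></xsl:stylesheet>";

    // Stage 2: read the XHTML text back in with a standard DOM parser.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XSLT processors expect this
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xhtml)));
    }

    // Stage 3: run an XSLT transformation over the resulting DOM tree.
    static String transform(Document doc, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        Node meta = doc.getElementsByTagName("meta").item(0);
        System.out.println(meta.hasChildNodes());               // false
        System.out.println(meta.getNextSibling().getNodeName()); // script
        System.out.println(transform(doc, XSLT));               // Example
    }
}
```

Each stage is a separate pass over the whole document, which is where the extra processing time in point 1 comes from.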
Received on Wednesday, 12 July 2000 11:41:55 GMT
