Re: JTidy and Error in Dom Construction

> If you look at the META tag, it doesn't have a close
> tag (</META> or <META />), so Tidy parses the next tag (<SCRIPT>) as the
> child of the META tag.

If this is the situation, how is Tidy able to deal with other "singleton"
tags, such as <IMG...>?


> This node model seems to work for Tidy's purposes

Perhaps I'm unclear on Tidy's purposes. I had gotten the idea from the
documentation that it was supposed to create a valid DOM structure from an
HTML document and allow you to traverse it.


> As a
> workaround, I just write out the JTidy output as XHTML and then read it
> back in to a real DOM parser.

This was my first thought; however, there are a couple of problems for what
I'm working on:

1.)  It takes a good deal of processing time to parse once to get an XHTML
doc, then pass it through an XML parser (I use Sun's ProjectX - jaxp), and
finally pass it through an XSLT processor (jclark's XT in my case).

2.)  I have also run into issues where XML parsers (XT and ProjectX-jaxp)
have not been able to deal with entities like '¢' returned by Tidy.
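
To make point 2 concrete, here is a minimal sketch of how I understand the
failure. It assumes the Tidy output carries a named HTML entity such as
&cent; with no DTD declaring it; a plain XML parser has to reject that as
not well-formed, while the numeric form &#162; parses fine. The class and
method names (EntityCheck, rootText) are made up for illustration, and the
code uses the generic JAXP interfaces rather than anything specific to
ProjectX or XT.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: shows how an XML parser treats named vs. numeric entities.
class EntityCheck {

    // Returns the text of the root element, or null if the input is not well-formed XML.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (SAXException e) {
            // Raised for an entity reference that no DTD declares.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Named entity, no DTD: rejected by the parser.
        System.out.println(rootText("<p>5&cent;</p>"));
        // Numeric character reference: always legal in XML.
        System.out.println(rootText("<p>5&#162;</p>"));
    }
}
```

If that is indeed what is happening, having Tidy emit numeric references
instead of named ones (I believe its numeric-entities option does this, but
I have not verified it in JTidy) would sidestep the problem.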

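For reference, the second and third passes from point 1 look roughly like
the sketch below. It is written against the generic JAXP interfaces
(javax.xml.parsers and javax.xml.transform) rather than the ProjectX and XT
classes I actually use, and it assumes the XHTML text has already been
written out by Tidy. The sample document is simplified (no DOCTYPE, no
XHTML namespace), and the names XhtmlPipeline, parseXhtml and transform are
invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch of the "re-parse, then transform" workaround.
class XhtmlPipeline {

    // Pass 2: parse the XHTML text (assumed to be Tidy's output) into a DOM tree.
    static Document parseXhtml(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XSLT processors expect a namespace-aware DOM
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xhtml)));
    }

    // Pass 3: run an XSLT stylesheet over the DOM and return the result as a string.
    static String transform(Document doc, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
            "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>";
        // Stylesheet that extracts the page title as plain text.
        String xslt =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:value-of select='/html/head/title'/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(transform(parseXhtml(xhtml), xslt));
    }
}
```

Every document goes through a full parse in parseXhtml on top of the parse
Tidy has already done, which is the extra cost I would like to avoid by
handing Tidy's own tree to the transformer directly.
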
Received on Wednesday, 12 July 2000 11:41:55 UTC