Re: JTidy and Error in Dom Construction from Scott Dossett on 2000-07-12 (html-tidy@w3.org from July to September 2000)

From: Scott Dossett <sdossett@metaphoria.net>
Date: Wed, 12 Jul 2000 11:49:14 -0400
To: "Gary L Peskin" <garyp@firstech.com>
Cc: "HTMLTidy" <html-tidy@w3.org>
Message-ID: <002001bfec18$bbc6a030$046f10ac@duesenberg>

> If you look at the META tag, it doesn't have a close
> tag (</META> or <META />), so Tidy parses the next tag (<SCRIPT>) as the
> child of the META tag.

If this is the situation, how is Tidy able to deal with other "singleton"
tags, such as <IMG...>?


> This node model seems to work for Tidy's purposes

Perhaps I'm unclear on Tidy's purpose. I had gotten the idea from the
documentation that it was supposed to create a valid DOM structure from an
HTML document and allow you to traverse it.


> As a
> workaround, I just write out the JTidy output as XHTML and then read it
> back in to a real DOM parser.

This was my first thought; however, there are a couple of problems with it
for what I'm working on:

1.)  It takes a good deal of processing time to run three passes: Tidy to
get an XHTML doc, then an XML parser (I use Sun's ProjectX - jaxp), and
finally an XSLT processor for the transformation (jclark's XT in my case).

2.)  I have also run into issues where XML parsers (XT and ProjectX-jaxp)
have not been able to deal with entities like '&cent;' returned by Tidy.
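
To illustrate problem 2, here is a minimal sketch, assuming the entity in
question is a named HTML entity such as &cent; and that the XHTML is parsed
without a DTD that declares it. The class name EntityCheck and the helper
numericCent are hypothetical, made up for this example, and the code uses
only the standard JAXP DOM API rather than anything specific to JTidy, XT
or ProjectX.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityCheck {

    // Parse a string with the platform's default JAXP DOM parser.
    // Any failure is rethrown unchecked so callers stay short.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and rethrows fatal errors,
            // which keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(
                "not well-formed: " + e.getMessage(), e);
        }
    }

    // True if the markup is accepted as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Hypothetical helper: rewrites only &cent; as a numeric character
    // reference (U+00A2 = 162). A real fix would cover every named entity.
    public static String numericCent(String xhtml) {
        return xhtml.replace("&cent;", "&#162;");
    }

    public static void main(String[] args) {
        String named = "<p>Price: 5&cent;</p>";
        String numeric = numericCent(named);

        // The named entity is undeclared here, so the parser rejects it.
        System.out.println("named entity parses:   " + isWellFormed(named));
        // The numeric reference needs no declaration.
        System.out.println("numeric entity parses: " + isWellFormed(numeric));
    }
}
```

A numeric reference sidesteps the problem because it needs no entity
declaration; the alternative would be to keep a DOCTYPE whose DTD declares
the HTML entity sets and let the parser load it.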

Received on Wednesday, 12 July 2000 11:41:55 UTC