- From: Scott Dossett <sdossett@metaphoria.net>
- Date: Wed, 12 Jul 2000 11:49:14 -0400
- To: "Gary L Peskin" <garyp@firstech.com>
- Cc: "HTMLTidy" <html-tidy@w3.org>
> If you look at the META tag, it doesn't have a close
> tag (</META> or <META />), so Tidy parses the next tag (<SCRIPT>) as the
> child of the META tag.

If this is the situation, how is Tidy able to deal with other "singleton" tags, such as <IMG...>?

> This node model seems to work for Tidy's purposes

Perhaps I'm unclear on Tidy's purposes; I had gotten the idea from the documentation that it was supposed to create a valid DOM structure from an HTML document and allow you to traverse it.

> As a
> workaround, I just write out the JTidy output as XHTML and then read it
> back in to a real DOM parser.

This was my first thought; however, there are a couple of problems for what I'm working on:

1.) It takes a good deal of processing time to parse once to get an XHTML doc, then pass it through an XML parser (I use Sun's ProjectX - jaxp), and finally pass it through an XSLT parser/transformation (jclark's XT in my case).

2.) I have also run into issues where XML parsers (XT and ProjectX-jaxp) have not been able to deal with entities like '¢' returned by Tidy.
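
For what it's worth, one possible way around the entity problem is to rewrite named HTML entities as numeric character references before handing the XHTML to the XML parser. The sketch below is only an illustration under several assumptions: it uses the standard javax.xml.parsers API rather than ProjectX or XT specifically, it assumes the offending entity arrives as a named reference such as `&cent;`, and the class name `EntityFixup` and its four-entry table are hypothetical (a real table would need the full HTML entity set).

```java
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tidy/JTidy or any XML parser.
final class EntityFixup {
    // Deliberately tiny, illustrative table: entity name -> Unicode code point.
    private static final Map<String, Integer> NAMED = Map.of(
            "cent", 162, "nbsp", 160, "copy", 169, "pound", 163);

    // Matches named references such as &cent; (numeric ones like &#162; do not match).
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replace known named entities with numeric references; leave everything
    // else (including XML's own &amp;, &lt;, ...) untouched.
    static String toNumericEntities(String xhtml) {
        Matcher m = ENTITY.matcher(xhtml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer code = NAMED.get(m.group(1));
            String replacement = (code == null) ? m.group() : "&#" + code + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Parse the cleaned-up XHTML text into a standard W3C DOM Document.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(toNumericEntities(xhtml))));
    }
}
```

Calling `EntityFixup.parse(...)` on the XHTML text written out by Tidy would then give a DOM tree without the parser needing to know the HTML entity names. This does nothing for the processing-time concern in point 1, since the document is still parsed twice.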
Received on Wednesday, 12 July 2000 11:41:55 UTC