Re: JTidy DOM Implementation

Russell Gold wrote:
> At 2:29 AM -0500 10/30/00, Gary L Peskin wrote:
> >I would like to propose that we discard the Tidy DOM classes
> >altogether.  The DOM is about to have a Level 2 Recommendation with
> >Level 3 coming not too far after that.  The functionality is rapidly
> >increasing and I don't think that there's any way that we can or should
> >keep up.  DOM now offers the whole traversal specification with
> >NodeFilters, NodeIterators, etc.
> >
> >What I propose is that we add a SAX 2 output capability to JTidy and
> >discard the DOM classes altogether.  The SAX 2 interface would fit in
> >nicely with the current XML/XHTML output options and we would be done
> >with it.  If someone wanted a DOM implementation, they could use a
> >parser like Xerces and drive it with SAX 2 events.  Then, they could
> >have a full-blown DOM implementation supported by a team that is focused
> >on that.
> 
> Ouch!  I depend very heavily on the JTidy DOM classes. If they disappeared, I
> would have absolutely no use for the library. What is the C version doing?  I
> had rather assumed JTidy to be a Java port of it rather than a parallel
> implementation.

I don't think that the C version of tidy implements the DOM at all!  I
suppose we could keep the existing DOM classes, continue to support
them as is, and enhance them as time allows.  But for people who need
full DOM support, it would be much easier for us to build in SAX 2
events than to keep extending our own DOM classes.

Russell, for your application, you'd just need to add a few lines of
code to hand the SAX 2 events to a handler (like Xerces) that can
build a DOM tree from them.  Then, you'd use the Xerces DOM tree and
have a full-blown DOM implementation with all of the features that
Xerces supports.
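
As a rough illustration of the "few lines of code", here is a minimal
sketch that builds a DOM tree from SAX 2 events.  It assumes the JAXP
classes bundled with a current JDK (TransformerHandler and DOMResult)
rather than Xerces-specific classes, and emitEvents() is a purely
hypothetical stand-in for the proposed JTidy SAX 2 output, which does
not exist yet.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class SaxToDom {
    // Hypothetical stand-in for JTidy's proposed SAX 2 output:
    // it fires the events for a tiny, already-cleaned-up page.
    static void emitEvents(ContentHandler h) throws SAXException {
        char[] title = "Hello".toCharArray();
        h.startDocument();
        h.startElement("", "html", "html", new AttributesImpl());
        h.startElement("", "head", "head", new AttributesImpl());
        h.startElement("", "title", "title", new AttributesImpl());
        h.characters(title, 0, title.length);
        h.endElement("", "title", "title");
        h.endElement("", "head", "head");
        h.endElement("", "html", "html");
        h.endDocument();
    }

    // Builds a DOM Document from whatever SAX 2 events emitEvents() sends.
    static Document build() throws Exception {
        // Empty document that will receive the nodes.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // An identity TransformerHandler is a SAX ContentHandler that
        // copies every event it receives into the given Result.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.setResult(new DOMResult(doc));

        emitEvents(handler);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The caller only ever sees a ContentHandler on one side and a standard
org.w3c.dom.Document on the other, so the DOM implementation behind it
can be swapped without touching the code that produces the events.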

This approach could be somewhat slower and consume more memory because
we'd first be building the tidy tree and then walking it to generate
the SAX events.  But HTML pages are usually small and I think the
benefits would outweigh these disadvantages.
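
The tree walk itself could be sketched roughly as follows.  SketchNode
is a hypothetical, heavily simplified stand-in for a node in the tidy
parse tree (the real tidy node class is not shown here and also
carries attributes, node types, and so on); only the ContentHandler
calls are real SAX 2 API.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical stand-in for a tidy tree node: either an element
// (name set, text null) or a text node (name null, text set).
class SketchNode {
    final String name;
    final String text;
    final List<SketchNode> children = new ArrayList<>();

    private SketchNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    static SketchNode element(String name, SketchNode... kids) {
        SketchNode n = new SketchNode(name, null);
        for (SketchNode k : kids) {
            n.children.add(k);
        }
        return n;
    }

    static SketchNode textNode(String text) {
        return new SketchNode(null, text);
    }
}

class TreeToSax {
    // Walks the whole tree once and reports it as a SAX 2 event stream.
    static void emitDocument(SketchNode root, ContentHandler h)
            throws SAXException {
        h.startDocument();
        walk(root, h);
        h.endDocument();
    }

    // Depth-first walk: start event, children in order, end event.
    private static void walk(SketchNode n, ContentHandler h)
            throws SAXException {
        if (n.name == null) {
            char[] c = n.text.toCharArray();
            h.characters(c, 0, c.length);
            return;
        }
        // Attributes are omitted in this sketch.
        h.startElement("", n.name, n.name, new AttributesImpl());
        for (SketchNode child : n.children) {
            walk(child, h);
        }
        h.endElement("", n.name, n.name);
    }
}
```

The extra cost mentioned above shows up here as one additional pass
over a tree that is already in memory, plus whatever the receiving
handler allocates for its own tree.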

Gary

Received on Monday, 30 October 2000 10:01:35 UTC