Re: Html-Tidy BUG ???

We have several issues here.  The first is the set of DOM classes that
ship with JTidy itself.  With the exception of the DOMException class,
everything in the DOM package is an interface.  There is no problem at
all with deleting these classes and replacing them with the official
W3C DOM classes and, in fact, I'll do that shortly.

The next problem that we have is developing implementations for each of
those interfaces so that we can implement DOM support.  JTidy is a
faithful port of the released version of HTML Tidy (more on this
later).  So, the JTidy parse tree mirrors the C Tidy parse tree.  This
tree is NOT a DOM tree but is a specialized HTML tree which suits Tidy's
purposes.

In order to maintain compatibility with C Tidy and make it easy to
retrofit maintenance and enhancements from the C version to the Java
version, we have left the tree alone.  The parse tree is a central data
structure and monkeying with it would generate a lot of porting issues.
So, instead, what Andy did was create a peer node, called an Adapter,
when needed.  The idea is that when we needed to represent something in
the DOM, we created a DOM node which was basically a thin wrapper on the
corresponding Tidy node but which implemented the DOM methods and took
into account the differences between the Tidy node tree structure and
the DOM node tree structure.  The Adapter node contains a reference to
the Tidy node and vice versa, and the DOM nodes are only created as
needed, so there is no overhead if you're not using the DOM support.
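
To make the Adapter idea concrete, here is a minimal sketch of the
pattern.  It is NOT the actual JTidy source: TidyNode, NodeAdapter and
their methods are hypothetical names invented for illustration, and only
two DOM-style accessors are shown.

```java
import java.util.ArrayList;
import java.util.List;

public class AdapterSketch {

    /** Hypothetical stand-in for a node of Tidy's internal parse tree. */
    static class TidyNode {
        final String name;
        final List<TidyNode> children = new ArrayList<>();
        NodeAdapter adapter; // back-reference; stays null until DOM access is requested

        TidyNode(String name) { this.name = name; }

        TidyNode add(TidyNode child) { children.add(child); return this; }

        /** Creates the peer adapter on first use and reuses it afterwards. */
        NodeAdapter getAdapter() {
            if (adapter == null) {
                adapter = new NodeAdapter(this);
            }
            return adapter;
        }
    }

    /** Thin wrapper that exposes DOM-style accessors over a TidyNode. */
    static class NodeAdapter {
        private final TidyNode peer; // reference back to the wrapped Tidy node

        NodeAdapter(TidyNode peer) { this.peer = peer; }

        String getNodeName() { return peer.name; }

        NodeAdapter getFirstChild() {
            return peer.children.isEmpty() ? null : peer.children.get(0).getAdapter();
        }
    }

    public static void main(String[] args) {
        TidyNode body = new TidyNode("body").add(new TidyNode("a"));
        NodeAdapter dom = body.getAdapter();
        System.out.println(dom.getFirstChild().getNodeName()); // prints "a"
        System.out.println(dom == body.getAdapter());          // prints "true"
    }
}
```

The parse tree itself is never restructured; a node that is never asked
for its adapter never pays for one.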

This is how things were when Andy was unable to continue with the JTidy
development. Along came DOM level 2 and more and more requests for DOM 1
features that were not implemented in the initial release.  In addition,
we had people using XalanJ1, for example, which needs a separate liaison
class to interface with each DOM model, so someone would have needed to
create a TidyLiaison to support Xalan.

As a result of the increasing feature set and complexity of the DOM, I
suggested that it would be a good idea to just have JTidy implement the
SAX2 XMLReader interface so that it could throw off SAX2 events to a
SAX2 ContentHandler.  Then the user could plug in Xerces or whatever
other XML parser implementation they wanted, provided that it supplied a
ContentHandler (which Xerces does), and build a real DOM tree of their
own.  JTidy wouldn't have to worry about keeping up with all of the DOM
features and, as a bonus, you'd get SAX2 support too.  I believe JDOM
could be supported this way as well, using its SAXBuilder.  Down the
road it would be nice for Tidy to implement JAXP as well but that's
another story.
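
As a rough illustration of the SAX2 route, the sketch below feeds SAX2
events into a ContentHandler that builds a standard W3C DOM.  Since the
JTidy XMLReader does not exist yet, emitSampleEvents() is a hypothetical
stand-in for the events it would fire, and the DOM builder used here is
the JAXP identity TransformerHandler rather than anything Tidy-specific.

```java
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class SaxToDomSketch {

    /** Hypothetical stand-in for the events a SAX2-capable Tidy would fire. */
    static void emitSampleEvents(ContentHandler handler) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        AttributesImpl href = new AttributesImpl();
        href.addAttribute("", "href", "href", "CDATA", "http://www.w3.org/");
        char[] text = "W3C".toCharArray();

        handler.startDocument();
        handler.startElement("", "html", "html", none);
        handler.startElement("", "body", "body", none);
        handler.startElement("", "a", "a", href);
        handler.characters(text, 0, text.length);
        handler.endElement("", "a", "a");
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();
    }

    /** Builds an org.w3c.dom.Document from the SAX2 event stream. */
    static Document buildDom() {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            // An identity TransformerHandler is itself a ContentHandler.
            TransformerHandler handler = factory.newTransformerHandler();
            DOMResult result = new DOMResult(); // empty result: a Document is created
            handler.setResult(result);
            emitSampleEvents(handler);
            return (Document) result.getNode();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("could not build DOM from SAX events", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDom();
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getElementsByTagName("a").getLength());
    }
}
```

Any other ContentHandler could be substituted for the TransformerHandler
without touching the event source, which is the point of the design.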

I merrily started coding up the XMLReader support last December but have
been delayed for several reasons.  I should be in a position to get
back into it in a few more days and I hope to have it ready about two
weeks after that.

For now, the next best thing is to write out the XHTML output from Tidy
and then read it in using your favorite XML parser.  It's not a
fantastic solution but it does work.
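
A sketch of the second half of that workaround follows: re-reading XHTML
with a stock JAXP parser to get a standard org.w3c.dom.Document.  The
XHTML string here is a hand-written sample standing in for what Tidy
would write out, and the class and method names are made up for the
example.  The empty entity resolver is an assumption on my part to keep
the parser from fetching the XHTML DTD over the network; with it, named
entities such as &nbsp; would not resolve, so numeric character
references in the output are the safer choice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ReparseSketch {

    /** Parses well-formed XHTML into a standard W3C DOM Document. */
    static Document parseXhtml(InputStream xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Substitute an empty DTD for any external one (no network access).
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(xhtml);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not re-parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample; in practice this would be the XHTML Tidy wrote out.
        String sample =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
              + "<head><title>t</title></head>"
              + "<body><a href=\"x.html\">x</a></body></html>";
        Document doc = parseXhtml(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getElementsByTagName("a").getLength());
    }
}
```

The resulting Document comes from the parser's own DOM implementation,
so it can be handed to an XSLT processor that expects one.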

In the meantime, I've followed with great interest the impressive
activity over on SourceForge and on this list as well with respect to
the HTML Tidy project.  Of course, we'd like to port over the
improvements and changes to HTML Tidy at some point.  I haven't seen any
mention of a release schedule.  Have I just missed this discussion?  I'd
rather wait until the HTML Tidy folks get to a point where they're
comfortable with the stability and feature set and ready for a release
than try to port the changes over as they occur and hit a moving
target.

Sorry this post was so long but I didn't have time to make it shorter :)

Gary

"Reitzel, Charlie" wrote:
> 
> Out of dumb curiosity, can anyone familiar w/ JTidy internals tell us what
> are the major impediments to W3C DOM compatibility?
> 
> -----Original Message-----
> From: Valeri.Atamaniouk@nokia.com [mailto:Valeri.Atamaniouk@nokia.com]
> Sent: Thursday, June 14, 2001 10:45 AM
> To: holger.prause@detewe.de; html-tidy@w3.org
> Subject: RE: Html-Tidy BUG ???
> 
> Hello
> 
> The answer is fairly simple: tidy's DOM implementation is not compatible
> with W3C recommendation.
> 
> BR
> VA
> 
> PS I think you should write a translator from tidy's implementation into
> standard one (just copy the tree).
> 
> > -----Original Message-----
> > From: ext Holger Prause [mailto:holger.prause@detewe.de]
> > Sent: 12 June 2001 18:05
> > To: html-tidy@w3.org
> > Subject: Html-Tidy BUG ???
> >
> >
> > Hi
> >
> >
> > i am using Jtidy(html tidy) to get a DOM out of some html
> > files and then
> > i get all Links (all Elements with nodename "a").Now i want
> > to take this
> > dom and want it to
> > process with XSLT
> >
> > when i use the following Code  i get the following Exception
> >
> > <pre>
> > XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
> >         processor.process(new XSLTInputSource(doc),new
> > XSLTInputSource(new FileInputStream(xslPath)),
> >         new XSLTResultTarget(new FileOutputStream(outputFile)));
> > </pre>
> >
> >
> > XSL Error: Cannot use a DTMLiaison for a input DOM node... pass a
> > org.apache.xalan.xpath.xdom.XercesLiaison instead!
> >
> > XSL Error: SAX Exception
> >
> > org.apache.xalan.xslt.XSLProcessorException:
> >  at
> > org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1799)
> >
> >  at
> > org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1691)
> >
> >
> >  at
> > org.apache.xalan.xslt.XSLTEngineImpl.getSourceTreeFromInput(XSLTEngineImpl.java:919)
> >
> >  at
> > org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:643)
> >  at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> >  at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> >  at Main.<init>(Main.java:44)
> >  at Main.main(Main.java:55)
> >
> >
> > Ok i thought , if he want it that way i pass a xerces liasion
> >
> > <pre>
> > XercesLiaison xl = new XercesLiaison();
> >         XSLTProcessor processor =
> > XSLTProcessorFactory.getProcessor(xl);
> >
> >         processor.process(new XSLTInputSource(doc),new
> > XSLTInputSource(new FileInputStream(xslPath)),
> >         new XSLTResultTarget(new FileOutputStream(outputFile)));
> > </pre>
> >
> > than i get the following exception
> > XSL Error: SAX Exception
> >
> > org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > handle nodes of type class org.w3c.tidy.DOMDocumentImpl
> >  at
> > org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1753)
> >
> >  at
> > org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1717)
> >
> >  at
> > org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:746)
> >  at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> >  at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> >  at Main.<init>(Main.java:44)
> >  at Main.main(Main.java:55)
> >
> >
> > "
> > org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > handle nodes of type class org.w3c.tidy.DOMDocumentImpl             "
> >
> > Why is JTidy using its own
> > DOMDocumentImpl(org.w3c.tidy.DOMDocumentImp)
> > and not the  DOMDocumentImpl from w3c(org.w3c.dom.DOMDocumentImp) ?? (
> >
> > This would have saved my a lot of time
> >
> >
> >
> > Now what can i do ?
> >
> > Solution 1: write the tidy-dom to disk and the reparse it with any
> > xml-parser , and the process it
> >
> > Solution 2.
> >
> > write a wrapper wich changes the tidy-dom to an pure
> > org.w3c.dom.Document
> > and then process it
> >
> > Solution 3 :
> > Search for another tool doing it
> >
> >
> > Hmm can anyone of u , especially the developers of this too /
> > libraryl,
> > tell me what to do?
> >
> >

Received on Thursday, 14 June 2001 19:55:07 UTC