Re: Html-Tidy BUG ??? from Gary L Peskin on 2001-06-15 (html-tidy@w3.org from April to June 2001)

From: Gary L Peskin <garyp@firstech.com>
Date: Fri, 15 Jun 2001 14:04:31 -0700
To: "Reitzel, Charlie" <CReitzel@arrakisplanet.com>
CC: html-tidy@w3.org
Message-ID: <3B2A785F.33F87732@firstech.com>
Well, you're welcome to look at the JTidy DOM implementation, though I
hope to deprecate it when the SAX support gets established.  We're on
SourceForge as JTidy.

Gary

"Reitzel, Charlie" wrote:
> 
> Hi Gary,
> 
> Thanks for this background.  Clearly, you have given the subject a great
> deal of careful thought.  The SAX-based approach makes a great deal of
> sense.
> 
> DOM compatibility is an issue for the C version of Tidy as well.  You may
> have noticed the discussions on this list.   For much the same reasons,
> these discussions seemed to gravitate to the SAX adapter method as well.
> 
> I think the tracking issue will go both ways.  I think we should look to
> JTidy for direction on the library interface and implementation.  Ditto for
> the SAX adapter, of course.
> 
> take it easy,
> Charlie
> 
> -----Original Message-----
> From: Gary L Peskin [mailto:garyp@firstech.com]
> Sent: Thursday, June 14, 2001 7:55 PM
> To: html-tidy@w3.org
> Subject: Re: Html-Tidy BUG ???
> 
> We have several issues here.  The first is the DOM classes that are shipped
> with JTidy itself.  With the exception of the DOMException class, everything
> in the DOM package is an interface.  There is no problem at all with
> deleting these classes and replacing them with the official W3C DOM classes
> and, in fact, I'll do that shortly.
> 
> The next problem that we have is developing implementations for each of
> those interfaces so that we can implement DOM support.  JTidy is a
> faithful port of the released version of HTML Tidy (more on this later).
> So, the JTidy parse tree mirrors the C Tidy parse tree.  This tree is NOT a
> DOM tree but is a specialized HTML tree which suits Tidy's purposes.
> 
> In order to maintain compatibility with C Tidy and make it easy to retrofit
> maintenance and enhancements from the C version to the Java version, we have
> left the tree alone.  The parse tree is a central data structure and
> monkeying with it would generate a lot of porting issues.
> 
> So, instead, what Andy did was create a peer node, called an Adapter, when
> needed.  The idea is that when we needed to represent something in
> the DOM, we created a DOM node which was basically a thin wrapper on the
> corresponding Tidy node but which implemented the DOM methods and took into
> account the differences between the Tidy node tree structure and the DOM
> node tree structure.  The Adapter node contains a reference to the Tidy node
> and vice-versa and the DOM nodes are only created as needed so there is no
> overhead if you're not using the DOM support.
> 
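As a rough illustration of the peer-node idea described above, here is a minimal Java sketch.  The names TidyNode and DomNodeAdapter and their fields are hypothetical stand-ins, not JTidy's actual classes; the sketch only shows the lazy creation of the wrapper and the two-way reference between the Tidy node and its adapter.

```java
import java.util.ArrayList;
import java.util.List;

class AdapterSketch {

    /** Hypothetical stand-in for a node in Tidy's internal parse tree. */
    static class TidyNode {
        final String element;                          // tag name as Tidy stores it
        final List<TidyNode> content = new ArrayList<>();
        DomNodeAdapter adapter;                        // back-reference, null until first requested

        TidyNode(String element) {
            this.element = element;
        }

        /** Creates the DOM-facing peer on first use and reuses it afterwards. */
        DomNodeAdapter getAdapter() {
            if (adapter == null) {
                adapter = new DomNodeAdapter(this);
            }
            return adapter;
        }
    }

    /** Hypothetical thin wrapper exposing two DOM-style accessors over a TidyNode. */
    static class DomNodeAdapter {
        final TidyNode adaptee;                        // forward reference to the wrapped Tidy node

        DomNodeAdapter(TidyNode adaptee) {
            this.adaptee = adaptee;
        }

        String getNodeName() {
            return adaptee.element;
        }

        /** Wraps the first child only when a caller actually asks for it. */
        DomNodeAdapter getFirstChild() {
            return adaptee.content.isEmpty() ? null : adaptee.content.get(0).getAdapter();
        }
    }
}
```

No adapter objects exist until getAdapter() is called, which is what keeps the cost at zero for callers that never touch the DOM view.
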
> This is how things were when Andy was unable to continue with the JTidy
> development. Along came DOM level 2 and more and more requests for DOM 1
> features that were not implemented in the initial release.  In addition, we
> had people using XalanJ1, for example, that needed a separate liaison class
> to interface with each DOM model so someone would have needed to create a
> TidyLiaison to support Xalan.
> 
> As a result of the increasing feature set and complexity of the DOM, I
> suggested that it would be a good idea to just have JTidy implement the
> SAX2 XMLReader interface so that it could throw off SAX2 events to a SAX2
> ContentHandler.  Then, the user could plug in Xerces or whatever other XML
> parser implementation they wanted, provided that it supplied a
> ContentHandler, which Xerces does, and build their own DOM tree and have
> a real DOM and JTidy wouldn't have to worry about keeping up with all of the
> DOM features.  As a bonus, you'd get SAX2 support as well.  This
> way, JDOM could be supported as well, I believe, using their SAXBuilder.
> 
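To make the XMLReader-to-ContentHandler flow concrete, here is a minimal Java sketch.  StubTidyReader is a hypothetical stand-in that fires a fixed sequence of SAX2 events; it is not the JTidy XMLReader, which is not finished yet.  The DOM is built by a JAXP TransformerHandler writing into a DOMResult, which assumes a JAXP 1.1 style transformer implementation is available; any other ContentHandler that builds a tree could be plugged in the same way.

```java
import java.io.IOException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

class SaxToDomSketch {

    /** Hypothetical stand-in for a Tidy-backed XMLReader: ignores its input
     *  and reports a small fixed document to the registered ContentHandler. */
    static class StubTidyReader extends XMLFilterImpl {
        @Override
        public void parse(InputSource input) throws SAXException, IOException {
            AttributesImpl none = new AttributesImpl();
            AttributesImpl href = new AttributesImpl();
            href.addAttribute("", "href", "href", "CDATA", "http://www.w3.org/");
            char[] text = "W3C".toCharArray();

            // XMLFilterImpl forwards each of these calls to the ContentHandler.
            startDocument();
            startElement("", "html", "html", none);
            startElement("", "body", "body", none);
            startElement("", "a", "a", href);
            characters(text, 0, text.length);
            endElement("", "a", "a");
            endElement("", "body", "body");
            endElement("", "html", "html");
            endDocument();
        }
    }

    /** Collects the reader's SAX2 events into a standard org.w3c.dom.Document. */
    static Document buildDom(XMLReader reader) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(); // identity transform
        DOMResult result = new DOMResult();
        handler.setResult(result);

        reader.setContentHandler(handler);
        reader.parse(new InputSource());   // the stub ignores the input source
        return (Document) result.getNode();
    }
}
```

Calling SaxToDomSketch.buildDom(new SaxToDomSketch.StubTidyReader()) yields a Document produced by whatever DOM implementation the transformer uses, so the reader itself never has to implement any DOM interface.
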
> Down the road it would be nice for Tidy to implement JAXP as well but that's
> another story.
> 
> I merrily started coding up the XMLReader support last December but have
> been delayed for several reasons.  I am now almost in a position to get
> back into it in a few more days and I hope to have it ready about two weeks
> after that.
> 
> For now, the next best thing is to write out the XHTML output from Tidy and
> then read it in using your favorite XML parser.  It's not a fantastic
> solution but it does work.
> 
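The second half of that workaround might look like the following Java sketch.  It assumes the XHTML has already been produced by Tidy and is available as a string; the class and method names are made up for illustration.  Real Tidy output normally starts with a DOCTYPE, which may cause the parser to try to fetch the DTD, so the sample input used here leaves it out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReparseSketch {

    /** Parses well-formed XHTML text with the default JAXP parser. */
    static Document parseXhtml(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    /** Returns the href value of every "a" element, in document order. */
    static List<String> linkTargets(Document doc) {
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            hrefs.add(((Element) anchors.item(i)).getAttribute("href"));
        }
        return hrefs;
    }
}
```

The resulting Document comes from the XML parser rather than from Tidy, so it can be handed to an XSLT processor that expects that parser's DOM.
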
> In the meantime, I've followed with great interest the impressive activity
> over on SourceForge and on this list as well with respect to the HTML Tidy
> project.  Of course, we'd like to port over the improvements and changes to
> HTML Tidy at some point.  I haven't seen any mention of a release schedule.
> Have I just missed this discussion?  I'd rather wait until the HTML Tidy
> folks get to a point where you're comfortable with the stability and feature
> set and ready for a release rather than trying to port the changes over as
> they occur and try to hit a moving target.
> 
> Sorry this post was so long but I didn't have time to make it shorter :)
> 
> Gary
> 
> "Reitzel, Charlie" wrote:
> >
> > Out of dumb curiosity, can anyone familiar w/ JTidy internals tell us what
> > are the major impediments to W3C DOM compatibility?
> >
> > -----Original Message-----
> > From: Valeri.Atamaniouk@nokia.com [mailto:Valeri.Atamaniouk@nokia.com]
> > Sent: Thursday, June 14, 2001 10:45 AM
> > To: holger.prause@detewe.de; html-tidy@w3.org
> > Subject: RE: Html-Tidy BUG ???
> >
> > Hello
> >
> > The answer is fairly simple: Tidy's DOM implementation is not compatible
> > with the W3C recommendation.
> >
> > BR
> > VA
> >
> > PS I think you should write a translator from Tidy's implementation into
> > a standard one (just copy the tree).
> >
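A tree copy of the kind suggested above could be sketched in Java as follows.  It assumes the source nodes support the basic DOM Level 1 traversal calls used here (getNodeType, getNodeName, getNodeValue, getAttributes, getFirstChild, getNextSibling); whether Tidy's own nodes do so completely is not verified.  The class and method names are made up for illustration, and node types other than elements, text and comments are simply skipped.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class DomCopySketch {

    /** Copies the subtree under sourceRoot into a freshly created standard Document. */
    static Document copyToStandardDom(Node sourceRoot) throws Exception {
        Document target =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node copy = copyNode(sourceRoot, target);
        if (copy != null) {
            target.appendChild(copy);
        }
        return target;
    }

    /** Recursively rebuilds one node using only the target document's factory methods. */
    static Node copyNode(Node source, Document target) {
        switch (source.getNodeType()) {
            case Node.ELEMENT_NODE: {
                Element element = target.createElement(source.getNodeName());
                NamedNodeMap attrs = source.getAttributes();
                for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    element.setAttribute(attr.getNodeName(), attr.getNodeValue());
                }
                for (Node child = source.getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    Node childCopy = copyNode(child, target);
                    if (childCopy != null) {
                        element.appendChild(childCopy);
                    }
                }
                return element;
            }
            case Node.TEXT_NODE:
                return target.createTextNode(source.getNodeValue());
            case Node.COMMENT_NODE:
                return target.createComment(source.getNodeValue());
            default:
                return null;   // other node types are ignored in this sketch
        }
    }
}
```

Passing the source document's root element to copyToStandardDom gives back a Document whose nodes all belong to the parser's own DOM implementation.
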
> > > -----Original Message-----
> > > From: ext Holger Prause [mailto:holger.prause@detewe.de]
> > > Sent: 12 June 2001 18:05
> > > To: html-tidy@w3.org
> > > Subject: Html-Tidy BUG ???
> > >
> > >
> > > Hi
> > >
> > >
> > > I am using JTidy (HTML Tidy) to get a DOM out of some HTML files, and
> > > then I get all links (all elements with node name "a").  Now I want to
> > > process this DOM with XSLT.
> > >
> > > When I use the following code I get the following exception:
> > >
> > > <pre>
> > > XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
> > > processor.process(new XSLTInputSource(doc),
> > >         new XSLTInputSource(new FileInputStream(xslPath)),
> > >         new XSLTResultTarget(new FileOutputStream(outputFile)));
> > > </pre>
> > >
> > >
> > > XSL Error: Cannot use a DTMLiaison for a input DOM node... pass a
> > > org.apache.xalan.xpath.xdom.XercesLiaison instead!
> > >
> > > XSL Error: SAX Exception
> > >
> > > org.apache.xalan.xslt.XSLProcessorException:
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1799)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1691)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.getSourceTreeFromInput(XSLTEngineImpl.java:919)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:643)
> > >  at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> > >  at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> > >  at Main.<init>(Main.java:44)
> > >  at Main.main(Main.java:55)
> > >
> > >
> > > OK, I thought, if it wants it that way I will pass a XercesLiaison:
> > >
> > > <pre>
> > > XercesLiaison xl = new XercesLiaison();
> > > XSLTProcessor processor = XSLTProcessorFactory.getProcessor(xl);
> > >
> > > processor.process(new XSLTInputSource(doc),
> > >         new XSLTInputSource(new FileInputStream(xslPath)),
> > >         new XSLTResultTarget(new FileOutputStream(outputFile)));
> > > </pre>
> > >
> > > Then I get the following exception:
> > > XSL Error: SAX Exception
> > >
> > > org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > > handle nodes of type class org.w3c.tidy.DOMDocumentImpl
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1753)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1717)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:746)
> > >  at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> > >  at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> > >  at Main.<init>(Main.java:44)
> > >  at Main.main(Main.java:55)
> > >
> > >
> > > "
> > > org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > > handle nodes of type class org.w3c.tidy.DOMDocumentImpl             "
> > >
> > > Why is JTidy using its own DOMDocumentImpl
> > > (org.w3c.tidy.DOMDocumentImp) and not the DOMDocumentImpl from W3C
> > > (org.w3c.dom.DOMDocumentImp)?
> > >
> > > This would have saved me a lot of time.
> > >
> > >
> > >
> > > Now what can I do?
> > >
> > > Solution 1: write the Tidy DOM to disk, then reparse it with any
> > > XML parser, and then process it.
> > >
> > > Solution 2: write a wrapper which converts the Tidy DOM to a pure
> > > org.w3c.dom.Document and then process it.
> > >
> > > Solution 3: search for another tool that does it.
> > >
> > >
> > > Hmm, can any of you, especially the developers of this tool / library,
> > > tell me what to do?
> > >
> > >
Received on Friday, 15 June 2001 17:04:43 UTC