- From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
- Date: Fri, 15 Jun 2001 16:03:49 -0400
- To: "'Gary L Peskin'" <garyp@firstech.com>, html-tidy@w3.org
Hi Gary,

Thanks for this background. Clearly, you have given the subject a great deal of careful thought. The SAX-based approach makes a great deal of sense.

DOM compatibility is an issue for the C version of Tidy as well. You may have noticed the discussions on this list. For much the same reasons, these discussions seemed to gravitate to the SAX adapter method as well.

I think the tracking issue will go both ways. I think we should look to JTidy for direction on the library interface and implementation. Ditto for the SAX adapter, of course.

take it easy,
Charlie

-----Original Message-----
From: Gary L Peskin [mailto:garyp@firstech.com]
Sent: Thursday, June 14, 2001 7:55 PM
To: html-tidy@w3.org
Subject: Re: Html-Tidy BUG ???

We have several issues here. First, there are the DOM classes that are shipped with JTidy itself. With the exception of the DOMException class, everything in the DOM package is an interface. There is no problem at all with deleting these classes and replacing them with the official W3C DOM classes and, in fact, I'll do that shortly.

The next problem that we have is developing implementations for each of those interfaces so that we can implement DOM support. JTidy is a faithful port of the released version of HTML Tidy (more on this later). So, the JTidy parse tree mirrors the C Tidy parse tree. This tree is NOT a DOM tree but is a specialized HTML tree which suits Tidy's purposes. In order to maintain compatibility with C Tidy and make it easy to retrofit maintenance and enhancements from the C version to the Java version, we have left the tree alone. The parse tree is a central data structure and monkeying with it would generate a lot of porting issues.

So, instead, what Andy did was create a peer node, called an Adapter, when needed.
The idea is that when we needed to represent something in the DOM, we created a DOM node which was basically a thin wrapper on the corresponding Tidy node but which implemented the DOM methods and took into account the differences between the Tidy node tree structure and the DOM node tree structure. The Adapter node contains a reference to the Tidy node and vice versa, and the DOM nodes are only created as needed, so there is no overhead if you're not using the DOM support.

This is how things were when Andy was unable to continue with the JTidy development. Along came DOM Level 2 and more and more requests for DOM 1 features that were not implemented in the initial release. In addition, we had people using XalanJ1, for example, which needed a separate liaison class to interface with each DOM model, so someone would have needed to create a TidyLiaison to support Xalan.

As a result of the increasing feature set and complexity of the DOM, I suggested that it would be a good idea to just have JTidy implement the SAX2 XMLReader interface so that it could throw off SAX2 events to a SAX2 ContentHandler. Then, the user could plug in Xerces or whatever other XML parser implementation they wanted, provided that it supplied a ContentHandler, which Xerces does, and build their own DOM tree. They would have a real DOM, and JTidy wouldn't have to worry about keeping up with all of the DOM features. As a bonus, you'd get SAX2 support as well. This way, JDOM could be supported as well, I believe, using their SAXBuilder. Down the road it would be nice for Tidy to implement JAXP as well but that's another story.

I merrily started coding up the XMLReader support last December but have been delayed for several reasons. I am now almost in a position to get back into it in a few more days and I hope to have it ready about two weeks after that.

For now, the next best thing is to write out the XHTML output from Tidy and then read it in using your favorite XML parser.
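
A minimal sketch of that interim workaround, assuming Tidy has already written well-formed XHTML into a string. The `XhtmlReparse` class and its method names are hypothetical, and only the JDK's JAXP parser is used; nothing below is JTidy API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: reparses XHTML text into a standard W3C DOM. */
final class XhtmlReparse {
    private XhtmlReparse() {}

    /** Parses XHTML text with the default JAXP parser and returns its DOM. */
    static Document toDom(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            // Wrapped so callers do not have to handle checked parser exceptions.
            throw new IllegalStateException("XHTML could not be reparsed", e);
        }
    }

    /** Collects the href value of every "a" element that has one, in document order. */
    static List<String> linkTargets(Document doc) {
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            if (anchor.hasAttribute("href")) {
                hrefs.add(anchor.getAttribute("href"));
            }
        }
        return hrefs;
    }
}
```

The `Document` returned here comes from the XML parser's own DOM implementation rather than from Tidy's adapter classes, which is the kind of node a DOM-consuming tool such as an XSLT processor generally expects. If the XHTML carries a DOCTYPE, the parser may try to load the referenced DTD, so an `EntityResolver` might be needed in practice.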
It's not a fantastic solution but it does work.

In the meantime, I've followed with great interest the impressive activity over on SourceForge and on this list as well with respect to the HTML Tidy project. Of course, we'd like to port over the improvements and changes to HTML Tidy at some point. I haven't seen any mention of a release schedule. Have I just missed this discussion? I'd rather wait until the HTML Tidy folks get to a point where you're comfortable with the stability and feature set and ready for a release, rather than trying to port the changes over as they occur and hit a moving target.

Sorry this post was so long but I didn't have time to make it shorter :)

Gary

"Reitzel, Charlie" wrote:
>
> Out of dumb curiosity, can anyone familiar w/ JTidy internals tell us what
> are the major impediments to W3C DOM compatibility?
>
> -----Original Message-----
> From: Valeri.Atamaniouk@nokia.com [mailto:Valeri.Atamaniouk@nokia.com]
> Sent: Thursday, June 14, 2001 10:45 AM
> To: holger.prause@detewe.de; html-tidy@w3.org
> Subject: RE: Html-Tidy BUG ???
>
> Hello
>
> The answer is fairly simple: tidy's DOM implementation is not compatible
> with W3C recommendation.
>
> BR
> VA
>
> PS I think you should write a translator from tidy's implementation into
> standard one (just copy the tree).
>
> > -----Original Message-----
> > From: ext Holger Prause [mailto:holger.prause@detewe.de]
> > Sent: 12 June 2001 18:05
> > To: html-tidy@w3.org
> > Subject: Html-Tidy BUG ???
> >
> > Hi
> >
> > i am using Jtidy (html tidy) to get a DOM out of some html files and then
> > i get all Links (all Elements with nodename "a"). Now i want to take this
> > dom and process it with XSLT
> >
> > when i use the following Code i get the following Exception
> >
> > <pre>
> > XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
> > processor.process(new XSLTInputSource(doc),
> >     new XSLTInputSource(new FileInputStream(xslPath)),
> >     new XSLTResultTarget(new FileOutputStream(outputFile)));
> > </pre>
> >
> > XSL Error: Cannot use a DTMLiaison for a input DOM node... pass a
> > org.apache.xalan.xpath.xdom.XercesLiaison instead!
> >
> > XSL Error: SAX Exception
> >
> > org.apache.xalan.xslt.XSLProcessorException:
> >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1799)
> >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1691)
> >     at org.apache.xalan.xslt.XSLTEngineImpl.getSourceTreeFromInput(XSLTEngineImpl.java:919)
> >     at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:643)
> >     at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> >     at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> >     at Main.<init>(Main.java:44)
> >     at Main.main(Main.java:55)
> >
> > Ok i thought, if he wants it that way i pass a xerces liaison
> >
> > <pre>
> > XercesLiaison xl = new XercesLiaison();
> > XSLTProcessor processor = XSLTProcessorFactory.getProcessor(xl);
> >
> > processor.process(new XSLTInputSource(doc),
> >     new XSLTInputSource(new FileInputStream(xslPath)),
> >     new XSLTResultTarget(new FileOutputStream(outputFile)));
> > </pre>
> >
> > then i get the following exception
> >
> > XSL Error: SAX Exception
> >
> > org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > handle nodes of type class org.w3c.tidy.DOMDocumentImpl
> >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1753)
> >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1717)
> >     at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:746)
> >     at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> >     at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> >     at Main.<init>(Main.java:44)
> >     at Main.main(Main.java:55)
> >
> > "org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > handle nodes of type class org.w3c.tidy.DOMDocumentImpl"
> >
> > Why is JTidy using its own DOMDocumentImpl (org.w3c.tidy.DOMDocumentImp)
> > and not the DOMDocumentImpl from w3c (org.w3c.dom.DOMDocumentImp)?
> >
> > This would have saved me a lot of time
> >
> > Now what can i do?
> >
> > Solution 1: write the tidy-dom to disk and then reparse it with any
> > xml-parser, and then process it
> >
> > Solution 2: write a wrapper which changes the tidy-dom to a pure
> > org.w3c.dom.Document and then process it
> >
> > Solution 3: Search for another tool doing it
> >
> > Hmm can anyone of you, especially the developers of this tool /
> > library, tell me what to do?
> >
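
Solution 2 above, together with Valeri's suggestion to just copy the tree, might look roughly like the following. `PlainNode` is a hypothetical stand-in for a non-W3C parse tree node and is not JTidy's actual node class; `TreeCopier` is likewise made up for illustration. The sketch only handles elements, attributes, and text, and assumes the root is an element.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical non-W3C tree node: either an element (name set) or text (text set). */
final class PlainNode {
    final String name;   // element name, or null for a text node
    final String text;   // text content, or null for an element
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<PlainNode> children = new ArrayList<>();

    private PlainNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    static PlainNode element(String name) { return new PlainNode(name, null); }
    static PlainNode textNode(String text) { return new PlainNode(null, text); }

    PlainNode attr(String key, String value) { attributes.put(key, value); return this; }
    PlainNode add(PlainNode child) { children.add(child); return this; }
}

/** Hypothetical translator: copies a PlainNode tree into a fresh org.w3c.dom.Document. */
final class TreeCopier {
    private TreeCopier() {}

    static Document toW3cDocument(PlainNode root) {
        if (root.name == null) {
            throw new IllegalArgumentException("root must be an element");
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            doc.appendChild(copy(root, doc));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP DOM implementation available", e);
        }
    }

    /** Recursively creates the W3C node that corresponds to one source node. */
    private static Node copy(PlainNode source, Document doc) {
        if (source.name == null) {
            return doc.createTextNode(source.text);
        }
        Element element = doc.createElement(source.name);
        for (Map.Entry<String, String> entry : source.attributes.entrySet()) {
            element.setAttribute(entry.getKey(), entry.getValue());
        }
        for (PlainNode child : source.children) {
            element.appendChild(copy(child, doc));
        }
        return element;
    }
}
```

Because the copy is built through `DocumentBuilder.newDocument()`, every node belongs to the parser's own DOM implementation, at the cost of holding two trees in memory for the duration of the copy.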
Received on Friday, 15 June 2001 16:03:17 UTC