- From: Gary L Peskin <garyp@firstech.com>
- Date: Fri, 15 Jun 2001 14:04:31 -0700
- To: "Reitzel, Charlie" <CReitzel@arrakisplanet.com>
- CC: html-tidy@w3.org
Well, you're welcome to look at the JTidy DOM implementation, though I hope to deprecate it once the SAX support is established. We're on SourceForge as JTidy.

Gary

"Reitzel, Charlie" wrote:
>
> Hi Gary,
>
> Thanks for this background. Clearly, you have given the subject a great
> deal of careful thought. The SAX-based approach makes a great deal of
> sense.
>
> DOM compatibility is an issue for the C version of Tidy as well. You may
> have noticed the discussions on this list. For much the same reasons,
> these discussions seemed to gravitate to the SAX adapter method as well.
>
> I think the tracking issue will go both ways. I think we should look to
> JTidy for direction on the library interface and implementation. Ditto for
> the SAX adapter, of course.
>
> take it easy,
> Charlie
>
> -----Original Message-----
> From: Gary L Peskin [mailto:garyp@firstech.com]
> Sent: Thursday, June 14, 2001 7:55 PM
> To: html-tidy@w3.org
> Subject: Re: Html-Tidy BUG ???
>
> We have several issues here. First, there are the DOM classes that ship
> with JTidy itself. With the exception of the DOMException class, everything
> in the DOM package is an interface. There is no problem at
> all with deleting these classes and replacing them with the official W3C DOM
> classes and, in fact, I'll do that shortly.
>
> The next problem is developing implementations for each of
> those interfaces so that we can implement DOM support. JTidy is a
> faithful port of the released version of HTML Tidy (more on this later),
> so the JTidy parse tree mirrors the C Tidy parse tree. This tree is NOT a
> DOM tree but a specialized HTML tree that suits Tidy's purposes.
>
> In order to maintain compatibility with C Tidy and make it easy to retrofit
> maintenance and enhancements from the C version to the Java version, we have
> left the tree alone. The parse tree is a central data structure, and
> monkeying with it would generate a lot of porting issues.
>
> So, instead, what Andy did was create a peer node, called an Adapter, when
> needed. The idea is that when we needed to represent something in
> the DOM, we created a DOM node which was basically a thin wrapper on the
> corresponding Tidy node but which implemented the DOM methods and took into
> account the differences between the Tidy node tree structure and the DOM
> node tree structure. The Adapter node contains a reference to the Tidy node
> and vice versa, and the DOM nodes are only created as needed, so there is no
> overhead if you're not using the DOM support.
>
> This is how things were when Andy was unable to continue with JTidy
> development. Along came DOM Level 2 and more and more requests for DOM Level 1
> features that were not implemented in the initial release. In addition, we
> had people using XalanJ1, for example, which needed a separate liaison class
> to interface with each DOM model, so someone would have needed to create a
> TidyLiaison to support Xalan.
>
> Because of the growing feature set and complexity of the DOM, I
> suggested that it would be a good idea to just have JTidy implement the
> SAX2 XMLReader interface so that it could throw off SAX2 events to a SAX2
> ContentHandler. Then users could plug in Xerces or whatever other XML
> parser implementation they wanted, provided it supplied a
> ContentHandler (which Xerces does), build their own DOM tree, and have
> a real DOM, and JTidy wouldn't have to worry about keeping up with all of the
> DOM features. As a bonus, you'd get SAX2 support as well. This
> way, JDOM could be supported too, I believe, using its SAXBuilder.
>
> Down the road it would be nice for Tidy to implement JAXP as well, but that's
> another story.
>
> I merrily started coding up the XMLReader support last December but have
> been delayed for several reasons. I am now almost in a position to get
> back into it in a few more days, and I hope to have it ready about two weeks
> after that.
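The Adapter arrangement described above (Tidy's parse tree left untouched, a thin DOM peer created only when someone asks for it, with references in both directions) can be sketched roughly as follows. This is a hypothetical illustration, not JTidy's source: `TidyNode`, `DomAdapter` and `getAdapter()` are invented names, and the DOM side exposes just two methods modeled on `org.w3c.dom.Node` rather than implementing that whole interface.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "peer node" (Adapter) idea. The Tidy-side tree is
 * never modified for DOM purposes; a DOM-style wrapper is created lazily.
 * Names are illustrative only, not JTidy's real classes.
 */
public class AdapterSketch {

    /** Stand-in for Tidy's own parse-tree node (NOT a DOM node). */
    static final class TidyNode {
        final String element;                         // tag name, or null for text
        final String text;                            // text content, or null for elements
        final List<TidyNode> content = new ArrayList<>();
        DomAdapter adapter;                           // back-reference, null until needed

        TidyNode(String element, String text) {
            this.element = element;
            this.text = text;
        }

        /** Create the DOM peer on first use; later calls return the same peer. */
        DomAdapter getAdapter() {
            if (adapter == null) {
                adapter = new DomAdapter(this);
            }
            return adapter;
        }
    }

    /** Minimal DOM-like view over a TidyNode (a tiny subset of org.w3c.dom.Node). */
    static final class DomAdapter {
        final TidyNode node;                          // forward reference to the Tidy node

        DomAdapter(TidyNode node) {
            this.node = node;
        }

        /** Like Node.getNodeName(): "#text" for text nodes, the tag name otherwise. */
        String getNodeName() {
            return node.element != null ? node.element : "#text";
        }

        /** Children are wrapped on demand, so unvisited subtrees never get adapters. */
        List<DomAdapter> getChildNodes() {
            List<DomAdapter> children = new ArrayList<>();
            for (TidyNode child : node.content) {
                children.add(child.getAdapter());
            }
            return children;
        }
    }

    public static void main(String[] args) {
        TidyNode body = new TidyNode("body", null);
        TidyNode p = new TidyNode("p", null);
        p.content.add(new TidyNode(null, "hello"));
        body.content.add(p);

        DomAdapter dom = body.getAdapter();
        System.out.println(dom.getNodeName());                        // body
        System.out.println(dom.getChildNodes().get(0).getNodeName()); // p
        System.out.println(p.content.get(0).adapter == null);         // true: text not wrapped yet
    }
}
```

Because peers are created lazily, a caller that never asks for the DOM view pays only for one unused reference field per node, which is roughly the "no overhead" property the message describes.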
>
> For now, the next best thing is to write out the XHTML output from Tidy and
> then read it in using your favorite XML parser. It's not a fantastic
> solution, but it does work.
>
> In the meantime, I've followed with great interest the impressive activity
> over on SourceForge and on this list with respect to the HTML Tidy
> project. Of course, we'd like to port the improvements and changes in
> HTML Tidy over at some point. I haven't seen any mention of a release schedule.
> Have I just missed this discussion? I'd rather wait until the HTML Tidy
> folks reach a point where you're comfortable with the stability and feature
> set and are ready for a release, rather than trying to port the changes over as
> they occur and hit a moving target.
>
> Sorry this post was so long, but I didn't have time to make it shorter :)
>
> Gary
>
> "Reitzel, Charlie" wrote:
> >
> > Out of dumb curiosity, can anyone familiar with JTidy internals tell us what
> > the major impediments to W3C DOM compatibility are?
> >
> > -----Original Message-----
> > From: Valeri.Atamaniouk@nokia.com [mailto:Valeri.Atamaniouk@nokia.com]
> > Sent: Thursday, June 14, 2001 10:45 AM
> > To: holger.prause@detewe.de; html-tidy@w3.org
> > Subject: RE: Html-Tidy BUG ???
> >
> > Hello
> >
> > The answer is fairly simple: Tidy's DOM implementation is not compatible
> > with the W3C recommendation.
> >
> > BR
> > VA
> >
> > PS I think you should write a translator from Tidy's implementation into
> > the standard one (just copy the tree).
> >
> > > -----Original Message-----
> > > From: ext Holger Prause [mailto:holger.prause@detewe.de]
> > > Sent: 12 June 2001 18:05
> > > To: html-tidy@w3.org
> > > Subject: Html-Tidy BUG ???
> > >
> > > Hi,
> > >
> > > I am using JTidy (HTML Tidy) to get a DOM out of some HTML files, and then
> > > I get all links (all elements with node name "a"). Now I want to take this
> > > DOM and process it with XSLT.
> > >
> > > When I use the following code, I get the following exception:
> > >
> > >     XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
> > >     processor.process(new XSLTInputSource(doc),
> > >             new XSLTInputSource(new FileInputStream(xslPath)),
> > >             new XSLTResultTarget(new FileOutputStream(outputFile)));
> > >
> > > XSL Error: Cannot use a DTMLiaison for a input DOM node... pass a
> > > org.apache.xalan.xpath.xdom.XercesLiaison instead!
> > >
> > > XSL Error: SAX Exception
> > >
> > > org.apache.xalan.xslt.XSLProcessorException:
> > >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1799)
> > >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1691)
> > >     at org.apache.xalan.xslt.XSLTEngineImpl.getSourceTreeFromInput(XSLTEngineImpl.java:919)
> > >     at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:643)
> > >     at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> > >     at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> > >     at Main.<init>(Main.java:44)
> > >     at Main.main(Main.java:55)
> > >
> > > OK, I thought, if it wants it that way, I'll pass a Xerces liaison:
> > >
> > >     XercesLiaison xl = new XercesLiaison();
> > >     XSLTProcessor processor = XSLTProcessorFactory.getProcessor(xl);
> > >
> > >     processor.process(new XSLTInputSource(doc),
> > >             new XSLTInputSource(new FileInputStream(xslPath)),
> > >             new XSLTResultTarget(new FileOutputStream(outputFile)));
> > >
> > > Then I get the following exception:
> > >
> > > XSL Error: SAX Exception
> > >
> > > org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > > handle nodes of type class org.w3c.tidy.DOMDocumentImpl
> > >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1753)
> > >     at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1717)
> > >     at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:746)
> > >     at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> > >     at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> > >     at Main.<init>(Main.java:44)
> > >     at Main.main(Main.java:55)
> > >
> > > "org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > > handle nodes of type class org.w3c.tidy.DOMDocumentImpl"
> > >
> > > Why is JTidy using its own DOMDocumentImpl (org.w3c.tidy.DOMDocumentImpl)
> > > and not the DOMDocumentImpl from the W3C (org.w3c.dom.DOMDocumentImpl)?
> > > This would have saved me a lot of time.
> > >
> > > Now what can I do?
> > >
> > > Solution 1: write the Tidy DOM to disk, then reparse it with any
> > > XML parser, and then process it.
> > >
> > > Solution 2: write a wrapper which converts the Tidy DOM to a pure
> > > org.w3c.dom.Document and then process it.
> > >
> > > Solution 3: search for another tool that does it.
> > >
> > > Hmm, can any of you, especially the developers of this tool / library,
> > > tell me what to do?
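Holger's first two options, which match Gary's "write out XHTML and re-read it" workaround and Valeri's "just copy the tree" suggestion, can be sketched with nothing but the JDK's standard JAXP APIs. This is a sketch under assumptions, not JTidy code: `StandardDomBridge`, `reparse` and `copyTree` are hypothetical names; the input bytes are assumed to be Tidy's XHTML output; the stub entity resolver keeps the parser from downloading the XHTML DTD, so this assumes the markup uses numeric character references rather than DTD-defined names such as `&nbsp;`; and the copy deliberately uses only basic DOM Level 1 calls (on the assumption that a partial DOM implementation supports at least those) and skips processing instructions and entity references. `reparse` also illustrates the SAX2 route Gary proposes: an `XMLReader` feeds SAX events to a `ContentHandler`, here the JDK's identity `Transformer`, which builds a standard DOM.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class StandardDomBridge {

    /** Solution 1 (hypothetical helper): re-parse XHTML bytes into a standard JDK DOM via SAX2. */
    public static Document reparse(byte[] xhtml)
            throws ParserConfigurationException, SAXException, TransformerException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Resolve every external entity (e.g. the XHTML DTD) to an empty document so
        // nothing is fetched over the network. DTD-defined entities stay undeclared.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));

        // The identity Transformer acts as the reader's ContentHandler and turns the
        // SAX events into a DOM: the XMLReader -> ContentHandler -> DOM route.
        DOMResult result = new DOMResult();
        InputSource input = new InputSource(new ByteArrayInputStream(xhtml));
        TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(reader, input), result);
        return (Document) result.getNode();
    }

    /** Solution 2 (hypothetical helper): copy any org.w3c.dom tree into a fresh JDK Document. */
    public static Document copyTree(Node sourceRoot) throws ParserConfigurationException {
        Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node start = sourceRoot.getNodeType() == Node.DOCUMENT_NODE
                ? ((Document) sourceRoot).getDocumentElement()
                : sourceRoot;
        target.appendChild(copy(start, target));
        return target;
    }

    /** Recursive copy using only DOM Level 1 accessors on the source side. */
    private static Node copy(Node src, Document target) {
        switch (src.getNodeType()) {
            case Node.ELEMENT_NODE: {
                Element el = target.createElement(src.getNodeName());
                NamedNodeMap attrs = src.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    el.setAttribute(attr.getNodeName(), attr.getNodeValue());
                }
                NodeList children = src.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    Node child = copy(children.item(i), target);
                    if (child != null) {
                        el.appendChild(child);
                    }
                }
                return el;
            }
            case Node.TEXT_NODE:
                return target.createTextNode(src.getNodeValue());
            case Node.CDATA_SECTION_NODE:
                return target.createCDATASection(src.getNodeValue());
            case Node.COMMENT_NODE:
                return target.createComment(src.getNodeValue());
            default:
                return null; // processing instructions, entity references, etc. are skipped
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\""
                + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p><a href=\"page.html\">a link</a></p></body></html>";
        Document copy = copyTree(reparse(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(copy.getElementsByTagName("a").getLength()); // 1
    }
}
```

Either result is an ordinary JDK DOM, so it can be handed to a JAXP `Transformer` through `javax.xml.transform.dom.DOMSource`, rather than passing Xalan 1 a node class its liaison does not recognize.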
Received on Friday, 15 June 2001 17:04:43 UTC