
RE: Html-Tidy BUG ???

From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
Date: Fri, 15 Jun 2001 18:31:53 -0400
Message-ID: <B5C79DDBC655D311B6BD0008C7E64D76013C1621@exchange.arrakisplanet.com>
To: "'Gary L Peskin'" <garyp@firstech.com>
Cc: html-tidy@w3.org
I've looked at it, but not in depth.  Thanks to Terry's hard work, I'm
behind on my Tidy bug list <grin>.  So I'll wait patiently to see what you
bring out for SAX.

Charlie

-----Original Message-----
From: Gary L Peskin [mailto:garyp@firstech.com]
Sent: Friday, June 15, 2001 5:05 PM
To: Reitzel, Charlie
Cc: html-tidy@w3.org
Subject: Re: Html-Tidy BUG ???


Well, you're welcome to look at the JTidy DOM implementation, though I
hope to deprecate it when the SAX support gets established.  We're on
SourceForge as JTidy.

Gary

"Reitzel, Charlie" wrote:
> 
> Hi Gary,
> 
> Thanks for this background.  Clearly, you have given the subject a great
> deal of careful thought.  The SAX-based approach makes a great deal of
> sense.
> 
> DOM compatibility is an issue for the C version of Tidy as well.  You may
> have noticed the discussions on this list.   For much the same reasons,
> these discussions seemed to gravitate to the SAX adapter method as well.
> 
> I think the tracking issue will go both ways.  I think we should look to
> JTidy for direction on the library interface and implementation.  Ditto
> for the SAX adapter, of course.
> 
> take it easy,
> Charlie
> 
> -----Original Message-----
> From: Gary L Peskin [mailto:garyp@firstech.com]
> Sent: Thursday, June 14, 2001 7:55 PM
> To: html-tidy@w3.org
> Subject: Re: Html-Tidy BUG ???
> 
> We have several issues here.  First are the DOM classes that are shipped
> with JTidy itself.  With the exception of the DOMException class,
> everything in the DOM package is an interface.  There is no problem at
> all with deleting these classes and replacing them with the official W3C
> DOM classes and, in fact, I'll do that shortly.
> 
> The next problem is developing implementations for each of those
> interfaces so that we can implement DOM support.  JTidy is a faithful port
> of the released version of HTML Tidy (more on this later), so the JTidy
> parse tree mirrors the C Tidy parse tree.  This tree is NOT a DOM tree but
> a specialized HTML tree which suits Tidy's purposes.
> 
> In order to maintain compatibility with C Tidy and make it easy to
> retrofit maintenance and enhancements from the C version to the Java
> version, we have left the tree alone.  The parse tree is a central data
> structure and monkeying with it would generate a lot of porting issues.
> 
> So, instead, what Andy did was create a peer node, called an Adapter, when
> needed.  The idea is that when we needed to represent something in the
> DOM, we created a DOM node which was basically a thin wrapper on the
> corresponding Tidy node but which implemented the DOM methods and took
> into account the differences between the Tidy node tree structure and the
> DOM node tree structure.  The Adapter node contains a reference to the
> Tidy node and vice versa, and the DOM nodes are only created as needed, so
> there is no overhead if you're not using the DOM support.
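
To make the Adapter idea above concrete, here is a minimal Java sketch of a
lazily created peer node.  The names TidyNodeSketch and DomAdapterSketch are
made up for illustration and are not JTidy's actual classes; a real adapter
would implement the org.w3c.dom interfaces rather than the two accessors
shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Tidy parse-tree node; not JTidy's real class.
class TidyNodeSketch {
    final String element;
    final List<TidyNodeSketch> content = new ArrayList<>();
    // Back-reference to the DOM peer; stays null until DOM access is requested.
    private DomAdapterSketch adapter;

    TidyNodeSketch(String element) {
        this.element = element;
    }

    TidyNodeSketch add(TidyNodeSketch child) {
        content.add(child);
        return this;
    }

    // Create the peer on first use, so callers that never touch the DOM
    // view never allocate one.
    DomAdapterSketch getAdapter() {
        if (adapter == null) {
            adapter = new DomAdapterSketch(this);
        }
        return adapter;
    }

    boolean hasAdapter() {
        return adapter != null;
    }
}

// Thin wrapper that exposes DOM-flavoured accessors over the Tidy node.
class DomAdapterSketch {
    private final TidyNodeSketch tidyNode;

    DomAdapterSketch(TidyNodeSketch tidyNode) {
        this.tidyNode = tidyNode;
    }

    String getNodeName() {
        return tidyNode.element;
    }

    // Children are wrapped on demand through their own lazy peers.
    DomAdapterSketch getFirstChild() {
        return tidyNode.content.isEmpty() ? null : tidyNode.content.get(0).getAdapter();
    }

    TidyNodeSketch getTidyNode() {
        return tidyNode;
    }
}
```

The parse tree itself is untouched apart from the one back-reference field,
which is the property that keeps porting from the C version cheap.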
> 
> This is how things were when Andy was unable to continue with the JTidy
> development.  Along came DOM Level 2 and more and more requests for DOM 1
> features that were not implemented in the initial release.  In addition,
> we had people using XalanJ1, for example, which needed a separate liaison
> class to interface with each DOM model, so someone would have needed to
> create a TidyLiaison to support Xalan.
> 
> As a result of the increasing feature set and complexity of the DOM, I
> suggested that it would be a good idea to just have JTidy implement the
> SAX2 XMLReader interface so that it could throw off SAX2 events to a SAX2
> ContentHandler.  Then the user could plug in Xerces or whatever other XML
> parser implementation they wanted, provided that it supplied a
> ContentHandler, which Xerces does, and build their own DOM tree.  They
> would have a real DOM, and JTidy wouldn't have to worry about keeping up
> with all of the DOM features.  As a bonus, you'd get SAX2 support as well.
> This way, JDOM could be supported too, I believe, using their SAXBuilder.
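
As a rough illustration of the XMLReader approach described above, the
sketch below feeds SAX2 events from a reader into a standard DOM builder.
FixedEventReader is a hypothetical stand-in for a SAX-enabled Tidy: it
parses nothing and just emits a fixed event stream.  The DOM is built with
the JAXP identity transformer, which is one possible ContentHandler-based
builder and an assumption of this sketch, not something the thread
prescribes.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical reader: XMLFilterImpl already implements XMLReader and
// forwards startElement() etc. to whatever ContentHandler has been set.
class FixedEventReader extends XMLFilterImpl {
    @Override
    public void parse(InputSource input) throws SAXException {
        AttributesImpl href = new AttributesImpl();
        href.addAttribute("", "href", "href", "CDATA", "http://www.w3.org/");
        char[] text = "W3C".toCharArray();

        startDocument();
        startElement("", "html", "html", new AttributesImpl());
        startElement("", "a", "a", href);
        characters(text, 0, text.length);
        endElement("", "a", "a");
        endElement("", "html", "html");
        endDocument();
    }

    // Accept and ignore feature/property requests from the consumer.
    @Override
    public void setFeature(String name, boolean value) { }

    @Override
    public void setProperty(String name, Object value) { }
}

class SaxToDomSketch {
    // Build an org.w3c.dom.Document from the reader's SAX2 events.
    static Document build() {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new SAXSource(new FixedEventReader(), new InputSource()), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("could not build DOM from SAX events", e);
        }
    }
}
```

The resulting Document belongs to whichever DOM implementation the consumer
chose, so the event producer never has to track DOM features itself.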
> 
> Down the road it would be nice for Tidy to implement JAXP as well but
> that's another story.
> 
> I merrily started coding up the XMLReader support last December but have
> been delayed for several reasons.  I am now almost in a position to get
> back into it in a few more days and I hope to have it ready about two
> weeks after that.
> 
> For now, the next best thing is to write out the XHTML output from Tidy
> and then read it in using your favorite XML parser.  It's not a fantastic
> solution but it does work.
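
The reparse step of that workaround might look like the sketch below, which
uses the standard JAXP DocumentBuilder.  Producing the XHTML text with Tidy
is deliberately left out; the class name ReparseSketch and the idea of
passing the output around as a String are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReparseSketch {
    // Parse well-formed XHTML text into a standard org.w3c.dom.Document.
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("XHTML could not be parsed", e);
        }
    }
}
```

If the Tidy output carries a DOCTYPE, the parser may try to fetch the
referenced DTD; setting an EntityResolver on the builder is one way to avoid
that.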
> 
> In the meantime, I've followed with great interest the impressive activity
> over on SourceForge and on this list as well with respect to the HTML Tidy
> project.  Of course, we'd like to port over the improvements and changes
> to HTML Tidy at some point.  I haven't seen any mention of a release
> schedule.  Have I just missed this discussion?  I'd rather wait until the
> HTML Tidy folks get to a point where you're comfortable with the stability
> and feature set and ready for a release, rather than trying to port the
> changes over as they occur and hit a moving target.
> 
> Sorry this post was so long but I didn't have time to make it shorter :)
> 
> Gary
> 
> "Reitzel, Charlie" wrote:
> >
> > Out of dumb curiosity, can anyone familiar w/ JTidy internals tell us
> > what the major impediments to W3C DOM compatibility are?
> >
> > -----Original Message-----
> > From: Valeri.Atamaniouk@nokia.com [mailto:Valeri.Atamaniouk@nokia.com]
> > Sent: Thursday, June 14, 2001 10:45 AM
> > To: holger.prause@detewe.de; html-tidy@w3.org
> > Subject: RE: Html-Tidy BUG ???
> >
> > Hello
> >
> > The answer is fairly simple: tidy's DOM implementation is not compatible
> > with the W3C recommendation.
> >
> > BR
> > VA
> >
> > PS I think you should write a translator from tidy's implementation into
> > a standard one (just copy the tree).
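
A "copy the tree" translator along those lines could be sketched as follows.
It walks the source using only org.w3c.dom interface methods, so in
principle it does not care which implementation produced the source nodes.
DomCopySketch is a hypothetical name, and how completely JTidy's nodes
implement the accessors used here is an assumption that would need checking.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class DomCopySketch {
    // Create an empty Document from the default JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recreate 'source' and its subtree as nodes owned by 'target'.
    // Only elements, text and comments are copied in this sketch.
    static Node copyInto(Document target, Node source) {
        Node copy;
        switch (source.getNodeType()) {
            case Node.ELEMENT_NODE:
                Element element = target.createElement(source.getNodeName());
                NamedNodeMap attrs = source.getAttributes();
                for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    element.setAttribute(attr.getNodeName(), attr.getNodeValue());
                }
                copy = element;
                break;
            case Node.TEXT_NODE:
                copy = target.createTextNode(source.getNodeValue());
                break;
            case Node.COMMENT_NODE:
                copy = target.createComment(source.getNodeValue());
                break;
            default:
                return null; // other node types are skipped
        }
        for (Node child = source.getFirstChild(); child != null; child = child.getNextSibling()) {
            Node childCopy = copyInto(target, child);
            if (childCopy != null) {
                copy.appendChild(childCopy);
            }
        }
        return copy;
    }

    // Copy the root element of a foreign document into a fresh Document.
    static Document copyDocument(Document foreign) {
        Document fresh = newDocument();
        fresh.appendChild(copyInto(fresh, foreign.getDocumentElement()));
        return fresh;
    }
}
```

The copy costs one extra pass over the tree and doubles memory for the
duration, but the result can be handed to any processor that expects its own
DOM implementation's nodes.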
> >
> > > -----Original Message-----
> > > From: ext Holger Prause [mailto:holger.prause@detewe.de]
> > > Sent: 12 June 2001 18:05
> > > To: html-tidy@w3.org
> > > Subject: Html-Tidy BUG ???
> > >
> > >
> > > Hi
> > >
> > >
> > > I am using JTidy (HTML Tidy) to get a DOM out of some HTML files, and
> > > then I get all links (all elements with node name "a").  Now I want
> > > to process this DOM with XSLT.
> > >
> > > When I use the following code I get the following exception:
> > >
> > > <pre>
> > > XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
> > > processor.process(
> > >         new XSLTInputSource(doc),
> > >         new XSLTInputSource(new FileInputStream(xslPath)),
> > >         new XSLTResultTarget(new FileOutputStream(outputFile)));
> > > </pre>
> > >
> > >
> > > XSL Error: Cannot use a DTMLiaison for a input DOM node... pass a
> > > org.apache.xalan.xpath.xdom.XercesLiaison instead!
> > >
> > > XSL Error: SAX Exception
> > >
> > > org.apache.xalan.xslt.XSLProcessorException:
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1799)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1691)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.getSourceTreeFromInput(XSLTEngineImpl.java:919)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:643)
> > >  at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> > >  at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> > >  at Main.<init>(Main.java:44)
> > >  at Main.main(Main.java:55)
> > >
> > >
> > > OK, I thought, if it wants it that way, I'll pass a Xerces liaison:
> > >
> > > <pre>
> > > XercesLiaison xl = new XercesLiaison();
> > > XSLTProcessor processor = XSLTProcessorFactory.getProcessor(xl);
> > >
> > > processor.process(
> > >         new XSLTInputSource(doc),
> > >         new XSLTInputSource(new FileInputStream(xslPath)),
> > >         new XSLTResultTarget(new FileOutputStream(outputFile)));
> > > </pre>
> > >
> > > Then I get the following exception:
> > >
> > > XSL Error: SAX Exception
> > >
> > > org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > > handle nodes of type class org.w3c.tidy.DOMDocumentImpl
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1753)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.error(XSLTEngineImpl.java:1717)
> > >  at org.apache.xalan.xslt.XSLTEngineImpl.process(XSLTEngineImpl.java:746)
> > >  at DOMToHtmlSerializer.serialize(DOMToHtmlSerializer.java:39)
> > >  at HtmlLinkValidator.validate(HtmlLinkValidator.java:56)
> > >  at Main.<init>(Main.java:44)
> > >  at Main.main(Main.java:55)
> > >
> > >
> > > "org.apache.xalan.xslt.XSLProcessorException: XercesLiaison can not
> > > handle nodes of type class org.w3c.tidy.DOMDocumentImpl"
> > >
> > > Why is JTidy using its own DOMDocumentImpl
> > > (org.w3c.tidy.DOMDocumentImpl) and not the DOMDocumentImpl from the W3C
> > > (org.w3c.dom.DOMDocumentImpl)?
> > >
> > > This would have saved me a lot of time.
> > >
> > >
> > >
> > > Now what can I do?
> > >
> > > Solution 1: write the Tidy DOM to disk, reparse it with any XML
> > > parser, and then process it.
> > >
> > > Solution 2: write a wrapper which changes the Tidy DOM into a pure
> > > org.w3c.dom.Document and then process it.
> > >
> > > Solution 3: search for another tool that does it.
> > >
> > >
> > > Can any of you, especially the developers of this tool / library,
> > > tell me what to do?
> > >
> > >
Received on Friday, 15 June 2001 18:31:46 GMT
