Re: JTidy and Error in Dom Construction

Scott --

The JTidy DOM model is a somewhat thin veneer on the node structure
created by Tidy.  If you look at the META tag, it doesn't have a close
tag (</META> or <META />), so Tidy parses the next tag (<SCRIPT>) as the
child of the META tag.

This node model seems to work for Tidy's own purposes, but it is
misleading when you access it through JTidy's DOM interface.  As a
workaround, I just write out the JTidy output as XHTML and then read it
back in with a real DOM parser.
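
A rough sketch of the second half of that workaround, using only the JAXP
parser that ships with the JDK.  It assumes the XHTML has already been
written out by JTidy; the SAMPLE_XHTML string is a hand-written stand-in
for that output, and the class and method names are made up for
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XhtmlRoundTrip {

    // Hypothetical stand-in for XHTML that JTidy has already written out.
    static final String SAMPLE_XHTML =
        "<html><head><title>Netcenter</title>"
        + "<meta content=\"text/html; charset=windows-1252\""
        + " http-equiv=\"Content-Type\" />"
        + "<script language=\"JavaScript\" type=\"text/javascript\"></script>"
        + "</head><body></body></html>";

    // Parse well-formed XHTML text with the standard JAXP DOM parser.
    static Document parseXhtml(String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external DTD to an empty document so that a
            // DOCTYPE line does not trigger a network fetch.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
    }

    // Name of the parent of the first <script> element, or null if none.
    static String parentOfFirstScript(String xhtml) {
        NodeList scripts = parseXhtml(xhtml).getElementsByTagName("script");
        if (scripts.getLength() == 0) {
            return null;
        }
        Node parent = scripts.item(0).getParentNode();
        return parent == null ? null : parent.getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(parentOfFirstScript(SAMPLE_XHTML));
    }
}
```

With the sample string, the parent reported for the script element is
head rather than meta, because the XML parser treats <meta ... /> as an
empty element.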

Gary

Scott Dossett wrote:
> 
> The DOM model constructed by JTidy for HTML seems to be erroneous.  When
> JTidy parses HTML it produces a DOM for the HTML.  When attempting to
> traverse this model, I have found that it does not get the structure
> correct.  For example, given the following code from Netcenter:
> 
> ----------------------------------------------------------------------------
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <!-- saved from url=(0025)http://home.netscape.com/ -->
> <HTML><HEAD><TITLE>Netcenter</TITLE>
> <META content="text/html; charset=windows-1252"
> http-equiv=Content-Type><!--REPLACE_START_QWEST5--><!--REPLACE_END_QWEST5-->
> <SCRIPT language=JavaScript type=text/javascript><!--
> if (parseFloat(navigator.appVersion) < 3){document.write('<FRAMESET>');location.href="http://home.netscape.com/computing/download/upgrade_index.html";}
> else if ((parseFloat(navigator.appVersion)>=5)&&((navigator.appName=="Netscape")||(navigator.appName=="Mozilla"))){location.href="http://home.netscape.com/index1.html";}// --></SCRIPT>
> ----------------------------------------------------------------------------
> 
> When getting the parent of the <SCRIPT ...> tag here (via the org.w3c.dom
> package included with JTidy), JTidy returns:
> 
> <META content="text/html; charset=windows-1252" http-equiv=Content-Type>
> 
> as its parent, whereas the <HEAD> tag should be returned as its parent.  This
> appears to be an error in JTidy.  Response/suggestions/reprimands are
> welcome.
> 
> Thanks,
> Scott

Received on Tuesday, 11 July 2000 12:19:28 UTC