
Re: JTidy and Error in Dom Construction

From: Gary L Peskin <garyp@firstech.com>
Date: Tue, 11 Jul 2000 09:19:39 -0700
Message-ID: <396B491B.2D0B428A@firstech.com>
To: Scott Dossett <sdossett@metaphoria.net>
CC: HTMLTidy <html-tidy@w3.org>
Scott --

The JTidy DOM is a fairly thin veneer over the node structure that
Tidy builds.  The META tag in your sample is never closed (there is no
</META>, and it is not written as <META ... />), so Tidy parses the
next tag (<SCRIPT>) as a child of the META tag.

This node model seems to work for Tidy's purposes, although it is
misleading when you access it through JTidy's DOM interface.  As a
workaround, I just write out the JTidy output as XHTML and then read it
back in with a real DOM parser.
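
A minimal sketch of the second half of that workaround, assuming the
XHTML has already been written out by JTidy.  The class name
ReparseXhtml, the SAMPLE string, and both helper methods are made up
for illustration; only the JDK's javax.xml.parsers and org.w3c.dom
classes are used, and the JTidy step itself is not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ReparseXhtml {

    // Hypothetical stand-in for XHTML that JTidy has already written out.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
        + "<title>Netcenter</title>"
        + "<meta content=\"text/html; charset=windows-1252\" "
        + "http-equiv=\"Content-Type\" />"
        + "<script type=\"text/javascript\">// script body omitted</script>"
        + "</head><body></body></html>";

    // Parse well-formed XHTML text with the JDK's own DOM parser.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve the DOCTYPE's external DTD to an empty stream so the parser
        // does not go to the network. Named entities such as &nbsp; would then
        // be undefined, so this shortcut only suits entity-free input.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Name of the parent of the first element with the given tag name.
    static String parentOfFirst(Document doc, String tagName) {
        Node node = doc.getElementsByTagName(tagName).item(0);
        return node.getParentNode().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parentOfFirst(parse(SAMPLE), "script"));
    }
}
```

Run against the sample, this prints head: once the META element is
explicitly closed in the XHTML, an ordinary XML parser makes SCRIPT a
sibling of META rather than its child.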

Gary

Scott Dossett wrote:
> 
> The DOM model constructed by JTidy for HTML seems to be erroneous.  When
> JTidy parses HTML it produces a DOM for the HTML.  When attempting to
> traverse this model, I have found that it does not get the structure
> correct.  For example, given the following code from Netcenter:
> 
> ----------------------------------------------------------------------------
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <!-- saved from url=(0025)http://home.netscape.com/ -->
> <HTML><HEAD><TITLE>Netcenter</TITLE>
> <META content="text/html; charset=windows-1252"
> http-equiv=Content-Type><!--REPLACE_START_QWEST5--><!--REPLACE_END_QWEST5-->
> <SCRIPT language=JavaScript type=text/javascript><!--
> if (parseFloat(navigator.appVersion) < 3){document.write('<FRAMESET>');location.href="http://home.netscape.com/computing/download/upgrade_index.html";}
> else if ((parseFloat(navigator.appVersion)>=5)&&((navigator.appName=="Netscape")||(navigator.appName=="Mozilla"))){location.href="http://home.netscape.com/index1.html";}// --></SCRIPT>
> ----------------------------------------------------------------------------
> 
> When getting the parent of the <SCRIPT ...> tag here (via the org.w3c.dom
> package included with JTidy), JTidy returns:
> 
> <META content="text/html; charset=windows-1252" http-equiv=Content-Type>
> 
> as its parent, whereas the <HEAD> tag should be returned.  This appears
> to be an error in JTidy.  Responses, suggestions, and reprimands are
> welcome.
> 
> Thanks,
> Scott
Received on Tuesday, 11 July 2000 12:19:28 GMT
