- From: Gary L Peskin <garyp@firstech.com>
- Date: Tue, 11 Jul 2000 09:19:39 -0700
- To: Scott Dossett <sdossett@metaphoria.net>
- CC: HTMLTidy <html-tidy@w3.org>
Scott --

The JTidy DOM model is a somewhat thin veneer on the node structure created by Tidy. If you look at the META tag, it doesn't have a close tag (</META> or <META />), so Tidy parses the next tag (<SCRIPT>) as the child of the META tag. This node model seems to work for Tidy's purposes, although it does seem misleading when trying to access it using JTidy's DOM model.

As a workaround, I just write out the JTidy output as XHTML and then read it back in to a real DOM parser.

Gary

Scott Dossett wrote:
>
> The DOM model constructed by JTidy for HTML seems to be erroneous. When
> JTidy parses HTML it produces a DOM for the HTML. When attempting to
> traverse this model, I have found that it does not get the structure
> correct. For example, given the following code from Netcenter:
>
> ----------------------------------------------------------------------------
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <!-- saved from url=(0025)http://home.netscape.com/ -->
> <HTML><HEAD><TITLE>Netcenter</TITLE>
> <META content="text/html; charset=windows-1252" http-equiv=Content-Type><!--REPLACE_START_QWEST5--><!--REPLACE_END_QWEST5-->
> <SCRIPT language=JavaScript type=text/javascript><!--
> if (parseFloat(navigator.appVersion) < 3){document.write('<FRAMESET>');location.href="http://home.netscape.com/computing/download/upgrade_index.html";}
> else if ((parseFloat(navigator.appVersion)>=5)&&((navigator.appName=="Netscape")||(navigator.appName=="Mozilla"))){location.href="http://home.netscape.com/index1.html";}// --></SCRIPT>
> ----------------------------------------------------------------------------
>
> When getting the parent of the <SCRIPT ...> tag here (via the org.w3c.dom
> package included with JTidy), JTidy returns:
>
> <META content="text/html; charset=windows-1252" http-equiv=Content-Type>
>
> as its parent, where the <HEAD> tag should be returned as its parent. This
> appears to be an error in JTidy. Response/suggestions/reprimands are
> welcome.
>
> Thanks,
> Scott
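
For anyone wanting to try the workaround described in the reply, here is a minimal sketch of its second half only: reading XHTML back in with the JDK's own DOM parser and checking the parent of the SCRIPT element. It assumes JTidy has already written well-formed XHTML and that this output is available as a string; the JTidy step itself is not shown. The class and method names (`XhtmlReparse`, `parseXhtml`, `parentOfFirst`) are made up for illustration, and the empty entity resolver is an assumption added to keep the parser from fetching the XHTML DTD over the network. With that resolver in place, named entities such as `&nbsp;` would not resolve.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: re-parse Tidy's XHTML output with a standard DOM parser.
final class XhtmlReparse {

    // Parses an XHTML string into a DOM tree using the JDK's built-in parser.
    static Document parseXhtml(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption: hand back an empty DTD so no external DTD is downloaded.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Returns the node name of the parent of the first element with this tag,
    // or null if the document contains no such element.
    static String parentOfFirst(Document doc, String tagName) {
        Node node = doc.getElementsByTagName(tagName).item(0);
        return node == null ? null : node.getParentNode().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for XHTML written out by JTidy: META is now an empty element.
        String xhtml =
                "<html><head><title>Netcenter</title>"
              + "<meta http-equiv=\"Content-Type\""
              + " content=\"text/html; charset=windows-1252\" />"
              + "<script type=\"text/javascript\"></script>"
              + "</head><body></body></html>";
        Document doc = parseXhtml(xhtml);
        System.out.println(parentOfFirst(doc, "script"));
    }
}
```

Because META is written as an empty element in XHTML, the re-parsed tree puts SCRIPT directly under HEAD, which is the structure the original question expected.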
Received on Tuesday, 11 July 2000 12:19:28 UTC