- From: Carlos Ruiz-Capillas <carlos.ruizcapillas@newknow.com>
- Date: Thu, 13 Dec 2001 15:53:48 +0100
- To: "'html-tidy@w3.org'" <html-tidy@w3.org>
Hi,

I am trying to identify text that belongs to a script node, using the 4 Aug 2000 version of JTidy. Consider parsing the HTML below; it doesn't work well.

    <html><head></head>
    <body bgcolor="#4B4B4B" onLoad="ieFlash();">
    <div id="topLayerLeft">
    <table valign="top" cellpadding="0" cellspacing="0" border="0" width="324">
    <script>
    var str=tll_links("yes","<img src=\"/cnet_news/template/link_test.gif\" border=\"0\" alt=\"Test your connection speed\">","Steve Ballmer, CEO, Microsoft: Ballmer talks up XML, .Net","t031201_1930","0","324733","%2A","cnet_news","http://video.cnet.com:80/cgi-bin/visearch?user=","Analyst test drives Office XP (3/1/01)","t030101_1330","<img src=\"http://www.cnet.com/i/gl/vid-b.gif\" width=\"24\" height=\"18\" hspace=\"0\" align=\"top\" border=\"0\">","loasf","&value=default&which=1&old=yes&hdr=news_vid_hed.gif" ,"http://video.cnet.com:80/cnet_news/template/asxgen.cgi?", "cpcode=674&asset=http://cnetnews.download.akamai.com/674/","ccstart=2000&ccstop=302666","Microsoft gaining ground at trial (3/2/01)","t030201_0830","ccstart=2000&ccstop=475166");document.write(str);
    </script>
    </table>
    </div>
    </body>
    </html>

The text nodes of the DOM representation (obtained by calling the method org.w3c.dom.Document parseDOM(InputStream in, OutputStream out)) are:

    TEXT NODE: CNET News.com
    TEXT NODE: var str=tll_links("yes","
    TEXT NODE: ","Steve Ballmer, CEO, Microsoft: Ballmer talks up XML, .Net","t031201_1930","0","324733","%2A","cnet_news","http://video.cnet.com:80/cgi-bin/visearch?user=","Analyst test drives Office XP (3/1/01)","t030101_1330","
    TEXT NODE: ","loasf","&value=default&which=1&old=yes&hdr=news_vid_hed.gif","http://video.cnet.com:80/cnet_news/template/asxgen.cgi?", "cpcode=674&asset=http://cnetnews.download.akamai.com/674/","ccstart=2000&ccstop=302666","Microsoft gaining ground at trial (3/2/01)","t030201_0830","ccstart=2000&ccstop=475166");document.write(str);

If I ask for the parent name of the last three nodes, the value returned is DIV. Why is the <SCRIPT> tag not recognized? Is there any way to identify the <SCRIPT> tag, or is there a newer version that fixes this case?

Thanks,
Charlie.

_____________________________________
Carlos Ruiz-Capillas Zarranz
Software Engineer
Newknow Network S.A.
mailto: capillas@newknow.com
Direct Phone: 91 639 89 50
Main Phone: 91 639 90 00
Fax: 91 638 71 59
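As a rough sketch of the check being asked about, the Java code below walks an org.w3c.dom tree and collects the text nodes that have a SCRIPT element somewhere among their ancestors (compared case-insensitively, since Tidy reports names such as "DIV"). It relies only on the standard org.w3c.dom Node interface, so it is assumed, not verified, to work on the Document returned by parseDOM. The class and method names (ScriptTextFinder, isInsideScript, scriptTexts) are hypothetical, and the demo builds its DOM with the JDK's DocumentBuilder rather than JTidy. The check can only succeed when the parser keeps the SCRIPT element in the tree: in the DOM described above, where the parent of the script text is DIV, it would find nothing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper: not part of JTidy, only uses the org.w3c.dom interfaces.
public class ScriptTextFinder {

    /** True if any ancestor of the node is an element named "script" (any case). */
    public static boolean isInsideScript(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE
                    && "script".equalsIgnoreCase(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    /** Collects the values of all text/CDATA nodes under root that lie inside a SCRIPT element. */
    public static List<String> scriptTexts(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    // Depth-first traversal over every node below (and including) the given node.
    private static void collect(Node node, List<String> out) {
        short type = node.getNodeType();
        if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) && isInsideScript(node)) {
            out.add(node.getNodeValue());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo input parsed with the JDK XML parser instead of JTidy.
        String xhtml = "<html><body><div><script>var x = 1;</script><p>hello</p></div></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(scriptTexts(doc)); // prints [var x = 1;]
    }
}
```

The ancestor walk, rather than a parent check, is a design choice: it still finds script text if the parser inserts wrapper nodes between SCRIPT and its text, but, like any DOM-level check, it cannot recover a SCRIPT element that the parser has already dropped.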
Received on Thursday, 13 December 2001 09:54:02 UTC