W3C home > Mailing lists > Public > html-tidy@w3.org > July to September 2003

html (url)-> xml -> xsl or xpath -> xml (DOM or file)

From: Rajeev J <jrajeev@firstam.com>
Date: Fri, 18 Jul 2003 17:45:38 +0530
To: html-tidy@w3c.org
Message-ID: <007201c34d26$4e6d7c80$4297a8c0@blr.res.firstam.com>
There have been a lot of discussion on this in the mailing list, but i
could find the solution for my problem, where as i got the same issue
been raised by other user, but since i could not locate the answer , i
am posting it again,Sorry for the trouble.
Basically i am trying to convert HTML to XML and then read the XML
document through XPath
for which i am using Tidy and below is the code (Java script)which i use
for that

WshShell = new ActiveXObject("WScript.Shell");
 var xmlhttp = new ActiveXObject("MSXML2.ServerXMLHTTP");
xmlhttp.open("GET", "http://www.rediff.com", false ); // , strUserID,
strPassword );
    var tidyObj = new ActiveXObject( "TidyCOM.TidyObject" );
 tidyObj.Options.OutputXML = true;
 tidyObj.Options.QuoteNbsp = false;
 tidyObj.Options.QuoteAmpersand = true;
 tidyObj.Options.Clean = true;
 tidyObj.Options.DropFontTags = true;
 tidyObj.Options.CharEncoding = 2; // Latin 1 = ISO-8859-1
 // Causes parsing errors to be logged, (very useful during development)
 tidyObj.Options.ErrorFile = 'tidy.err'; 
 strReturn = tidyObj.TidyMemToMem( xmlhttp.responseText) ;
 WshShell.Popup( strReturn);
  var ndFormInputs = strReturn.selectNodes(
'//form//input|//form//select' );
  for( i = 0; i < ndFormInputs.length; i++ ) {
   Wshshell.Popup( ndFormInputs(i).nodeName ) 

but when i get the response in XMl i only see a meta tag added to normal
HTML code and i would be able to read
data from that 
Here is want I want to do:

html (url)-> xml -> xsl or xpath -> xml (DOM or file)
Please provide me links or support on this 
your help would highly appreciated.


(image/bmp attachment: r.bmp)

Received on Friday, 18 July 2003 08:18:42 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:38:54 UTC