- From: Niels Peter Strandberg <nielspeter@npstrandberg.com>
- Date: Fri, 26 Jan 2001 15:40:33 +0100
- To: html-tidy@w3c.org
Hi!

(Using jTidy) I'm converting an HTML file to XML. I have two problems that I need to know how to solve.

Code:

    tidy.setXmlOut(true);
    tidy.setFixBackslash(true);    // URL FixBackslash
    tidy.setRawOut(true);          // RawOut - avoid mapping values > 127 to entities
    tidy.setXmlPi(true);           // XmlPi - add <?xml?> for XML docs
    tidy.setQuoteAmpersand(true);  // QuoteAmpersand - output naked ampersand as &amp;
    tidy.setTidyMark(false);       // TidyMark - add meta element indicating tidied doc
    tidy.setWraplen(99999);        // Wraplen - default wrap margin

The resulting output file:

    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <html>
    <head>
    <link rel="made" href="wsanchez@apple.com" />
    <title>Welcome to Mac OS X!</title>
    ...........

Problems:

I want to treat this result file as a "normal" XML file. I'm going to transform the result using XSL and XPath.

1) Entities! The &copy; is treated as an entity, so the parser complains. What I want is all "entities" converted to their "right" character (e.g. &copy; -> ©). How can this be done?

2) I open the result file in XML Spy for Windows. XML Spy tells me that the <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> needs a space somewhere. Do I need the DOCTYPE at all? How do I solve the problem?

Here is what I want to do:

    html (url) -> xml -> xsl or xpath -> xml (DOM or file)

The ideal would be:

    html -> DOM (jTidy), then using XPath or XSL to manipulate the DOM tree -> result could be an XML file, HTML file, DOM tree ....

Is there anyone out there who has made an application that can do this in one go, and is ready to share it?

Regards,
Niels Peter
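
The html -> DOM -> XPath or XSL pipeline described above might be sketched roughly as follows, using only the JDK's javax.xml APIs. This is an illustration under assumptions, not a tested jTidy integration: the class name TidyPipelineSketch and its helper methods are hypothetical, and a plain JAXP parser stands in for the jTidy step so that the example runs on its own. In a real setup the Document would presumably come from jTidy's Tidy.parseDOM(InputStream, OutputStream) instead; whether that DOM is complete enough for a particular XPath or XSLT engine would need to be verified.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TidyPipelineSketch {

    // Stand-in for the jTidy step: parses already well-formed markup into a DOM.
    // Assumption: with jTidy, this Document would come from Tidy.parseDOM(in, out).
    public static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XPath and XSLT expect a namespace-aware DOM
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("Could not parse input", e);
        }
    }

    // Evaluates an XPath expression against the DOM and returns its string value.
    public static String xpathString(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new RuntimeException("XPath evaluation failed", e);
        }
    }

    // Applies an XSLT stylesheet (given as a string) directly to the DOM tree.
    public static String transform(Document doc, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Sample input modelled on the tidied output quoted in the message.
        String xml = "<html><head><title>Welcome to Mac OS X!</title></head>"
                + "<body><p>\u00A9 2001</p></body></html>";

        // Hypothetical stylesheet that just extracts the page title as text.
        String xslt = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">"
                + "<xsl:value-of select=\"/html/head/title\"/>"
                + "</xsl:template></xsl:stylesheet>";

        Document doc = parseXml(xml);
        System.out.println(xpathString(doc, "/html/head/title"));
        System.out.println(transform(doc, xslt));
    }
}
```

Working on the DOM in memory avoids writing and re-parsing an intermediate XML file, which would also sidestep the DOCTYPE and named-entity complaints, since those only arise when the serialized output is read back by a second parser.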
Received on Friday, 26 January 2001 09:40:07 UTC