- From: Bert Van Kets <bert@vankets.com>
- Date: Sat, 25 May 2002 09:04:56 +0200
- To: html-tidy@w3.org
Hi all,

I am using JTidy to convert a block of HTML to XHTML in Apache Cocoon, and I am having two problems with it.

1. When the string to be parsed contains incorrectly escaped characters (" ') or non-ASCII characters (code points above 127), they are not converted to their escaped HTML form. Is there a Tidy setting for this, or do I have to build a Dictionary myself? I suppose JTidy has some correction built in, since this must be a very common mistake. I'm using a browser-based HTML editor that is very simple to use but does not convert non-ASCII characters correctly.

2. JTidy adds html, head, title and body tags. I can remove them with XSLT, but that's messy. Does JTidy *always* create full (X)HTML pages?

Here's the code from my XSP page:

    String strContent = request.getParameter("content");
    ByteArrayInputStream in = new ByteArrayInputStream( strContent.getBytes() );
    String strOut = "";
    org.w3c.dom.Document doc = null;
    org.w3c.tidy.Configuration conf = new org.w3c.tidy.Configuration();
    try {
        Tidy tidy = new Tidy();
        //create output as XML
        tidy.setXmlOut(true);
        //output should be XHTML conforming
        tidy.setXHTML(true);
        tidy.setBreakBeforeBR(false);
        tidy.setRawOut(false);
        tidy.setCharEncoding( conf.UTF8 );
        //output 'non-breaking space' as entity
        tidy.setQuoteNbsp(true);
        //output naked ampersand as &amp;
        tidy.setQuoteAmpersand(true);
        //allow newlines inside attribute values
        tidy.setLiteralAttribs(true);
        //parse the stream to a DOM document
        doc = tidy.parseDOM(in, null);
    } catch (Exception e) {
        //exceptions are silently ignored here
    }

It's possible that I have too many settings, but the code has grown as I was trying to get the output right.

Any help is welcome.

Bert
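For problem 1, one possible workaround is to escape the characters before the string ever reaches the parser. The sketch below is only an illustration: `NcrEscaper` and `escapeNonAscii` are hypothetical names, not part of JTidy, and the code uses only the Java standard library. It replaces every code point above 127 with a decimal numeric character reference, so the input is pure ASCII and no longer depends on the encoding in use. That may matter here because `getBytes()` without an argument uses the platform default charset, which need not match the UTF8 setting given to Tidy.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of JTidy: makes markup ASCII-only before parsing. */
class NcrEscaper {

    /** Replaces every code point above 127 with a decimal numeric character reference. */
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); ) {
            // Read a full code point so surrogate pairs become a single reference.
            int cp = input.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escapeNonAscii("caf\u00e9 \u20ac5");
        System.out.println(escaped); // prints: caf&#233; &#8364;5

        // The result is pure ASCII, so one byte per character in any ASCII-compatible charset.
        byte[] bytes = escaped.getBytes(StandardCharsets.US_ASCII);
        System.out.println(bytes.length == escaped.length()); // prints: true
    }
}
```

The escaped string could then be passed to `new ByteArrayInputStream(...)` in place of the raw request parameter. This does not touch quotes or ampersands, which are left for the parser to handle.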
Received on Saturday, 25 May 2002 03:10:00 UTC