- From: Bert Van Kets <bert@vankets.com>
- Date: Sat, 25 May 2002 09:04:56 +0200
- To: html-tidy@w3.org
Hi all,
I am using JTidy to convert a block of html to xhtml in Apache Cocoon. I
am having two problems with this.
1. When the string to be parsed contains incorrectly escaped (" ' ) or
non-ASCII (>127) characters, they are not converted to their escaped HTML
equivalents. Is there a Tidy setting for this, or do I have to build a
Dictionary myself? I suppose JTidy must have some correction built in,
since this must be a very common mistake.
I'm using a browser based html editor that's very simple to use, but does
not convert the non-ascii characters correctly.
2. JTidy adds html, head, title and body tags (I can remove them with
XSLT, but that's messy).
Does JTidy *always* create full (X)HTML pages?
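
To show what I mean by the "escaped version" in point 1, here is a minimal sketch of the fallback I would otherwise have to write by hand. It is plain Java and not part of JTidy; the names EntityEscaper and escapeNonAscii are made up for this example. It only replaces characters above 127 with numeric character references and deliberately leaves quotes alone, since they are legitimate inside tags.

```java
// Hypothetical helper, not a JTidy API: replaces every character above
// 127 with a numeric character reference such as &#233;.
class EntityEscaper {

    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Work on code points so surrogate pairs become one reference.
            int cp = input.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is 233 and U+20AC is 8364 in decimal.
        System.out.println(escapeNonAscii("caf\u00e9 \u20ac5"));
    }
}
```

This is the kind of conversion I hoped Tidy would do for me, so that I do not have to maintain it next to the parser.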
Here's the code from my XSP page:
    String strContent = request.getParameter("content");
    ByteArrayInputStream in = new ByteArrayInputStream(
            strContent.getBytes() );
    String strOut = "";
    org.w3c.dom.Document doc = null;
    org.w3c.tidy.Configuration conf = new org.w3c.tidy.Configuration();
    try {
        Tidy tidy = new Tidy();
        //create output as XML
        tidy.setXmlOut(true);
        //output should be XHTML conforming
        tidy.setXHTML(true);
        tidy.setBreakBeforeBR(false);
        tidy.setRawOut(false);
        tidy.setCharEncoding( conf.UTF8 );
        //output 'non-breaking space' as entity
        tidy.setQuoteNbsp(true);
        //output naked ampersand as &amp;
        tidy.setQuoteAmpersand(true);
        //leave attribute values untouched
        tidy.setLiteralAttribs(true);
        //parse the stream to a DOM document
        doc = tidy.parseDOM(in, null);
    } catch (Exception e) {
        //exceptions are swallowed here
    }
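
For point 2, the alternative to XSLT that I can think of is to keep only the children of the body element from the parsed document. The sketch below uses just the standard org.w3c.dom and javax.xml.transform APIs; BodyFragment, bodyContent and parse are names invented for this example, and parse is only a stand-in for tidy.parseDOM so the sketch runs without JTidy. Whether the DOM that JTidy returns works with the standard Transformer is an assumption I have not verified.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: serializes only what is inside <body>.
class BodyFragment {

    static String bodyContent(Document doc) {
        NodeList bodies = doc.getElementsByTagName("body");
        if (bodies.getLength() == 0) {
            return "";
        }
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Without this, every child would be preceded by <?xml ...?>.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // Serialize each child of <body> in document order.
            for (Node child = bodies.item(0).getFirstChild();
                    child != null; child = child.getNextSibling()) {
                t.transform(new DOMSource(child), new StreamResult(out));
            }
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for tidy.parseDOM(...): builds a DOM from well-formed XHTML.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>t</title></head>"
                + "<body><p>Hello</p><p>World</p></body></html>");
        System.out.println(bodyContent(doc));
    }
}
```

This still lets JTidy build the full page and only throws the wrapper away afterwards, so it does not answer whether the wrapper can be avoided in the first place.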
I may well have too many settings; the code has grown as I was trying to
get the output right.
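
One thing I have not ruled out for problem 1: strContent.getBytes() without an argument uses the platform default encoding, while I tell Tidy to use UTF-8. If the default on my server is something like ISO-8859-1 (an assumption, I have not checked), the bytes would not be valid UTF-8. The sketch below only illustrates that mismatch in plain Java; EncodingCheck and decodeAsUtf8 are names made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes text with one charset and decodes the
// resulting bytes as UTF-8, which is what a UTF-8 reader would see.
class EncodingCheck {

    static String decodeAsUtf8(String text, Charset encodedWith) {
        byte[] bytes = text.getBytes(encodedWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        // Matching encodings: the text survives the round trip.
        System.out.println(
                decodeAsUtf8(original, StandardCharsets.UTF_8).equals(original));
        // Mismatched encodings: the non-ASCII character is damaged.
        System.out.println(
                decodeAsUtf8(original, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

If that is the cause, passing an explicit charset to getBytes would be the first thing to try before adding more Tidy settings.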
Any help is welcome.
Bert
Received on Saturday, 25 May 2002 03:10:00 UTC