Using TidyLib as an HTML parser

Hi,

I'm trying to use TidyLib as an HTML parser, and would like to generate 
SAX events from the TidyDoc representation of the document. However, 
there doesn't seem to be a way to get the unescaped value of a text 
node, or the unserialized value of a comment or processing instruction. 
So far I have been using the tidyNodeGetText() function to get the value 
of these node types, but it returns the serialized (escaped) form.
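
To illustrate the gap, here is a minimal sketch of the post-processing a
caller would need to apply to the serialized text of a text node before
handing it to a SAX characters() callback. It makes several assumptions:
that the text contains only the five predefined XML entities and numeric
character references, and that the output should be UTF-8. The names
unescape_text and put_utf8 are hypothetical helpers, not part of TidyLib,
and other named entities (such as &nbsp;) are left untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not a TidyLib function): write code point cp
 * to out as UTF-8 and return the number of bytes written. */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper (not a TidyLib function): copy in to out,
 * decoding &lt; &gt; &amp; &quot; &apos; and numeric character
 * references. out must hold at least strlen(in) + 1 bytes; decoding
 * never makes the text longer. Anything unrecognized is copied as-is. */
void unescape_text(const char *in, char *out)
{
    static const struct { const char *name; char ch; } named[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };

    assert(in != NULL && out != NULL);

    while (*in) {
        int matched = 0;

        if (*in == '&') {
            /* Try the five predefined entities first. */
            for (size_t i = 0; i < sizeof named / sizeof named[0]; i++) {
                size_t len = strlen(named[i].name);
                if (strncmp(in, named[i].name, len) == 0) {
                    *out++ = named[i].ch;
                    in += len;
                    matched = 1;
                    break;
                }
            }

            /* Then try &#NNN; (decimal) or &#xHHH; (hexadecimal). */
            if (!matched && in[1] == '#') {
                int hex = (in[2] == 'x' || in[2] == 'X');
                const char *start = in + (hex ? 3 : 2);
                const char *p = start;
                unsigned long cp = 0;

                /* Stop early once cp is out of range to avoid overflow. */
                while (cp <= 0x10FFFF &&
                       (hex ? isxdigit((unsigned char)*p)
                            : isdigit((unsigned char)*p))) {
                    int d = isdigit((unsigned char)*p)
                                ? *p - '0'
                                : tolower((unsigned char)*p) - 'a' + 10;
                    cp = cp * (hex ? 16 : 10) + (unsigned long)d;
                    p++;
                }

                /* Accept only terminated, in-range, non-surrogate values. */
                if (p != start && *p == ';' && cp > 0 && cp <= 0x10FFFF &&
                    !(cp >= 0xD800 && cp <= 0xDFFF)) {
                    out += put_utf8(cp, out);
                    in = p + 1;
                    matched = 1;
                }
            }
        }

        if (!matched)
            *out++ = *in++;
    }
    *out = '\0';
}
```

With this sketch, the input "a &lt; b &amp;&amp; c" would be decoded to
"a < b && c". Having every caller reimplement this is the kind of
duplication that an accessor for the raw node value would avoid.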

Is there a better way to do what I want? I would be quite happy to 
implement a new API method to do this if that's required - does anyone 
else think this would be useful?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:        http://www.oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net

Received on Tuesday, 22 January 2008 03:14:42 UTC