Using TidyLib as an HTML parser from John Snelson on 2008-01-22 (html-tidy@w3.org from January to March 2008)

From: John Snelson <john.snelson@oracle.com>
Date: Tue, 22 Jan 2008 03:14:09 +0000
To: html-tidy@w3.org
Message-ID: <47955F81.8060203@oracle.com>

Hi,

I'm trying to use TidyLib as an HTML parser, and would like to generate 
SAX events from the TidyDoc representation of the document. However, 
there doesn't seem to be a way to get the unescaped value of a text 
node, or the unserialized value of a comment or processing instruction. 
I have been using the tidyNodeGetText() function to get the value of these 
node types, but it returns the node serialized as markup rather than its 
raw value.
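One possible interim workaround, which is not a TidyLib feature, would be to decode the character references in the text that tidyNodeGetText() produces. The sketch below assumes the contents of the TidyBuffer have already been copied into a NUL-terminated C string. It handles only the five predefined XML entities plus numeric character references. Named HTML entities such as `&nbsp;`, comment and processing-instruction delimiters, and any line wrapping or indentation Tidy may add when printing the node are left untouched. `unescape_markup` is a hypothetical helper name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Write code point cp into out as UTF-8; returns the number of bytes written. */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decode the predefined XML entities and numeric
 * character references in `in`. Anything else (e.g. &nbsp;) is copied
 * through unchanged. Returns a malloc'd string, or NULL on allocation failure. */
char *unescape_markup(const char *in)
{
    static const struct { const char *name; char ch; } named[] = {
        {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'}, {"apos", '\''}
    };
    size_t len = strlen(in);
    char *out = malloc(len + 1); /* decoding never makes the text longer */
    size_t o = 0, i = 0;

    if (out == NULL)
        return NULL;
    while (i < len) {
        const char *semi = (in[i] == '&') ? strchr(in + i, ';') : NULL;
        if (semi != NULL) {
            const char *body = in + i + 1;      /* text between '&' and ';' */
            size_t n = (size_t)(semi - body);
            int decoded = 0;

            if (n >= 2 && body[0] == '#') {
                int hex = (body[1] == 'x' || body[1] == 'X');
                const char *p = body + (hex ? 2 : 1);
                const char *digits = p;
                unsigned long cp = 0;
                /* parse digits by hand so signs, spaces and "0x" are rejected */
                while (p < semi && cp <= 0x10FFFF &&
                       (hex ? isxdigit((unsigned char)*p) : isdigit((unsigned char)*p))) {
                    int d = isdigit((unsigned char)*p) ? *p - '0'
                                                       : tolower((unsigned char)*p) - 'a' + 10;
                    cp = cp * (hex ? 16 : 10) + (unsigned long)d;
                    p++;
                }
                /* accept only complete references to valid Unicode scalar values */
                if (p == semi && p > digits && cp > 0 && cp <= 0x10FFFF &&
                    !(cp >= 0xD800 && cp <= 0xDFFF)) {
                    o += put_utf8(cp, out + o);
                    decoded = 1;
                }
            } else {
                for (size_t k = 0; k < sizeof named / sizeof named[0]; k++) {
                    if (strlen(named[k].name) == n && strncmp(body, named[k].name, n) == 0) {
                        out[o++] = named[k].ch;
                        decoded = 1;
                        break;
                    }
                }
            }
            if (decoded) {
                i += n + 2;                     /* skip '&', body and ';' */
                continue;
            }
        }
        out[o++] = in[i++];                     /* ordinary byte, copy as-is */
    }
    out[o] = '\0';
    return out;
}
```

For example, `unescape_markup("&lt;p&gt; &amp; caf&#xE9;")` should yield `<p> & café`, and the caller is responsible for freeing the result. Decoding after serialization is inherently lossy, so an API that hands back the stored value directly, as proposed below, would be the more robust route.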

Is there a better way to do what I want? I would be quite happy to 
implement a new API function to do this if one is required. Does anyone 
else think this would be useful?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:        http://www.oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net

Received on Tuesday, 22 January 2008 03:14:42 UTC