Using TidyLib as an HTML parser

From: John Snelson <john.snelson@oracle.com>
Date: Tue, 22 Jan 2008 03:14:09 +0000
Message-ID: <47955F81.8060203@oracle.com>
To: html-tidy@w3.org


I'm trying to use TidyLib as an HTML parser, and would like to generate
SAX events from the TidyDoc representation of the document. However,
there doesn't seem to be a way to get the unescaped value of a text
node, or the unserialized value of a comment or processing instruction.
So far I have been using the tidyNodeGetText() function to get the value
of these node types, but it returns the serialized (escaped) markup.
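
To illustrate the gap, here is a minimal sketch of the kind of
post-processing that is otherwise needed on the buffer filled by
tidyNodeGetText(). The helper name unescape_in_place is hypothetical and
not part of TidyLib. It assumes only the five predefined XML entities
need decoding; numeric character references and named HTML entities are
left untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a TidyLib function): decodes the five
 * predefined XML entities in place and returns the new string length.
 * Unknown entities and numeric references are copied through as-is. */
size_t unescape_in_place(char *s)
{
    static const struct { const char *name; char ch; } ents[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    const char *in = s;   /* read cursor */
    char *out = s;        /* write cursor, never ahead of `in` */

    while (*in) {
        int matched = 0;
        if (*in == '&') {
            size_t i;
            for (i = 0; i < sizeof ents / sizeof ents[0]; ++i) {
                size_t n = strlen(ents[i].name);
                if (strncmp(in, ents[i].name, n) == 0) {
                    *out++ = ents[i].ch;  /* emit the decoded character */
                    in += n;              /* skip the whole entity */
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            *out++ = *in++;               /* ordinary character: copy */
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

A caller would copy the serialized text into a writable, NUL-terminated
buffer, call unescape_in_place() on it, and pass the result to the SAX
characters() callback. Comments and processing instructions would still
need their delimiters stripped separately.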

Is there a better way to do what I want? I would be quite happy to 
implement a new API method to do this if that's required - does anyone 
else think this would be useful?


John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:        http://www.oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net
Received on Tuesday, 22 January 2008 03:14:42 UTC
