Using TidyLib as an HTML parser

From: John Snelson <john.snelson@oracle.com>
Date: Tue, 22 Jan 2008 03:14:09 +0000
Message-ID: <47955F81.8060203@oracle.com>
To: html-tidy@w3.org

Hi,

I'm trying to use TidyLib as an HTML parser, and would like to generate 
SAX events from the TidyDoc representation of the document. However, 
there doesn't seem to be a way to get the unescaped value of a text 
node, or the unserialized value of a comment or processing instruction. 
For now I have been using the tidyNodeGetText() function to get the
value of these node types, but it only gives me the serialized form.
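
As a stopgap, the serialized text can be decoded after the fact. The
following is only a minimal sketch of that workaround, assuming the
output of tidyNodeGetText() has already been copied into a
NUL-terminated C string. The names unescape_text(), parse_numeric() and
put_utf8() are hypothetical helpers, not part of TidyLib, and only the
five predefined entities plus numeric character references are handled;
anything else is copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode code point `cp` as UTF-8 into `out`.
 * Returns the number of bytes written (1 to 4). */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: `p` points just past "&#". On success stores the
 * code point in *cp and returns a pointer past the ';', else NULL. */
static const char *parse_numeric(const char *p, unsigned long *cp)
{
    unsigned long v = 0;
    int base = 10, ndigits = 0;

    if (*p == 'x' || *p == 'X') { base = 16; p++; }
    for (;; p++) {
        int d;
        if (*p >= '0' && *p <= '9') d = *p - '0';
        else if (base == 16 && *p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
        else break;
        v = v * (unsigned long)base + (unsigned long)d;
        if (v > 0x10FFFF) return NULL;      /* outside Unicode range */
        ndigits++;
    }
    if (ndigits == 0 || *p != ';' || v == 0) return NULL;
    if (v >= 0xD800 && v <= 0xDFFF) return NULL;  /* surrogates */
    *cp = v;
    return p + 1;
}

/* Decode `src` into `dst` as UTF-8 and return the decoded length.
 * `dst` needs strlen(src) + 1 bytes: every reference handled here is at
 * least as long as the bytes it decodes to. */
size_t unescape_text(const char *src, char *dst)
{
    static const struct { const char *name; char ch; } named[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };
    char *out = dst;

    assert(src != NULL && dst != NULL);
    while (*src) {
        if (*src == '&') {
            unsigned long cp = 0;
            const char *next = NULL;
            size_t i;

            if (src[1] == '#') {
                next = parse_numeric(src + 2, &cp);
            } else {
                for (i = 0; i < sizeof named / sizeof named[0]; i++) {
                    size_t n = strlen(named[i].name);
                    if (strncmp(src, named[i].name, n) == 0) {
                        cp = (unsigned char)named[i].ch;
                        next = src + n;
                        break;
                    }
                }
            }
            if (next) { out += put_utf8(cp, out); src = next; continue; }
        }
        *out++ = *src++;   /* ordinary byte or unrecognized reference */
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```

Decoding after serialization is lossy in one respect: it cannot tell
an entity that Tidy wrote from text that merely looks like one, which is
part of why an API returning the raw node value would be preferable.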

Is there a better way to do what I want? I would be quite happy to 
implement a new API method to do this if that's required - does anyone 
else think this would be useful?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:        http://www.oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net
Received on Tuesday, 22 January 2008 03:14:42 GMT