Using htmltidy to parse: getting the "body" of a node

Hello Tidy people,

I am trying to use Tidy to clean up (possibly broken)
HTML files so that they can be fed to other layers of
processing in C.  I need to get access to the text
content of the elements.

For example, in this block:

<p>This is some text.</p>

how do I get access to the "This is some text." part?
I can get a stream of TidyNodes, which have
attributes, but how do I get at the actual content?  I
assume that the entire <p>Text</p> sequence counts
as a single TidyNode?

Thanks for any tips on this.



Received on Wednesday, 1 October 2003 18:35:10 UTC