Using htmltidy to parse: getting the "body" of a node from joe user on 2003-10-01 (html-tidy@w3.org from October to December 2003)

From: joe user <palehaole@yahoo.com>
Date: Wed, 1 Oct 2003 12:45:18 -0700 (PDT)
To: html-tidy@w3.org
Message-ID: <20031001194518.99919.qmail@web20414.mail.yahoo.com>

Hello Tidy people,

I am trying to use Tidy to clean up (possibly
broken) HTML files, as input to other layers of
processing in C.  I need to get access to the text
content of the elements.

For example, in this block:

<p>This is some text.</p>

how do I get access to the "This is some text." part?
I can get a stream of TidyNodes, which have
attributes, but what about the actual content?  I
assume that the entire sequence <p>Text</p> counts
as a single TidyNode?

Thanks for any tips on this.


Received on Wednesday, 1 October 2003 18:35:10 UTC