
Using htmltidy to parse: getting the "body" of a node

From: joe user <palehaole@yahoo.com>
Date: Wed, 1 Oct 2003 12:45:18 -0700 (PDT)
Message-ID: <20031001194518.99919.qmail@web20414.mail.yahoo.com>
To: html-tidy@w3.org

Hello Tidy people,

I am trying to use Tidy to clean up (possibly broken)
HTML files, as input to other layers of processing in
C.  I need to get access to the text content of the
elements.

For example, in this block:

<p>This is some text.</p>

how do I get access to the "This is some text." part? 
I can get a stream of TidyNodes, which have
attributes, but what about the actual content?  I
assume that the entire sequence of <p>Text</p> counts
as a single TidyNode?

Thanks for any tips on this.
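
A note on the tree shape the question asks about: in Tidy's parse
tree, <p>Text</p> is not one node. The <p> element is a node, and
"Text" is a separate child node of text type, so the content is
reached by walking the element's children. The sketch below only
illustrates that shape with a hand-built tree. The Node struct,
collect_text and demo_paragraph_text are hypothetical names for this
example, not libtidy types or calls. In current libtidy releases the
corresponding steps would presumably go through tidyGetChild,
tidyGetNext, tidyNodeGetType and tidyNodeGetText; check tidy.h for
the version you have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed tree; these are NOT libtidy types. */
typedef enum { NODE_ELEMENT, NODE_TEXT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;     /* tag name for elements, NULL for text nodes */
    const char *text;     /* content for text nodes, NULL for elements */
    struct Node *child;   /* first child, or NULL */
    struct Node *next;    /* next sibling, or NULL */
} Node;

/* Append the content of every text node below `node` to `out`.
 * `out` must already hold a NUL-terminated string; `cap` is its size.
 * Output is silently truncated if it does not fit. */
static void collect_text(const Node *node, char *out, size_t cap)
{
    assert(cap > 0);
    for (const Node *c = node->child; c != NULL; c = c->next) {
        if (c->kind == NODE_TEXT) {
            size_t used = strlen(out);
            if (used < cap - 1)
                strncat(out, c->text, cap - 1 - used);
        } else {
            /* Nested elements (e.g. <b> inside <p>) carry their own text children. */
            collect_text(c, out, cap);
        }
    }
}

/* Build the tree for <p>This is some text.</p> by hand and
 * copy its text content into `out`. */
static void demo_paragraph_text(char *out, size_t cap)
{
    Node text = { NODE_TEXT, NULL, "This is some text.", NULL, NULL };
    Node p    = { NODE_ELEMENT, "p", NULL, &text, NULL };

    out[0] = '\0';
    collect_text(&p, out, cap);
}
```

Calling demo_paragraph_text with a buffer of, say, 64 bytes leaves
the paragraph's text in the buffer. The recursion matters once the
paragraph contains inline markup, since each nested element then has
its own text children.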


Received on Wednesday, 1 October 2003 18:35:10 UTC
