
Re: working with buffer in memory

From: Scott Redman <redman@tivo.com>
Date: Mon, 19 Feb 2001 09:25:08 -0800
Message-ID: <3A9156F4.E15930E8@tivo.com>
To: "Eliasson, Johan" <johan.eliasson@softronic.se>
CC: "'html-tidy@w3.org'" <html-tidy@w3.org>
Look at the work I did for TclTidy; it does some of
what you want (and it will take you two minutes to
rip out the Tcl-specific code).

Tcl can load TclTidy (which contains all of the Tidy
code) and pass a buffer containing HTML to parse.
TclTidy returns another buffer containing the result.
I don't know about the "block of text" you want to
parse.
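
For the "block of text" part, here is a minimal sketch of the
buffer-in, block-out idea. It is not TclTidy or Tidy code: it is
Python using only the standard html.parser module, and the names
BlockExtractor and extract_block are made up for illustration. It
assumes the markup is already well formed (which is roughly what
running Tidy first would give you), so that the start-tags and
end-tags of the target element nest properly.

```python
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    """Record the source span of the first <tag>...</tag> element."""

    def __init__(self, tag, source):
        super().__init__(convert_charrefs=True)
        self.tag = tag.lower()
        self.source = source
        # Absolute offset at which each line begins, so that the
        # (line, column) pair from getpos() can be turned into an index.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.depth = 0      # nesting depth of the target tag
        self.start = None   # index of the first '<' of the block
        self.end = None     # index just past the closing '>'

    def _offset(self):
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != self.tag or self.end is not None:
            return
        if self.depth == 0:
            self.start = self._offset()
        self.depth += 1

    def handle_endtag(self, tag):
        if tag != self.tag or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # getpos() points at the '<' of the end-tag; include up to '>'.
            self.end = self.source.index(">", self._offset()) + 1


def extract_block(html, tag):
    """Return the first `tag` element as raw text, or None if not found."""
    parser = BlockExtractor(tag, html)
    parser.feed(html)
    parser.close()
    if parser.start is None or parser.end is None:
        return None
    return html[parser.start:parser.end]
```

Calling extract_block(page, "table") on an in-memory string returns
the first table element, from its start-tag through the matching
end-tag, as text, or None if the element is absent or never closed.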

http://sourceforge.net/projects/tclxml

Look for "tidy" under the TclXML project on SourceForge.
You'll have to use CVS to get the sources; we haven't
had time to generate tarballs or binary releases.

-- Scott


"Eliasson, Johan" wrote:
> 
> Hi folks!
> 
> I'm new to this list, and found tidy when searching for
> HTML parsing code. Excellent job, it seems to be almost
> exactly what I need!
> 
> As I understand it, tidy works with files and stdio I/O?
> 
> Let me describe my project, and maybe you can give me
> a little advice on how to go about doing what I want.
> 
> I have a memory buffer filled with an HTML page that
> I need to parse, so that I can easily locate a specific
> tag and extract everything inside it, from the start-tag
> to the corresponding end-tag, as a block of text (not a tree).
> 
> How easily can this be done, and do you have any helpful
> advice for me?
> 
> Thanks !!
> 
> Regards,
>                Johan E.
Received on Monday, 19 February 2001 12:26:08 GMT
