Re: working with buffer in memory from Scott Redman on 2001-02-19 (html-tidy@w3.org from January to March 2001)

From: Scott Redman <redman@tivo.com>
Date: Mon, 19 Feb 2001 09:25:08 -0800
To: "Eliasson, Johan" <johan.eliasson@softronic.se>
CC: "'html-tidy@w3.org'" <html-tidy@w3.org>
Message-ID: <3A9156F4.E15930E8@tivo.com>

Look at the work I did for TclTidy; it does some of
what you want (and it will take you two minutes to
rip out the Tcl-specific code).

Tcl can load TclTidy (which contains all of the Tidy
code) and pass a buffer containing HTML to parse.
TclTidy returns another buffer containing the result.
I don't know about the "block of text" you want to
parse.

http://sourceforge.net/projects/tclxml

Look for "tidy" under the TclXML project on SourceForge.
You'll have to use CVS to get the sources; we haven't
had time to generate tarballs or binary releases.

-- Scott

"Eliasson, Johan" wrote:
> 
> Hi folks!
> 
> I'm new to this list; I found Tidy when searching for
> HTML parsing code. Excellent job, it seems to do almost
> exactly what I need!
> 
> I understand that Tidy works with files and stdio I/O?
> 
> Let me describe my project, and maybe you can give me
> a little advice on how to go about doing what I want.
> 
> I have a memory buffer filled with an HTML page that
> I need to parse, so that I can easily locate a specific
> tag and extract everything inside it, from the start-tag
> to the corresponding end-tag, as a block of text (not a tree).
> 
> How easily can this be done, and do you have any helpful
> advice for me?
> 
> Thanks!
> 
> Regards,
>                Johan E.
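
For illustration, the extraction described in the question (everything
from a start-tag to its corresponding end-tag, returned as one block of
text from an in-memory buffer) might be sketched as below. This is not
Tidy or TclTidy code and does not reflect their APIs; it swaps in the
HTMLParser class from Python's standard library. The names
BlockExtractor and extract_block are hypothetical, and the sketch
assumes the page is already well-formed enough that the wanted tag is
explicitly closed.

```python
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    """Find the raw text of the first <tag>...</tag> block in a buffer.

    Hypothetical helper for illustration; not part of Tidy or TclTidy.
    """

    def __init__(self, src, tag):
        super().__init__()
        self.src = src
        self.want = tag.lower()
        self.depth = 0           # nesting depth of the wanted tag
        self.block_start = None  # absolute offset of the start-tag
        self.block_end = None    # absolute offset just past the end-tag
        # Offsets at which each line begins, so that the (line, column)
        # pair from getpos() can be turned into an absolute index.
        self.line_starts = [0] + [
            i + 1 for i, ch in enumerate(src) if ch == "\n"
        ]

    def _offset(self):
        line, col = self.getpos()  # line is 1-based, col is 0-based
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != self.want or self.block_end is not None:
            return
        if self.depth == 0:
            self.block_start = self._offset()
        self.depth += 1

    def handle_endtag(self, tag):
        if tag != self.want or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # getpos() points at the "<" of the end-tag; include its ">".
            self.block_end = self.src.index(">", self._offset()) + 1


def extract_block(src, tag):
    """Return the first <tag>...</tag> block verbatim, or None."""
    parser = BlockExtractor(src, tag)
    parser.feed(src)
    parser.close()
    if parser.block_start is None or parser.block_end is None:
        return None
    return src[parser.block_start:parser.block_end]


page = (
    "<html><body>\n"
    "<p>intro</p>\n"
    '<table id="t"><tr><td>cell</td></tr></table>\n'
    "</body></html>"
)
print(extract_block(page, "table"))
```

The block comes back exactly as it appears in the buffer, nested tags
included, rather than as a tree. If the tag is missing or never closed,
the function returns None. Real-world pages with unclosed tags would
probably need cleaning up first, which is the part Tidy is meant for.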

Received on Monday, 19 February 2001 12:26:08 UTC