- From: dude <dude@fastmail.ca>
- Date: Wed, 5 Feb 2003 17:30:42 -0500 (EST)
- To: jamieeagan@agora-inc.com
- Cc: html-tidy@w3.org
- Message-Id: <3E419092.000073.09711@ns.interchange.ca>
I highly recommend a tool called "Search and Replace" by Funduc 
Software.  You could easily write a script that would grab all the 
text between ">" and "<".  You could further customize it to suit 
your needs.  I have successfully used it to strip out all the Word XP 
proprietary tags and such, while leaving the text formatting like 
bold and italics.
Here is a link to the company's site:
http://www.funduc.com/search_replace.htm
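
For anyone who would rather script it than use a GUI tool, the same
"grab the text between > and <" idea can be sketched in a few lines of
Python. This is only a rough illustration, not how Search and Replace
works internally: the function name text_between_tags and the sample
string are made up for the example, and a plain regular expression like
this will also pick up script/style contents and leave entities such as
&amp; undecoded, so messy real-world pages may need a proper HTML parser.

```python
import re


def text_between_tags(html: str) -> list[str]:
    """Return the non-empty text runs found between a '>' and the next '<'.

    Hypothetical helper for illustration only; it does not understand
    comments, <script>/<style> blocks, or character entities.
    """
    # Capture everything after a '>' up to the following '<'.
    runs = re.findall(r">([^<>]*)<", html)
    # Drop runs that are empty or whitespace-only, and trim the rest.
    return [run.strip() for run in runs if run.strip()]


if __name__ == "__main__":
    sample = "<p>Hello <b>bold</b> world</p>"
    print(text_between_tags(sample))
```

Joining the returned pieces with spaces or newlines gives plain text
that can be saved to a file, one file per page.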
peace,
dude
>
>> Is anyone aware of a utility to remove the content from a web
>> page?  We are converting a large amount of content from an
>> existing web site to a CM system.  In the past my company has
>> always done this manually by copying the site content from a
>> rendered page, pasting it into a text editor like Notepad (thereby
>> stripping all the HTML), and then copying it into the CM editor.  We
>> have the ability to load the information into the app if the
>> content is loaded as text.  Is anyone aware of a tool that can
>> spider through a site and create multiple text files....
>> 
Received on Wednesday, 5 February 2003 17:31:32 UTC