- From: dude <dude@fastmail.ca>
- Date: Wed, 5 Feb 2003 17:30:42 -0500 (EST)
- To: jamieeagan@agora-inc.com
- Cc: html-tidy@w3.org
- Message-Id: <3E419092.000073.09711@ns.interchange.ca>
I highly recommend a tool called "Search and Replace" by Funduc Software. You could easily write a script that grabs all the text between ">" and "<", and you could further customize it to suit your needs. I have successfully used it to strip out all the Word XP proprietary tags and such, while leaving text formatting like bold and italics intact.

Here is a link to the company's site: http://www.funduc.com/search_replace.htm

peace,
dude

>> Is anyone aware of a utility to remove the content from a web
>> page? We are converting a large amount of content from an
>> existing web site to a CM system. In the past my company has
>> always done this manually by copying the site content from a
>> rendered page, pasting it into a text editor like Notepad (thereby
>> stripping all the HTML), and then copying it into the CM editor. We
>> have the ability to load the information into the app if the
>> content is loaded as text. Is anyone aware of a tool that can
>> spider through a site and create multiple text files....
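
For anyone who wants to try the "text between > and <" idea without the Funduc tool, here is a minimal sketch in Python. It is not Search and Replace's own script syntax, which is not shown in this thread. The function names and the list of tags to keep are hypothetical, and plain regular expressions like these will not cope with every page (comments, scripts, and malformed markup need more care).

```python
import re

# Hypothetical default: formatting tags to leave in place.
KEEP_TAGS = ("b", "i", "strong", "em")


def text_between_tags(html):
    """Return the non-empty text runs found between a '>' and the next '<'."""
    runs = re.findall(r">([^<>]+)(?=<)", html)
    return [run.strip() for run in runs if run.strip()]


def strip_tags_except(html, keep=KEEP_TAGS):
    """Remove every tag whose name is not in `keep`; the text itself is untouched."""
    def keep_or_drop(match):
        # group(1) is the tag name, e.g. "p", "b", or a Word-style "o:p".
        return match.group(0) if match.group(1).lower() in keep else ""

    return re.sub(r"</?([a-zA-Z][\w:]*)\b[^>]*>", keep_or_drop, html)


if __name__ == "__main__":
    sample = '<p class="MsoNormal"><b>Bold</b> and <i>italic</i><o:p></o:p></p>'
    print(text_between_tags(sample))
    print(strip_tags_except(sample))
```

The first function gives plain text suitable for loading into the CM system; the second is closer to the Word cleanup described above, where bold and italics survive. Spidering a whole site and writing one text file per page would still need a separate crawler.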
Received on Wednesday, 5 February 2003 17:31:32 UTC