I highly recommend a tool called "Search and Replace" by Funduc Software. You could easily write a script that grabs all the text between ">" and "<", and you could customize it further to suit your needs. I have successfully used it to strip out all the Word XP proprietary tags and such, while leaving text formatting like bold and italics intact.

Here is a link to the company's site: http://www.funduc.com/search_replace.htm

peace, dude

>> Is anyone aware of a utility to remove the content from a web
>> page? We are converting a large amount of content from an
>> existing web site to a CM system. In the past my company has
>> always done this manually by copying the site content from a
>> rendered page, pasting it into a text editor like Notepad (thereby
>> stripping all the HTML), and then copying it into the CM editor. We
>> have the ability to load the information into the app if the
>> content is loaded as text. Is anyone aware of a tool that can
>> spider through a site and create multiple text files?

Received on Wednesday, 5 February 2003 17:31:32 GMT
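For anyone who would rather script the "grab all the text between '>' and '<'" idea than use a search-and-replace tool, here is a rough sketch in Python. It is not how Search and Replace works internally; it simply uses the standard library's html.parser to collect the text between tags. The names TextExtractor and extract_text are made up for this example, and skipping the contents of script and style elements is my own assumption about what "content" should mean here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text found between tags (hypothetical helper class)."""

    # Assumption: script and style bodies are not page content, so skip them.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style elements.
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Return the text of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    page = "<html><body><p>Hello <b>world</b></p></body></html>"
    print(extract_text(page))
```

Note that this drops bold and italics along with everything else, and it handles one page at a time. Spidering a whole site and writing one text file per page would need a separate crawling step on top of this.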