
Re: Need to Strip all HTML tags from a rendered web Page

From: dude <dude@fastmail.ca>
Date: Wed, 5 Feb 2003 17:30:42 -0500 (EST)
Message-Id: <3E419092.000073.09711@ns.interchange.ca>
To: jamieeagan@agora-inc.com
Cc: html-tidy@w3.org
I highly recommend a tool called "Search and Replace" by Funduc 
Software.  You could easily write a script that would grab all the 
text between ">" and "<".  You could further customize it to suit 
your needs.  I have successfully used it to strip out all the Word XP 
proprietary tags and such, while leaving text formatting like 
bold and italics.
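
For anyone who would rather script this than use a GUI tool, here is a 
rough sketch of the same idea in Python, using the standard library's 
html.parser module.  This is not how Search and Replace works 
internally; it is only one possible way to keep the text between tags 
and drop the markup.  The names TagStripper, strip_tags, KEEP_TAGS and 
SKIP_TAGS are made up for this example, and the choice of which tags 
to keep is an assumption you would adjust to your own content.

```python
from html.parser import HTMLParser

# Assumption for illustration: formatting tags we want to survive.
KEEP_TAGS = {"b", "i", "strong", "em"}
# Assumption: elements whose text content should be dropped entirely.
SKIP_TAGS = {"script", "style"}


class TagStripper(HTMLParser):
    """Collects text content, keeping only a small set of tags."""

    def __init__(self, keep_tags=KEEP_TAGS):
        super().__init__(convert_charrefs=True)
        self.keep_tags = keep_tags
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.keep_tags and not self.skip_depth:
            # Re-emit the tag without attributes (drops Word styling).
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in self.keep_tags and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text between ">" and "<" arrives here.
        if not self.skip_depth:
            self.parts.append(data)


def strip_tags(html, keep_tags=KEEP_TAGS):
    """Return html with every tag removed except those in keep_tags."""
    parser = TagStripper(keep_tags)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

Calling strip_tags(page, keep_tags=set()) gives plain text only, which 
is closer to the copy-into-Notepad result.  Note that character 
references such as &amp;amp; are decoded to plain characters, so output 
that keeps bold and italic tags would need re-escaping before being 
treated as HTML again.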

Here is a link to the company's site:


>> Is anyone aware of a utility to remove the content from a web
>> page. We are converting a large amount of content from an
>> existing web site to a CM system.  In the past my company has
>> always done this manually by copying the site content from a
>> rendered page and copying to a txt editor like Notepad (thereby
>> stripping all the HTML) and then copying into the CM editor.  We
>> have the ability to load the information into the app if the
>> content is loaded as text.  Is anyone aware of a tool that can
>> spider through a site and create multiple text files....

Received on Wednesday, 5 February 2003 17:31:32 UTC
