- From: Rijk van Geijtenbeek <rijk@opera.com>
- Date: Thu, 06 Feb 2003 11:02:50 +0100
- To: HTML-tidy list <html-tidy@w3.org>
On Wed, 5 Feb 2003 15:43:01 -0500, Jamie Eagan <jamieeagan@agora-inc.com> wrote:

> Is anyone aware of a utility to remove the content from a web page? We are
> converting a large amount of content from an existing web site to a CM
> system. In the past my company has always done this manually by copying
> the site content from a rendered page, pasting it into a text editor like
> Notepad (thereby stripping all the HTML), and then copying it into the CM
> editor. We have the ability to load the information into the app if the
> content is loaded as text. Is anyone aware of a tool that can spider
> through a site and create multiple text files?

If you install Lynx, you can easily run the page through Lynx and let it output a nicely formatted text file, though without support for tables. My favorite text editor, NoteTab, also has a good 'convert to text' function, and can be scripted to run through a complete directory of HTML files on your hard disk.

Tidy, however, cannot do this, so it is rather off-topic for this list.

-- 
Rijk van Geijtenbeek
Documentation & QA
mailto:rijk@opera.com

If you don't like having choices made for you, you should start making your own. - Neal Stephenson
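For the "script it over a whole directory" part, here is a rough, hedged alternative that uses neither Lynx nor NoteTab: a minimal sketch in Python 3 using only the standard library's `html.parser`. It assumes the site has already been mirrored to local disk (it does no spidering), that pages end in `.htm` or `.html`, and that they are UTF-8. The names `html_to_text` and `convert_tree` are hypothetical. Like Lynx, it makes no attempt at table layout; it only breaks lines at common block-level tags and drops `script`/`style` content.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path

# Assumption: contents of these tags are never visible text.
SKIP_TAGS = {"script", "style"}
# Assumption: these tags start/end a new line of text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "table", "ul", "ol", "hr",
              "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "pre"}


class TextExtractor(HTMLParser):
    """Collects visible text, inserting line breaks at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; -> & etc.
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: strip markup from one HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split())
             for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def convert_tree(src_dir, dst_dir):
    """Hypothetical helper: write one .txt per .htm/.html file, mirroring
    the directory layout of src_dir under dst_dir. Returns the count."""
    src, dst = Path(src_dir), Path(dst_dir)
    count = 0
    for page in sorted(src.rglob("*")):
        if not page.is_file() or page.suffix.lower() not in {".htm", ".html"}:
            continue
        out = dst / page.relative_to(src).with_suffix(".txt")
        out.parent.mkdir(parents=True, exist_ok=True)
        html = page.read_text(encoding="utf-8", errors="replace")
        out.write_text(html_to_text(html) + "\n", encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: html2txt.py HTML_DIR OUTPUT_DIR")
    else:
        print(f"converted {convert_tree(sys.argv[1], sys.argv[2])} file(s)")
```

Run as `python html2txt.py mirrored_site/ text_out/`. The output is much cruder than Lynx's formatting, but the resulting plain-text files are the kind of input a CM import that accepts text could load.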
Received on Thursday, 6 February 2003 05:04:17 UTC