Re: HTML Strippers

Joe English (jenglish@crl.com)
Wed, 26 Apr 1995 10:53:56 -0700


Message-Id: <199504261755.AA22157@mail.crl.com>
To: Multiple recipients of list <www-html@www10.w3.org>
Subject: Re: HTML Strippers 
In-Reply-To: <199504260458.VAA14137@shell1.best.com> 
Date: Wed, 26 Apr 1995 10:53:56 -0700
From: Joe English <jenglish@crl.com>


rmesa@best.com (Robert A. Mesa) wrote:

> Is there a utility to strip away HTML tags? Yes I know, WHY? I've been tasked
> to do such a thing at work. Any info would be greatly appreciated.

sgmls and sgmlsasp with an empty replacement file 
will do the trick:

	sgmls html.decl YourFile.html | sgmlsasp /dev/null > YourFile.txt

This assumes that YourFile.html is valid HTML, of course...

The output will be the text portions of YourFile.html,
with entity references expanded and all other markup removed.

If you're on a DOS system, substitute any empty file for /dev/null;
I don't know about other systems.
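
If sgmls isn't available, roughly the same effect can be had
with a few lines of Python. This is only a sketch, and it swaps in
a different technique: it uses the standard library's html.parser
rather than a real SGML parser, so it does no validation against
the HTML DTD. The names TextOnly and strip_tags are made up for
illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep character data, ignore every tag and comment."""

    def __init__(self):
        # convert_charrefs=True expands references such as &amp; into text
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags
        self.chunks.append(data)


def strip_tags(html):
    """Return the text portions of an HTML string (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(strip_tags("<p>Fish &amp; <b>chips</b></p>"))
```

Unlike the sgmls pipeline, this won't complain about invalid input;
it just passes through whatever text it finds.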


--Joe English

  jenglish@crl.com