Re: HTML Strippers

John Labovitz
Tue, 25 Apr 1995 21:40:36 -0700

Message-Id: <>
From: John Labovitz <>
Cc: Multiple recipients of list <>
Subject: Re: HTML Strippers 
In-Reply-To: Your message of "Wed, 26 Apr 1995 01:06:25 +0500."
Date: Tue, 25 Apr 1995 21:40:36 -0700

Robert A. Mesa said:

> Is there a utility to strip away HTML tags. 

if you can't find anything else, the following
perl script (which i call 'unhtml') will work ok:


  undef($/);		# slurp mode: no record separator, so one read gets the whole file
  $_ = <>;		# read in entire file
  s/<[^>]+>//g;		# remove <...>'s; [^>] matches newlines too, so tags may span lines
  print;		# print what's left

this would be run like:

  unhtml file.html >file.txt

it's not by any means perfect -- a '>' inside a quoted
attribute value ends the match early, so the rest of that
tag is left in the output, and entities (like &amp;) are
passed through undecoded.
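
if those two limitations matter, here is a rough sketch of one
way around them. it is only an illustration, not a real html
parser: the name 'unhtml2', the strip_html routine, and the short
entity table are all made up for this example, and only four
entities are handled. a stray '<' in ordinary text will still
confuse it.

```perl
#!/usr/bin/perl
# unhtml2 -- hypothetical variant of 'unhtml'; a sketch, not a full parser
use strict;
use warnings;

# a tag is '<', then any mix of double-quoted strings, single-quoted
# strings, or characters that are neither a quote nor '>', then '>'.
# this lets a '>' sit inside a quoted attribute value.
my $TAG = qr/<(?:"[^"]*"|'[^']*'|[^'">])*>/;

# illustrative subset only; the real entity list is much longer
my %ENTITY = (amp => '&', lt => '<', gt => '>', quot => '"');

sub strip_html {
    my ($html) = @_;
    $html =~ s/$TAG//g;    # remove tags first ...
    # ... then decode known entities; unknown ones are left untouched
    $html =~ s/&(\w+);/exists $ENTITY{$1} ? $ENTITY{$1} : "&$1;"/ge;
    return $html;
}

# process each file named on the command line as a single string
if (@ARGV) {
    local $/;              # slurp mode
    while (defined(my $doc = <>)) {
        print strip_html($doc);
    }
}
```

it would be run the same way:

  unhtml2 file.html >file.txt

tags are removed before entities are decoded, so that text
like &lt;b&gt; comes out as a literal <b> instead of being
stripped as if it were a tag.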

another option, especially if you want the text laid
out the way a browser would format it, is to use the
lynx browser in 'dump' mode:

  % lynx -dump file.html >file.txt

hope this helps.

John Labovitz
Technical Services Manager, Global Network Navigator <>
O'Reilly & Associates, Sebastopol, California, USA (+1 707 829 0515)