- From: <touch@isi.edu>
- Date: Wed, 18 Sep 1996 11:17:05 -0700
- To: mcurts@mail.telis.org
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> Can you help me? I'm looking for a utility that will remove HTTP codes
> from documents and convert them to plain ASCII.
> --
> Mark Curts

Well, in case anyone else is interested, I wrote a PERL script
to remove HTML from text. (HTTP is the protocol, I presume you
meant HTML):

#!/local/new/bin/perl
# J. Touch
# USC/ISI
# 8/96
# Removes HTML codes from text input

while ($line = <>) {
    # "eatfront" is set when a code spans multiple lines
    # if already inside HTML code...
    if ($eatfront == 1) {
        # delete through the terminator ">", if found
        if ($line =~ s/^[^>]*>//o) {
            $eatfront = 0;
        } else {
            # otherwise delete all and keep looking on next line
            $line = "";
        }
    }
    # eat everything inside "<>"'s
    $line =~ s/<[^>]*>//go;
    # if there is a < without a matching >, eat it and keep looking
    if ($line =~ s/<[^>]*$//o) {
        $eatfront = 1;
    }
    print $line;
}

----------------------------------------------------------------------
Joe Touch - touch@isi.edu                   http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/
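
For anyone who wants to try the multi-line case without a file on hand, here is a minimal sketch of the same three steps wrapped in a subroutine. The name strip_html_lines, the /usr/bin/perl path, and the sample input are assumptions made for illustration and are not part of the script above. The logic is unchanged, apart from initializing the flag and dropping the /o modifier, which has no effect on patterns without interpolated variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_html_lines is a hypothetical helper name (not in the original script).
# It takes a list of input lines and returns the lines with HTML codes removed.
sub strip_html_lines {
    my @lines = @_;
    my $eatfront = 0;    # true while a code opened on an earlier line is unclosed
    my @out;
    for my $line (@lines) {
        if ($eatfront) {
            # delete through the terminator ">", if this line has one
            if ($line =~ s/^[^>]*>//) {
                $eatfront = 0;
            } else {
                # still inside the code: drop the whole line
                $line = "";
            }
        }
        # remove every complete <...> on this line
        $line =~ s/<[^>]*>//g;
        # an unmatched "<" opens a code that continues on the next line
        if ($line =~ s/<[^>]*$//) {
            $eatfront = 1;
        }
        push @out, $line;
    }
    return @out;
}

# Sample input (made up): the <a ...> code is split across two lines.
my @sample = (
    "<p>Hello <a\n",
    "  href=\"http://www.isi.edu/\">world</a>!</p>\n",
);
print join("", strip_html_lines(@sample));    # prints "Hello world!"
```

Note that when a code is left open at the end of a line, the newline is eaten along with it, so the text on either side of a split code ends up on one output line. As with the script above, a ">" inside an HTML comment or a quoted attribute value is treated as the end of the code.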
Received on Wednesday, 18 September 1996 11:23:35 UTC