Re: Conversion Program

> Can you help me?  I'm looking for a utility that will remove HTTP codes 
> from documents and convert them to plain ASCII.
> -- 
> Mark Curts

Well, in case anyone else is interested, I wrote a Perl script that
removes HTML markup from text (HTTP is the protocol; I presume you
meant HTML):


#!/usr/bin/perl

# J. Touch
# USC/ISI
# 8/96

# Removes HTML codes from text input

use strict;
use warnings;

# "eatfront" is set when a code spans multiple lines
my $eatfront = 0;

while (my $line = <>) {
        # if already inside an HTML code...
        if ($eatfront) {
                # delete through the terminator ">", if found
                if ($line =~ s/^[^>]*>//) {
                        $eatfront = 0;
                } else {
                        # otherwise delete everything and keep looking on the next line
                        $line = "";
                }
        }
        # eat everything inside "<>"'s
        $line =~ s/<[^>]*>//g;
        # if there is a "<" without a matching ">", eat it and keep looking
        if ($line =~ s/<[^>]*$//) {
                $eatfront = 1;
        }
        print $line;
}
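If you want to call the same logic from inside another Perl program, you can wrap the line-by-line state machine in a subroutine. The sketch below assumes a hypothetical name, `strip_html`, which is not part of the script above. It takes a list of lines and returns the stripped text. It follows the same rules: complete `<...>` codes are deleted, and an unterminated `<` switches into "eat until `>`" mode on the following lines.

```perl
use strict;
use warnings;

# strip_html: hypothetical wrapper around the same state machine as the
# script above. Takes a list of lines (newlines included) and returns the
# concatenated text with <...> codes removed.
sub strip_html {
    # Copy the arguments: the loop below edits $line in place, and
    # modifying @_ directly would fail on literal (read-only) arguments.
    my @lines    = @_;
    my $eatfront = 0;    # true while inside a code that spans lines
    my $out      = '';

    for my $line (@lines) {
        if ($eatfront) {
            # still inside a code: delete through the closing ">"
            if ($line =~ s/^[^>]*>//) {
                $eatfront = 0;
            } else {
                $line = '';    # no ">" on this line; drop all of it
            }
        }
        $line =~ s/<[^>]*>//g;                     # remove complete codes
        $eatfront = 1 if $line =~ s/<[^>]*$//;     # unterminated "<"
        $out .= $line;
    }
    return $out;
}

# Example: filter standard input, like the original script.
print strip_html(<STDIN>) unless caller;
```

Like the original, this does not understand quoting. A ">" inside an attribute value, as in `<a title="x>y">`, ends the code early and leaves `y">` in the output.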
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

Received on Wednesday, 18 September 1996 11:23:35 UTC