- From: iain truskett <koschei@eh.org>
- Date: Sat, 21 Aug 1999 14:11:44 +1000
- To: html-tidy@w3.org
bpaddock@csonline.net (Bob Paddock) wrote on 20 Aug:
[...]
> hcr.zip - Reduce HTML file by removing comments. w/C src
[...]

Took a brief look at hcr, munged it so it should compile under any OS
rather than requiring odd Borlandisms, and discarded it in favour of
the following perl script, which produces slightly more condensed
output. (If anyone wants the modified 'hcr' source, just ask.)

If one runs tidy over the results and also over the source, using the
same options, one obtains almost the same results (minor spacing
changes exist).

-----------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;

foreach my $file (@ARGV) {
    open(INPUT, '<', $file) or die "Cannot open $file: $!";
    chomp(my @contents = <INPUT>);
    close(INPUT);

    # Trim each line and join the non-blank ones into a single string.
    my $glob = '';
    while (@contents) {
        my $line = shift @contents;
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        $glob .= $line." " unless ($line eq '');
    }

    # Strip comments of the form "<!-- text -->" (spaces required).
    $glob =~ s/<!-- .*? -->//g;
    # Drop whitespace between adjacent tags.
    $glob =~ s/>\s+\</>\</g;
    # Drop whitespace around headings, paragraphs, lists, rules, breaks.
    # Tag names are matched in upper case only.
    $glob =~ s-\s+(</?H\d( .*?)?>)\s+-$1-g;
    $glob =~ s-\s+(</?P( .*?)?>)\s+-$1-g;
    $glob =~ s/\s+(<UL)/$1/g;
    $glob =~ s/\s+(<HR.*?>)\s+/$1/g;
    $glob =~ s/\s+(<BR.*?>)\s+/$1/g;

    # Write the result alongside the original as "file~".
    open(OUTPUT, '>', $file."~") or die "Cannot open $file~: $!";
    print OUTPUT $glob;
    close(OUTPUT);
}
-----------------------------------------------------------------------

As is usual for perl scripts, it's not the only way to do it, nor is
it probably the best.

Basic call is "oh file1.html file2.html file3.html" or whatever, and
it will output file1.html~ file2.html~ etc. Easily modifiable to take
stdin and output stdout if so desired.

If you want to call the script something, probably "oh" would be
appropriate (see "oc").

cheers,
-- 
iain, aka koschei <http://eh.org/~koschei/>
Famous last RPG words, number 791 - "Sure I'd like to kiss her."
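For the stdin/stdout variant mentioned in the message, something along these lines should do. It is only a sketch: the `condense` sub is a hypothetical name that does not appear in the original script, and it trims both ends of each line before joining, which may not match the original's spacing exactly.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of the original script): apply the same
# substitutions to one string of HTML and return the condensed result.
sub condense {
    my ($html) = @_;
    my $glob = '';
    foreach my $line (split /\n/, $html) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        $glob .= $line . " " unless $line eq '';
    }
    $glob =~ s/<!-- .*? -->//g;               # comments
    $glob =~ s/>\s+</></g;                    # whitespace between tags
    $glob =~ s-\s+(</?H\d( .*?)?>)\s+-$1-g;   # headings
    $glob =~ s-\s+(</?P( .*?)?>)\s+-$1-g;     # paragraphs
    $glob =~ s/\s+(<UL)/$1/g;
    $glob =~ s/\s+(<HR.*?>)\s+/$1/g;
    $glob =~ s/\s+(<BR.*?>)\s+/$1/g;
    return $glob;
}

# Filter mode: slurp all of standard input, print the condensed text.
unless (caller) {
    local $/;                                 # read everything at once
    my $input = <STDIN>;
    print condense(defined $input ? $input : '');
}
```

Saved as "oh", that would be called as "oh < file1.html > file1.min.html" (the output name is just an example).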
Received on Saturday, 21 August 1999 00:12:08 UTC