
Re: Compressing Code

From: iain truskett <koschei@eh.org>
Date: Sat, 21 Aug 1999 14:11:44 +1000
To: html-tidy@w3.org
Message-ID: <15a6e03449.ict@brucehall79.anu.edu.au>
bpaddock@csonline.net (Bob Paddock) wrote on 20 Aug:

>         hcr.zip - Reduce HTML file by removing comments. w/C src

Took a brief look at hcr, munged it so it should compile under any OS
rather than requiring odd Borlandisms, and discarded it in favour of
the following perl script, which makes slightly more condensed output.

(If anyone wants the modified 'hcr' source, just ask)

If one runs tidy over the results and also over the source, using the
same options, one obtains almost the same results (minor spacing
changes exist).

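#!/usr/bin/perl -w
use strict;

foreach my $file (@ARGV) {
    open(INPUT, $file) or die "Cannot open $file: $!";
    chomp(my @contents = <INPUT>);
    close(INPUT);

    # Join the non-blank lines into one string, trimming each line first.
    my $glob = '';
    while (@contents) {
        my $line = shift @contents;
        $line =~ s/^\s+|\s+$//g;
        $glob .= $line." " unless ($line =~ m/^\s*$/);
    }

    # Strip comments written as <!-- ... -->, then whitespace between tags.
    $glob =~ s/<!-- .*? -->//g;
    $glob =~ s/>\s+\</>\</g;
    # Drop whitespace around block-level tags (upper-case tags only).
    $glob =~ s-\s+(</?H\d( .*?)?>)\s+-$1-g;
    $glob =~ s-\s+(</?P( .*?)?>)\s+-$1-g;
    $glob =~ s/\s+(<UL)/$1/g;
    $glob =~ s/\s+(<HR.*?>)\s+/$1/g;
    $glob =~ s/\s+(<BR.*?>)\s+/$1/g;

    # Write the condensed result next to the source, as "file~".
    open(OUTPUT, ">".$file."~") or die "Cannot open $file~: $!";
    print OUTPUT $glob;
    close(OUTPUT);
}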

As is usual for perl scripts, it's not the only way to do it, nor is it
probably the best. Basic call is "oh file1.html file2.html file3.html"
or whatever, and it will output file1.html~ file2.html~ etc.

Easily modifiable to take stdin and output stdout if so desired.
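
For what it's worth, a rough sketch of how the stdin/stdout variant might look. The names condense() and filter() are made up for this illustration and are not in the script above; the substitutions are the same ones, just applied to a string, and the line trimming is written as s/^\s+|\s+$//g so that trailing whitespace goes too. Treat it as one possible arrangement, not a tested replacement.

```perl
#!/usr/bin/perl -w
use strict;

# condense() is a hypothetical helper: it takes a whole document as one
# string and returns the condensed text.
sub condense {
    my ($html) = @_;
    my $glob = '';
    foreach my $line (split /\n/, $html) {
        $line =~ s/^\s+|\s+$//g;                  # trim both ends
        $glob .= $line . " " unless $line eq '';  # skip blank lines
    }
    $glob =~ s/<!-- .*? -->//g;                   # strip comments
    $glob =~ s/>\s+</></g;                        # whitespace between tags
    $glob =~ s-\s+(</?H\d( .*?)?>)\s+-$1-g;
    $glob =~ s-\s+(</?P( .*?)?>)\s+-$1-g;
    $glob =~ s/\s+(<UL)/$1/g;
    $glob =~ s/\s+(<HR.*?>)\s+/$1/g;
    $glob =~ s/\s+(<BR.*?>)\s+/$1/g;
    return $glob;
}

# filter() is also hypothetical: slurp everything from one filehandle,
# condense it, and print the result to another.
sub filter {
    my ($in, $out) = @_;
    local $/;                      # slurp mode: read the whole input at once
    my $html = <$in>;
    $html = '' unless defined $html;
    print {$out} condense($html);
}

# Run as a filter when executed directly (not when loaded from other code).
filter(\*STDIN, \*STDOUT) unless caller;
```

Saved as "oh", that would be called as "oh < file1.html > file1.min.html" (the output name is only an example).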

If you want to call the script something, probably "oh" would be
appropriate (see "oc").

iain, aka koschei                             <http://eh.org/~koschei/>
Famous last RPG words, number 791 -        "Sure I'd like to kiss her."
Received on Saturday, 21 August 1999 00:12:08 UTC
