Perl script to list first level of references

Charles Cave (cmc@sydney.unidata.oz.au)
Tue, 30 May 95 18:08:04 +1000


Message-Id: <9505300901.AA10206@www10.w3.org>
Date: Tue, 30 May 95 18:08:04 +1000
From: cmc@sydney.unidata.oz.au (Charles Cave)
To: www-html@www10.w3.org
Subject: Perl script to list first level of references

          

I wanted to write a perl script to show all the HREF
references in a document, mainly to assist me with some
web design.

I have only tested it on my *.htm files, so it is far
from rigourously tested!

Also, I solved the problem of ignoring the case of
the search string...just add the letter i to the command.


#!/usr/local/bin/perl

foreach $file (<*.htm*>) {
  print "$file ";
  open(HTFILE, $file);
  $notitle = 1;
  while (<HTFILE>) {
    chop;
    if (/^\<TITLE\>/i) {
      s/<TITLE>//;
      s/<\/TITLE>//;
      print "($_)\n";
      $notitle = 0;
    }
    if (/HREF=.*>/i) {
       $line = $_;
       $line =~ s/href/HREF/;
       $where = index($line, "HREF");
       $part1 = substr($line, ($where+4), 99);
       $rapos = index($part1,">");
       $part2 = substr($part1, 1, ($rapos-1));
       print "  -->  $part2\n";
    }
  }
  if ($notitle == 1) { print "\n" };
  close(HTFILE);
}


Sample output (heavily edited)


creative.htm (<title>Creativity Home Page</title>)
  -->  "charter.htm"
  -->  "whatsnew.htm"
  -->  "cgroup1.htm"
  -->  "cgroup2.htm"
  -->  "cgroup3.htm"
  -->  "suncrfaq.htm"
  -->  "crquote.htm"
  -->  "craffirm.htm"
  -->  "famous.htm"
  -->  "crbrain.htm"
  -->  "cgroup8.htm"
  -->  "crcredit.htm"
  -->  "mailto:charles@jolt.mpx.com.au"

crebooks.htm (Books related to Creativity)
  -->  "mailto:charles@jolt.mpx.com.au"
  -->  "AAdams.htm"
  -->  B13159.htm

crsware.htm (Creativity Software)
  -->  #brainstorm
  -->  #ckm
  -->  #cm1
  -->  #cork
  -->  #gcope
  -->  #gsys
  -->  #ide
  -->  #ideplus
  -->  #ifisher

sylvan.htm (CREATIVITY</HEAD>)
  -->  "http://www.shore.net/~quantum/"
  -->  "#8"
  -->  "#15"
  -->  "#18"
  -->  "#7"
  -->  "#10"
  -->  "#9"
  -->  "#12"
  -->  "#14"
  -->  "#16"
  -->  "#20"
  -->  "#10"
  -->  "#6"
  -->  "#7"
  -->  "#2"
  -->  "#5"
  -->  "#15"
  -->  "#6"
  -->  "#3"
  -->  "#16"
  -->  "#17"
  -->  "mailto:psylvan@world.std.com"
whatsnew.htm (What's new on the Creativity Web Pages)
  -->  "crbrain.htm"
  -->  "crebooks.htm"
  -->  "creative.htm"




---------------------------------------------------------------------------
    Charles Cave             ~  .-_|\        Phone +61 2 416 6877
Customer Services Manager      /     \ ~     Fax   +61 2 416 2086
   Unidata Australasia         \.--._*<---   Level 2, 280 Pacific Hwy
cmc@sydney.unidata.oz.au      ~     v        Lindfield, NSW, 2070 AUSTRALIA
---------------------------------------------------------------------------