Re: Is there any way to get TIDY to recursively tidy up a tree of files?

From: Sebastian Lange <lange@cyperfection.de>
Date: Thu, 13 Jul 2000 08:50:02 +0200
Message-Id: <>
To: html-tidy@w3.org
At 20:14 12.07.2000 -0700, RickR@biztro.com wrote:

... well, he wrote nothing... ;-) but his question was: Is there any way to 
get TIDY to recursively tidy up a tree of files?

I am currently working on a Perl script to convert certain HTML files into 
XML files. It is called like "./html2xml.pl 
source_path/to/html_files target_path/to/xml_files". The path to the HTML 
sources is processed recursively, and the XML files are written to the target 
directory (without recursively re-creating the directory structure, but that 
should be only a minor modification).
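
For re-creating the directory structure, a minimal sketch might look like 
the following. It is not part of html2xml.pl: the helper names target_path 
and ensure_parent_dir are made up for this example, and it assumes the core 
modules File::Spec, File::Basename and File::Path (with make_path) are 
available.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: map a file below $src_root to the same relative
# location below $dst_root, replacing the .htm/.html suffix with .xml.
sub target_path {
    my ($src_root, $dst_root, $file) = @_;
    my $rel = File::Spec->abs2rel($file, $src_root);
    $rel =~ s/\.html?$/.xml/i;
    return File::Spec->catfile($dst_root, $rel);
}

# Hypothetical helper: create the directory that will hold $path
# if it does not exist yet, and return that directory.
sub ensure_parent_dir {
    my ($path) = @_;
    my $dir = dirname($path);
    make_path($dir) unless -d $dir;
    return $dir;
}
```

Inside the loop one would then compute the output name with 
target_path($ARGV[0], $XMLDir, $FilePath) and call ensure_parent_dir on the 
result before opening the file for writing.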

If I find some time next week, I'll have a go at it. In the meantime, you 
can take this piece of Perl code and modify it to your needs:

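push (@DirStack, $ARGV[0]);
$XMLDir = $ARGV[1];

# Walk the tree iteratively; @DirStack holds the directories still to visit
while (defined($DirCursor = pop @DirStack)) {
    opendir(TheDir, $DirCursor)
        or die("ERROR: '$DirCursor' not openable for reading!\n");
    @DirContent = readdir TheDir;
    foreach $Filename (@DirContent) {
        # Skip the current and parent directory entries
        next if (($Filename eq '.') || ($Filename eq '..'));
        $FilePath = $DirCursor.'/'.$Filename;
        if (-d $FilePath) {
            # Subdirectory: remember it for a later pass
            push @DirStack, $FilePath;
        } elsif ($FilePath =~ /\.html?$/i) {
            open (INFILE, "< $FilePath")
                or die("ERROR: '$FilePath' not openable for reading!\n");
            $FILE = join("", <INFILE>);
            close (INFILE);

            # trim() and retrieveVariables() are defined elsewhere in the
            # script and are not shown in this excerpt
            $FILE = trim(retrieveVariables($FILE));

            # ... the part that sets $XMLFilename and writes $FILE below
            # $XMLDir is omitted from this excerpt ...

            print "Parsed file: " . $FilePath . " -> " . $XMLFilename . "\n";
        }
    }
    closedir TheDir;
}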
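
To run Tidy itself over a tree instead of converting to XML, the same 
traversal can also be written with the core File::Find module. The following 
is only a rough sketch, not tested against any particular Tidy release: it 
assumes a "tidy" executable on the PATH and relies on its -m option (modify 
the file in place). The names find_html_files and tidy_tree are made up for 
this example.

```perl
use strict;
use warnings;
use File::Find;

# Collect every regular file ending in .htm or .html below $root.
sub find_html_files {
    my ($root) = @_;
    my @files;
    # File::Find changes into each directory by default, so $_ is the
    # bare file name and $File::Find::name is the path including $root.
    find(sub {
        push @files, $File::Find::name if -f $_ && /\.html?$/i;
    }, $root);
    my @sorted = sort @files;
    return @sorted;
}

# Run tidy in place on each file found below $root.
sub tidy_tree {
    my ($root) = @_;
    for my $file (find_html_files($root)) {
        # Assumption: "tidy -m FILE" rewrites FILE with the cleaned markup
        system('tidy', '-m', $file);
        die "ERROR: could not run tidy: $!\n" if $? == -1;
        # Tidy exits with 2 when it found errors it could not fix
        warn "tidy reported errors in '$file'\n" if ($? >> 8) >= 2;
    }
}
```

Calling tidy_tree($ARGV[0]) at the end of the script would then process the 
directory given on the command line. Since -m overwrites the originals, it 
is worth trying this on a copy of the tree first.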

>Rick Roth
>Biztro, Inc.

Sebastian Lange
Maybe the first chat site that validates as HTML
4.0 even though user input may contain HTML codes.

Courtesy to Dave Raggett's HTML Tidy:
Received on Thursday, 13 July 2000 02:53:28 UTC
