
New Site Cleaning Project

From: Julian Voelcker <asp@tvw.net>
Date: Tue, 18 Mar 2003 16:32:07 GMT
To: html-tidy@w3.org
Message-Id: <VA.000006cf.219a7f13@tvw.net>

I quite often have to tidy up websites or convert documents to be added 
to a website or a content management system.

To date I have done a lot of it by hand, but am now contemplating 
developing it into a small application that I may consider 

I want the app to be able to work on a set of directories and do the
following:

- If the docs are in a non-HTML format, convert them to HTML.
- Use Tidy to do an initial pre-processing tidy-up of the documents so
that the HTML is in a clean, non-tabbed, formatted structure.
- Use a regular expression find and replace facility to strip out 
unwanted code - quite a bit of hand work will be required here, but the 
idea would be to store each search so that the user can build up a 
library of searches during testing before applying them to a complete
set of documents.
- Use Tidy to do a final post-processing re-format of the finished
documents.
- If the documents are to be entered into a database, extract the relevant
data and output it in a format suitable for importing into a database.
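
As a rough illustration of the stored-search idea in the list above, here is a
minimal sketch. It is written in Python for brevity rather than the C# the
app is planned in, and every name in it (Rule, save_rules, load_rules,
apply_rules, clean_document) is hypothetical. The Tidy passes are represented
by a plain callable that defaults to doing nothing, since how Tidy would
actually be invoked is not settled here.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Rule:
    """One stored search: a label, a regex pattern, and its replacement."""
    name: str
    pattern: str
    replacement: str = ""


def save_rules(rules: List[Rule]) -> str:
    """Serialise the rule library to JSON so it can be reused on later jobs."""
    return json.dumps([asdict(rule) for rule in rules], indent=2)


def load_rules(text: str) -> List[Rule]:
    """Rebuild the rule library from JSON produced by save_rules."""
    return [Rule(**item) for item in json.loads(text)]


def apply_rules(html: str, rules: List[Rule]) -> str:
    """Apply each rule in order; later rules see the output of earlier ones."""
    for rule in rules:
        html = re.sub(rule.pattern, rule.replacement, html,
                      flags=re.IGNORECASE | re.DOTALL)
    return html


def clean_document(html: str, rules: List[Rule],
                   tidy: Callable[[str], str] = lambda s: s) -> str:
    """Pre-process with tidy, run the stored searches, then tidy again.

    `tidy` is a stand-in (assumption): by default it returns its input unchanged.
    """
    return tidy(apply_rules(tidy(html), rules))


if __name__ == "__main__":
    library = [
        Rule("strip font tags", r"</?font\b[^>]*>"),
        Rule("drop empty paragraphs", r"<p>\s*(?:&nbsp;)?\s*</p>"),
    ]
    # Round-trip through JSON to show the library can be stored and reloaded.
    stored = save_rules(library)
    sample = '<p><FONT face="Arial">Hello</font></p><p>&nbsp;</p>'
    print(clean_document(sample, load_rules(stored)))
```

Keeping the rules as data rather than code means a library built up while
testing on a few files can be saved and then applied unchanged to the rest.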

I'm going to be writing the app in C#.

I would appreciate any comments/feedback from the users here as to 
whether this is a good idea (has it been done before?) and whether you 
have any advice for me.

It's been a while since I have written a Windows app so I may be a 
little slow in getting started with it.


Julian Voelcker
The Virtual World (UK) Limited
Cirencester, United Kingdom
Received on Tuesday, 18 March 2003 11:32:11 UTC
