New Site Cleaning Project

I quite often have to tidy up websites or convert documents to be added 
to a website or a content management system.

To date I have done a lot of this by hand, but I am now contemplating 
turning the process into a small application that I may consider 
distributing.

I want the app to be able to work on a set of directories and do the 
following:

- If the docs are in a non-HTML format, convert them to HTML.
- Use Tidy to do an initial pre-processing tidy-up of the documents so 
that the HTML is in a clean, non-tabbed, formatted structure.
- Use a regular expression find and replace facility to strip out 
unwanted code - quite a bit of hand work will be required here, but the 
idea would be to store each search so that the user can build up a 
library of searches during testing before applying them to a complete 
site.
- Use Tidy to do a final post-processing re-format of the finished 
output.
- If the documents are to be entered into a database, extract the 
relevant data and output it in a format suitable for importing into a 
database.
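
To make the "library of searches" idea a little more concrete, here is a 
rough sketch of how stored find-and-replace rules might look. It is in 
Python purely for brevity (the real app would be C#), and every name in it 
(Rule, save_rules, load_rules, apply_rules, the two sample patterns) is 
hypothetical, not part of Tidy or any existing tool. It assumes the rules 
are kept in a JSON file and applied in order to HTML that Tidy has already 
cleaned.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Rule:
    """One stored search: a label, a regex pattern, and its replacement."""
    name: str
    pattern: str
    replacement: str = ""  # empty string means "strip the match"


def save_rules(rules, path):
    """Write the rule library to a JSON file so it can be reused later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in rules], f, indent=2)


def load_rules(path):
    """Read a rule library previously written by save_rules."""
    with open(path, encoding="utf-8") as f:
        return [Rule(**item) for item in json.load(f)]


def apply_rules(html, rules):
    """Apply each rule in order; later rules see the output of earlier ones."""
    for rule in rules:
        html = re.sub(rule.pattern, rule.replacement, html,
                      flags=re.IGNORECASE | re.DOTALL)
    return html


# Two illustrative rules, built up by hand while testing on a sample page.
rules = [
    Rule("strip font tags", r"</?font\b[^>]*>"),
    Rule("strip inline styles", r'\s+style="[^"]*"'),
]

sample = '<p style="color:red"><FONT size="2">Hello</FONT> world</p>'
cleaned = apply_rules(sample, rules)
print(cleaned)
```

Keeping the rules as plain data rather than code is what would let a user 
test them on a few pages first and then run the same saved set over a 
complete site.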

I'm going to be writing the app in C#.


I would appreciate any comments/feedback from the users here as to 
whether this is a good idea (has it been done before?) and whether you 
have any advice for me.

It's been a while since I have written a Windows app so I may be a 
little slow in getting started with it.

Cheers,

Julian Voelcker
The Virtual World (UK) Limited
Cirencester, United Kingdom

Received on Tuesday, 18 March 2003 11:32:11 UTC