Crawling Through the W3C Validator

Dear Validator List,
 
I am currently working on a project that intends to crawl a huge number of
domains (all .dk domains) and check a few random pages within each domain
to see whether they use valid HTML. The result should be a searchable
database indicating which domains use valid code, along with some other
info.
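
To make the idea concrete, here is a minimal sketch of the sampling and
storage steps in Python. Everything in it is an assumption made for
illustration: the function names, the table layout, and the use of SQLite
are hypothetical, and the actual validity check is left out (results are
passed in as plain true/false values).

```python
import random
import sqlite3


def make_db():
    """Create an in-memory database with one (hypothetical) results table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_checks (domain TEXT, url TEXT, is_valid INTEGER)"
    )
    return conn


def sample_pages(pages, k, rng):
    """Pick up to k distinct pages at random from one domain's known pages."""
    return rng.sample(pages, min(k, len(pages)))


def record_results(conn, domain, results):
    """Store one row per checked page; results maps URL -> True/False."""
    conn.executemany(
        "INSERT INTO page_checks (domain, url, is_valid) VALUES (?, ?, ?)",
        [(domain, url, int(ok)) for url, ok in results.items()],
    )


def domain_summary(conn):
    """Return {domain: (pages_checked, pages_valid)} for reporting."""
    rows = conn.execute(
        "SELECT domain, COUNT(*), SUM(is_valid) "
        "FROM page_checks GROUP BY domain"
    )
    return {domain: (checked, valid) for domain, checked, valid in rows}
```

A crawler would call sample_pages once per domain, run each picked page
through whatever validator is available, and hand the outcome to
record_results; domain_summary then gives the per-domain statistics.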
 
The database is going to be the foundation for a Danish website I'm
constructing about using valid HTML code and its advantages. It will
contain links, information, and articles about valid HTML coding, and so
on. The database will provide statistical information about the current
state of Danish websites. My hope is that it will be possible to crawl the
sites more than once over time; however, at the moment I'm only trying to
get the first crawl done.
 
I realise that sending this number of pages through the online W3C validator
using the crawler I have built may affect the online service. I have tried
to install the validator locally on a Win2k server; however, as I am not a
Perl guy, it is giving me a rather large amount of trouble. I normally do
"light" ASP.NET programming, plus a little .NET Windows programming for the
crawler.
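
One way a crawler could limit its impact on a shared service is to enforce
a minimum delay between requests. The sketch below is only an illustration
in Python: the Throttle class and its parameters are hypothetical names,
and any interval chosen is my own assumption, not a limit published by the
W3C.

```python
import time


class Throttle:
    """Block so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self.clock = clock                # injectable so the class can be tested
        self.sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: pause for the rest
                # of the interval before letting the caller continue.
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

The crawler would create one Throttle and call wait() immediately before
each request it sends to the validator.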
 
My question now is: will it be possible (legal?) for me to go through the
online validator? If not, does anyone have a suggestion for how I can do a
local install on a Win2k server without being a great Perl wiz, or do you
perhaps know some way to integrate the validator directly into my Windows
app using VB.NET or something similar?
 
I have tried TidyCOM, which works marvellously well in my app; however, the
program seems designed more for changing the input code than for plainly and
simply validating it, so I discarded it after a couple of tries because it
didn't seem to warn about certain errors, but just fixed them. Does
something like a W3CValidatorCOM or similar exist?
 
I hope some of you are able to help me, in the interest of educating people
about W3C standards, and can perhaps point me in a direction for solving my
problem.
 
I have not joined this list, so I do not know whether I'll get your
responses if you send them only to the list; please CC me at
Oscar@Gensmann.Com <mailto:Oscar@Gensmann.Com>  :-)
 
Yours faithfully,
Oscar Eg Gensmann
-- 
Oscar@Gensmann.com <mailto:Oscar@Gensmann.com> 

Received on Tuesday, 9 April 2002 14:36:55 UTC